Mining Massive-Scale Time Series Data using Hashing

Luo, Chen

dc.contributor.advisor	Shrivastava, Anshumali	en_US
dc.creator	Luo, Chen	en_US
dc.date.accessioned	2017-08-01T18:53:54Z	en_US
dc.date.available	2017-08-01T18:53:54Z	en_US
dc.date.created	2017-05	en_US
dc.date.issued	2017-05-09	en_US
dc.date.submitted	May 2017	en_US
dc.date.updated	2017-08-01T18:53:54Z	en_US
dc.description.abstract	Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, because time series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are adopted. However, branch-and-bound pruning is only useful for very short queries (low-dimensional time series), and the bounds are quite weak for longer queries. Because of the loose bounds, the branch-and-bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme that is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. SSH uses a novel combination of sketching, shingling, and hashing techniques to produce (probabilistic) indexes that align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Empirical results on two large-scale benchmark time series datasets show that our proposed method prunes around 95% of time series candidates and can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Luo, Chen. "Mining Massive-Scale Time Series Data using Hashing." (2017) Master’s Thesis, Rice University. https://hdl.handle.net/1911/96124.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/96124	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Time Series	en_US
dc.subject	Searching	en_US
dc.subject	Data Mining	en_US
dc.subject	Machine Learning	en_US
dc.title	Mining Massive-Scale Time Series Data using Hashing	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Computer Science	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	Master of Science	en_US

Files

Original bundle

Name: LUO-DOCUMENT-2017.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text

Collections

Rice University Theses and Dissertations