Mining Massive-Scale Time Series Data using Hashing

dc.contributor.advisor: Shrivastava, Anshumali [en_US]
dc.creator: Luo, Chen [en_US]
dc.date.accessioned: 2017-08-01T18:53:54Z [en_US]
dc.date.available: 2017-08-01T18:53:54Z [en_US]
dc.date.created: 2017-05 [en_US]
dc.date.issued: 2017-05-09 [en_US]
dc.date.submitted: May 2017 [en_US]
dc.date.updated: 2017-08-01T18:53:54Z [en_US]
dc.description.abstract: Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching because time series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are adopted. However, branch-and-bound pruning is only useful for very short queries (low-dimensional time series), and the bounds are quite weak for longer queries. Because of these loose bounds, the branch-and-bound pruning strategy degenerates to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient approximate hashing scheme that is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. SSH uses a novel combination of sketching, shingling, and hashing techniques to produce (probabilistic) indexes that align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Empirical results on two large-scale benchmark time series datasets show that our proposed method prunes around 95% of the time series candidates and can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Luo, Chen. "Mining Massive-Scale Time Series Data using Hashing." (2017) Master’s Thesis, Rice University. https://hdl.handle.net/1911/96124. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/96124 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Time Series [en_US]
dc.subject: Searching [en_US]
dc.subject: Data Mining [en_US]
dc.subject: Machine Learning [en_US]
dc.title: Mining Massive-Scale Time Series Data using Hashing [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]
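
The abstract notes that DTW combines alignment and matching, and that the alignment is what makes it slow. As a rough illustration only, the following is a minimal sketch of the textbook dynamic-programming formulation of DTW, not code from the thesis. The function name dtw_distance, the absolute-difference point cost, and the example series are assumptions made for this sketch.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (hypothetical helper).

    Assumes an absolute-difference cost per aligned pair of points.
    Runs in O(len(a) * len(b)) time, which is the cost the abstract
    refers to when it says the alignment makes DTW slow.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 1, 0]  # same shape as x, delayed by one step
print(dtw_distance(x, y))  # 0.0: warping absorbs the misalignment
```

Because every query-candidate pair needs a full table of this kind, a linear scan over a large collection is expensive, which is the motivation the abstract gives for pruning and hashing-based indexing.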
Files
Original bundle
Name: LUO-DOCUMENT-2017.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text