Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices

dc.contributor.advisor: Jermaine, Christopher [en_US]
dc.contributor.committeeMember: Nakhleh, Luay [en_US]
dc.contributor.committeeMember: Subramanian, Devika [en_US]
dc.creator: Gao, Zekai [en_US]
dc.date.accessioned: 2016-01-15T21:32:33Z [en_US]
dc.date.available: 2016-01-15T21:32:33Z [en_US]
dc.date.created: 2014-12 [en_US]
dc.date.issued: 2014-09-26 [en_US]
dc.date.submitted: December 2014 [en_US]
dc.date.updated: 2016-01-15T21:32:33Z [en_US]
dc.description.abstract: Computation of covariance matrices from observed data is an important problem, as such matrices are used in applications such as PCA, LDA, and increasingly in the learning and application of probabilistic graphical models. One of the most challenging aspects of constructing and managing covariance matrices is that they can be huge, and their size makes them expensive to compute. For a p-dimensional data set with n rows, the covariance matrix will have p(p-1)/2 entries and the naive algorithm to compute the matrix will take O(np^2) time. For large p (greater than 10,000) and n much greater than p, this is debilitating. In this thesis, we consider the problem of computing a large covariance matrix efficiently in a distributed fashion over a large data set. We begin by considering the naive algorithm in detail, pointing out where it will and will not be feasible. We then consider reducing the time complexity using sampling-based methods to compute an approximate, thresholded version of the covariance matrix. Here “thresholding” means that all of the unimportant values in the matrix have been dropped and replaced with zeroes. Our algorithms have probabilistic bounds which imply that, with high probability, all of the top K entries in the matrix have been retained. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Gao, Zekai. "Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87863. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/87863 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Distributed algorithms [en_US]
dc.subject: covariance matrices [en_US]
dc.title: Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]
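
The abstract refers to a naive O(np^2) algorithm and to a thresholded matrix that retains the top K entries. The following is a minimal single-machine sketch of those two ideas only, not the thesis's distributed or sampling-based method. The function names `naive_covariance` and `threshold_top_k` are hypothetical, and two choices are assumptions rather than details from the record: entries are ranked by absolute value, and the diagonal is always kept.

```python
import heapq


def naive_covariance(rows):
    """Exact sample covariance of n rows with p columns, in O(n * p^2) time."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    # One pass over the data; each row touches every pair (i, j) with i <= j.
    for row in rows:
        centered = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(i, p):
                cov[i][j] += centered[i] * centered[j]
    # Normalize by n - 1 and mirror the upper triangle into the lower one.
    for i in range(p):
        for j in range(i, p):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]
    return cov


def threshold_top_k(cov, k):
    """Keep the k largest-magnitude off-diagonal pairs; zero out the rest.

    Assumption: the diagonal (the variances) is always retained.
    """
    p = len(cov)
    pairs = [(abs(cov[i][j]), i, j) for i in range(p) for j in range(i + 1, p)]
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        out[i][i] = cov[i][i]
    for _, i, j in heapq.nlargest(k, pairs):
        out[i][j] = out[j][i] = cov[i][j]
    return out
```

Calling `threshold_top_k(naive_covariance(rows), k)` on a list of equal-length numeric rows returns a symmetric p-by-p list of lists. The triple loop makes the O(np^2) cost visible: for p above 10,000 and n much larger than p, this exact pass is the step the abstract describes as debilitating.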
Files
Original bundle
Name: GAO-DOCUMENT-2014.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text