Mamouras, Konstantinos
2023-09-01
2023-08
2023-08-11
August 202
Kong, Lingkun. "Language Support for Real-time Data Processing." (2023) Diss., Rice University. https://hdl.handle.net/1911/115265.
Recent technological advances are causing an enormous proliferation of streaming data, i.e., data that is generated in real-time. Such data is produced at an overwhelming rate that cannot be processed in traditional manners. This thesis aims to provide programming language support for real-time data processing through three approaches: (1) creating a language for specifying complex computations over real-time data streams, (2) developing software-hardware co-design to efficiently match regular patterns in a streaming setting, and (3) designing a system for parallel stream processing with the preservation of sequential semantics. The first part of this thesis introduces StreamQL, a high-level language for specifying complex streaming computations through a combination of stream transformations. StreamQL integrates relational, dataflow, and temporal constructs, offering an expressive and modular approach for programming streaming computations. Performance comparisons against popular streaming engines show that the StreamQL library consistently achieves higher throughput, making it a useful tool for prototyping complex real-world streaming algorithms. The second part of this thesis focuses on hardware acceleration for regular pattern matching, specifically targeting the matching of regular expressions with bounded repetitions. A hardware architecture inspired by nondeterministic counter automata is presented, which uses counter and bit vector modules to efficiently handle bounded repetitions. A regex-to-hardware compiler is developed in this work, which provides static analysis over regular expressions and translates them into hardware-recognizable programs. Experimental results show that our solution provides significant improvements in energy efficiency and area reduction compared to existing solutions. Finally, this thesis presents a novel programming system for parallelizing the processing of streaming data on multicore CPUs with the preservation of sequential semantics. This system addresses challenges in preserving the sequential semantics when dealing with identical timestamps, dynamic item rates, and non-linear task parallelism. A Rust library called ParaStream is developed to support semantics-preserving parallelism in stream processing, outperforming state-of-the-art tools in terms of single-threaded throughput and scalability. Real-world benchmarks show substantial performance gains with increasing degrees of parallelism, highlighting the practicality and efficiency of ParaStream.
application/pdf
eng
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
stream processing
regular pattern matching
parallel stream processing
Language Support for Real-time Data Processing
Thesis
2023-09-01