Formally Verified Algorithms for Temporal Logic and Regular Expressions
dc.contributor.advisor | Mamouras, Konstantinos | en_US |
dc.creator | Chattopadhyay, Agnishom | en_US |
dc.date.accessioned | 2024-08-30T18:38:04Z | en_US |
dc.date.created | 2024-08 | en_US |
dc.date.issued | 2024-08-09 | en_US |
dc.date.submitted | August 2024 | en_US |
dc.date.updated | 2024-08-30T18:38:04Z | en_US |
dc.description | EMBARGO NOTE: This item is embargoed until 2025-02-01 | en_US |
dc.description.abstract | The behavior of systems in various domains including IoT networks, cyber-physical systems and runtime environments of programs can be observed in the form of linear traces. **Temporal logic** and **regular expressions** are two core formalisms used to specify properties of such data. This thesis extends these formalisms to enable the expression of richer classes of properties in a succinct manner, together with algorithms that can handle them efficiently. Using the Coq proof assistant, we formalize the semantics of our specification languages and verify the correctness of our algorithms using mechanically checked proofs. The verified algorithms have been extracted to executable code, and our empirical evaluation shows that they are competitive with state-of-the-art tools. The first part of the thesis is focused on investigating the formalization of an online monitoring framework for past-time metric temporal logic (MTL). We employ an algebraic quantitative semantics that encompasses the Boolean and robustness semantics of MTL and we interpret formulas over a discrete temporal domain. A potentially infinite-state variant of Mealy machines, a kind of string transducer, is used as a formal model of online monitors. We demonstrate a compositional construction from formulas to monitors, such that each monitor computes (in an online fashion) the semantic values of the corresponding formula over the input stream. The time taken by the monitor to process each input item is O(|φ|), where |φ| is the size of the formula, and is independent of the constants that appear in the formula. The monitor uses O(m) space, where m is the sum of the numerical constants that appear in the formula. The latter part of the thesis is focused on regular expressions. Regular expressions in practice often contain lookaround assertions, which can be used to refine matches based on the surrounding context.
Our formal semantics of lookarounds complements the commonly used operational understanding of lookarounds in terms of a backtracking implementation. Widely used regular expression matching engines take exponential time to match regular expressions with lookarounds in the worst case. Our algorithm has a worst-case time complexity of O(m · n), where m is the size of the regex and n is the size of the input string. The key insight is to evaluate the lookarounds in a bottom-up manner, and guard automaton transitions with oracle queries evaluating the lookarounds. We demonstrate how this algorithm can be implemented in a purely functional manner using marked regular expressions. The formal semantics of lookarounds and our matching algorithm are formalized and verified in Coq. Finally, we investigate the formalization of a tokenization algorithm. Tokenization is the process of breaking a monolithic string into a stream of tokens. This is one of the very first steps in the compilation of programs. In this setting, the set of possible tokens is often described using an ordered list of regular expressions. Our algorithm is based on the simulation of the Thompson NFA of the given regular expressions. Two significant parts of the verification effort involve demonstrating the correctness of Thompson's algorithm and the computation of ε-closures using depth-first search. For a stream of length n and a list of regular expressions of total size m, our algorithm finds the first token in O(m · n) time, and tokenizes the entire stream in O(m · n^2) time in the worst case. | en_US |
dc.embargo.lift | 2025-02-01 | en_US |
dc.embargo.terms | 2025-02-01 | en_US |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Chattopadhyay, Agnishom. Formally Verified Algorithms for Temporal Logic and Regular Expressions. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/117837 | en_US |
dc.identifier.uri | https://hdl.handle.net/1911/117837 | en_US |
dc.language.iso | eng | en_US |
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US |
dc.subject | formal-methods | en_US |
dc.subject | programming-languages | en_US |
dc.subject | regular-expressions | en_US |
dc.subject | automata-theory | en_US |
dc.subject | temporal-logic | en_US |
dc.subject | proof-assistants | en_US |
dc.subject | functional-programming | en_US |
dc.subject | online-monitoring | en_US |
dc.subject | lookarounds | en_US |
dc.subject | tokenization | en_US |
dc.subject | lexing | en_US |
dc.subject | coq | en_US |
dc.title | Formally Verified Algorithms for Temporal Logic and Regular Expressions | en_US |
dc.type | Thesis | en_US |
dc.type.material | Text | en_US |
thesis.degree.department | Computer Science | en_US |
thesis.degree.discipline | Engineering | en_US |
thesis.degree.grantor | Rice University | en_US |
thesis.degree.level | Doctoral | en_US |
thesis.degree.name | Doctor of Philosophy | en_US |
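
The abstract describes a compositional construction from formulas to online monitors modeled as Mealy machines. The Python sketch below illustrates that general idea only. It is not the thesis's verified Coq development: it assumes plain Boolean semantics, covers only unbounded past-time operators (no metric bounds, no quantitative semantics), and every class and function name in it (`Atom`, `Not`, `And`, `Previous`, `Once`, `Since`, `run`) is hypothetical. Each monitor keeps a constant amount of state and does constant work per input item, so a composed monitor does work proportional to the number of operators per item.

```python
# Illustrative sketch only: Boolean semantics, unbounded past-time operators.
# Each monitor behaves like a small Mealy machine: step() consumes one input
# item, updates internal state, and emits the formula's value at that position.

class Atom:
    """Atomic proposition, looked up by name in the current input item."""
    def __init__(self, name):
        self.name = name

    def step(self, item):
        return bool(item[self.name])


class Not:
    def __init__(self, sub):
        self.sub = sub

    def step(self, item):
        return not self.sub.step(item)


class And:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def step(self, item):
        # Step both children unconditionally so their states stay in sync.
        l = self.left.step(item)
        r = self.right.step(item)
        return l and r


class Previous:
    """Value of the subformula at the previous position (False at the start)."""
    def __init__(self, sub):
        self.sub, self.prev = sub, False

    def step(self, item):
        out = self.prev
        self.prev = self.sub.step(item)
        return out


class Once:
    """True if the subformula has held at some position up to now."""
    def __init__(self, sub):
        self.sub, self.seen = sub, False

    def step(self, item):
        current = self.sub.step(item)
        self.seen = self.seen or current
        return self.seen


class Since:
    """left Since right: right held at some point, and left has held ever since."""
    def __init__(self, left, right):
        self.left, self.right, self.holds = left, right, False

    def step(self, item):
        l = self.left.step(item)
        r = self.right.step(item)
        self.holds = r or (l and self.holds)
        return self.holds


def run(monitor, trace):
    """Feed the trace to the monitor item by item and collect its outputs."""
    return [monitor.step(item) for item in trace]


trace = [
    {"p": False, "q": False},
    {"p": False, "q": True},
    {"p": True, "q": False},
    {"p": False, "q": False},
]
print(run(Since(Atom("p"), Atom("q")), trace))
```

Monitors here are stateful, so a fresh monitor should be built for each trace. Handling metric bounds such as "within the last a steps" would need additional per-operator state, which this sketch deliberately leaves out.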