Static Analysis for Checking the Disambiguation Robustness of Regular Expressions

Field | Value | Language
dc.citation.firstpage | 231:2073 | en_US
dc.citation.issueNumber | PLDI | en_US
dc.citation.journalTitle | Proc. ACM Program. Lang. | en_US
dc.citation.lastpage | 231:2097 | en_US
dc.citation.volumeNumber | 8 | en_US
dc.contributor.author | Mamouras, Konstantinos | en_US
dc.contributor.author | Le Glaunec, Alexis | en_US
dc.contributor.author | Li, Wu Angela | en_US
dc.contributor.author | Chattopadhyay, Agnishom | en_US
dc.date.accessioned | 2024-10-08T13:27:47Z | en_US
dc.date.available | 2024-10-08T13:27:47Z | en_US
dc.date.issued | 2024 | en_US
dc.description.abstract | Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains. | en_US
dc.identifier.citation | Mamouras, K., Le Glaunec, A., Li, W. A., & Chattopadhyay, A. (2024). Static Analysis for Checking the Disambiguation Robustness of Regular Expressions. Proc. ACM Program. Lang., 8(PLDI), 231:2073-231:2097. https://doi.org/10.1145/3656461 | en_US
dc.identifier.digital | 3656461 | en_US
dc.identifier.doi | https://doi.org/10.1145/3656461 | en_US
dc.identifier.uri | https://hdl.handle.net/1911/117912 | en_US
dc.language.iso | eng | en_US
dc.publisher | Association for Computing Machinery | en_US
dc.rights | Except where otherwise noted, this work is licensed under a Creative Commons Attribution (CC BY) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US
dc.title | Static Analysis for Checking the Disambiguation Robustness of Regular Expressions | en_US
dc.type | Journal article | en_US
dc.type.dcmi | Text | en_US
dc.type.publication | publisher version | en_US
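The abstract contrasts the POSIX policy (longest of the leftmost matches) with the greedy (PCRE) policy. The sketch below illustrates that contrast only; it is not the authors' static analysis, which decides robustness over all inputs. Instead, it checks a single input and reports whether the two policies select different overall match spans. It assumes that Python's backtracking `re` module behaves like a greedy (PCRE-style) engine, and it emulates POSIX leftmost-longest by brute force over end positions. It compares overall spans only, not submatches, and assumes plain patterns without anchors, lookarounds, or backreferences. The names `greedy_match`, `posix_match`, and `differs_on` are hypothetical.

```python
import re
from typing import Optional, Tuple

Span = Tuple[int, int]


def greedy_match(pattern: str, text: str) -> Optional[Span]:
    """Leftmost match picked by Python's backtracking engine (assumed greedy/PCRE-like priority)."""
    m = re.search(pattern, text)
    return m.span() if m else None


def posix_match(pattern: str, text: str) -> Optional[Span]:
    """Leftmost-longest overall match, found by brute force (overall span only, no submatches)."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try end positions from longest to shortest, so the first hit is the longest match.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None


def differs_on(pattern: str, text: str) -> bool:
    """True if the two policies select different overall spans on this one input."""
    return greedy_match(pattern, text) != posix_match(pattern, text)


if __name__ == "__main__":
    print(greedy_match(r"a|ab", "ab"))  # greedy takes the first alternative
    print(posix_match(r"a|ab", "ab"))   # POSIX takes the longest match
    print(differs_on(r"ab|a", "ab"))    # same span once the longer alternative comes first
```

For `a|ab` on `ab`, the greedy policy returns `(0, 1)` and the leftmost-longest policy returns `(0, 2)`, so this input witnesses non-robustness. Comparing only overall spans can miss cases, however. On `abcd`, the pattern `(a|ab)(c|bcd)(d*)` gives the span `(0, 4)` under both policies, yet the greedy policy binds the first group to `a` while POSIX binds it to `ab`. Detecting such submatch differences requires reasoning about the full extraction semantics.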
Files
Original bundle
Name: 3656461.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format