Static Analysis for Checking the Disambiguation Robustness of Regular Expressions

Date
2024
Publisher
Association for Computing Machinery
Abstract

Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains.
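The difference between the two policies can be illustrated with a small sketch. It is not the static analysis proposed in the paper; it only compares the overall match that each policy would extract on one concrete input. Python's `re` module uses a backtracking, greedy-style policy, so `greedy_match` calls it directly. The helper `leftmost_longest_match` is a hypothetical brute-force stand-in for the POSIX policy: it tries start positions from left to right and, at each start, end positions from longest to shortest. It reproduces only the span of the whole match, not POSIX submatch disambiguation, and it assumes simple patterns without anchors or lookarounds.

```python
import re


def greedy_match(pattern, text):
    """Overall match under Python's backtracking (greedy-style) policy."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest_match(pattern, text):
    """Brute-force leftmost-longest match (illustrative POSIX stand-in).

    Tries each start position from the left; at the first start position
    that admits any match, returns the longest substring the pattern can
    match in full.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None


def agrees_on(pattern, text):
    """True if both policies extract the same overall match on this input."""
    return greedy_match(pattern, text) == leftmost_longest_match(pattern, text)


if __name__ == "__main__":
    # The order of alternatives matters under the greedy policy only.
    print(greedy_match("a|ab", "xab"), leftmost_longest_match("a|ab", "xab"))
    print(greedy_match("ab|a", "xab"), leftmost_longest_match("ab|a", "xab"))
```

On the input `xab`, the pattern `a|ab` yields `a` under the greedy policy and `ab` under leftmost-longest, so that input witnesses non-robustness. Agreement on a single input proves nothing about robustness in general, which is why a static analysis over all inputs is needed.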

Type
Journal article
Citation

Mamouras, K., Le Glaunec, A., Li, W. A., & Chattopadhyay, A. (2024). Static Analysis for Checking the Disambiguation Robustness of Regular Expressions. Proc. ACM Program. Lang., 8(PLDI), 231:2073-231:2097. https://doi.org/10.1145/3656461

Rights
Except where otherwise noted, this work is licensed under a Creative Commons Attribution (CC BY) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.