Skip directly to: content | search

ExCAPE: Expeditions in Computer Augmented Program Engineering

Regular String Transformations

PI: Alur (Penn)

Theory of regular languages (of strings, infinite strings, and trees) has proved to be an excellent foundation for qualitative verification and synthesis of finite-state systems. The goal of this research thread is to develop a similarly robust foundation for quantitative analysis and synthesis. We propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple “write-only” registers whose values can be combined using the allowed set of operations.We show that classical shortest-path algorithms as well as algorithms designed for computing discounted costs, can be adopted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.

In a recent result, we have developed an algebraic and machine-independent characterization of regular string functions analogous to the definition of regular languages by regular expressions. When the set of values is a commutative monoid (for example, numerical costs with addition), we have proved that every regular function can be constructed from constant functions using the combinators of choice, split sum, and iterated sum, that are analogs of union, concatenation, and Kleene-*, respectively, but enforce unique (or unambiguous) parsing. The structure is more fascinating when the set of values is a non-commutative monoid, for instance, the set of output strings with concatenation operation, which is of particular interest for capturing regular string-to-string transformations for document processing. We proved that the following additional combinators suffice for constructing all regular functions: the left-additive versions of split sum and iterated sum, which allow transformations such as string reversal; sum of functions, which allows transformations such as copying of strings; and function composition, or alternatively, a new concept of chained sum, which allows output values from adjacent blocks to mix. The work on regular string transformations, though not immediately applicable to verification and synthesis problems, is of foundational nature with potential for long-term impact, and has led to several publications in theory conferences [AR13,ADDRY13,ADGT13,ADR14].


ExCAPE: Expeditions in Computer Augmented Program Engineering NSF National Science Foundation Award CCF-1138996