Giessener Elektronische Bibliothek

GEB - Giessener Elektronische Bibliothek

Simplifying regular expressions : A quantitative perspective

Gruber, Hermann ; Gulan, Stefan


Zum Volltext im pdf-Format: Dokument 1.pdf (218 KB)


Bitte beziehen Sie sich beim Zitieren dieses Dokumentes immer auf folgende
URN: urn:nbn:de:hebis:26-opus-90824
URL: http://geb.uni-giessen.de/geb/volltexte/2012/9082/

Bookmark bei Connotea Bookmark bei del.icio.us


Freie Schlagwörter (Englisch): regular expressions , finite automata , star normal form , efficient algorithm
Universität Justus-Liebig-Universität Gießen
Institut: Institut für Informatik
Fachgebiet: Informatik
DDC-Sachgruppe: Informatik
Dokumentart: ResearchPaper
Zeitschrift, Serie: IFIG Research Report ; 0904 / 2009
Sprache: Englisch
Erstellungsjahr: 2009
Publikationsdatum: 23.11.2012
Kurzfassung auf Deutsch: In this work, we consider the efficient simplification of regular expressions. We suggest a quantitative comparison of heuristics for simplifying regular expressions. We propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. We apply this normal form to determine an exact bound for the relation between the two most common size measures for regular expressions, namely alphabetic width and reverse polish notation length. Then we proceed to show that every regular expression of alphabetic with n can be converted into a nondeterministic finite automaton with e-transitions of size at most 42 5n + 1, and that this bound is optimal. This provides an exact resolution of a research problem posed by Ilie and Yu, who had obtained lower and upper bounds of 4n -1 and 9n - 1 2, respectively [L. Ilie, S. Yu: Follow automata. Inform. Comput. 186, 2003]. For reverse polish notation length as input size measure, an optimal bound was recently determined [S. Gulan, H. Fernau: An optimal construction of finite automata from regular expressions. In: Proc. FST&TCS, 2008]. We prove that, under mild restrictions, their construction is also optimal when taking alphabetic width as input size measure.
Lizenz: Veröffentlichungsvertrag für Publikationen ohne Print on Demand