ANML Documentation

Automata Versus Regular Expressions

A regular expression (regex) is a string of characters that defines a text search pattern. For example, the following regex searches for the characters 123 with an optional abc or xyz occurring between 1 and 2:


The strings 1abc23, 1xyz23, and 123 would all match this regex.

Figure 1 shows this regex converted into a graph with nodes representing the characters and arrows representing the paths for finding a matching text string:

Figure 1. Example Regex
Automata regex example

Similarly, in the context of automata computing, nodes detect specific characters and arrows show paths. Figure 2 shows the automaton implementation of the example regex.

Figure 2. Example Regex as an Automaton
Automata regex automata conversion

As discussed earlier, a state in an automaton is defined by the STE. Each STE is programmed to accept a specific set of symbols, and the STEs are chained together with transition arrows which represent activation connections.

In Figure 2, the STE with the 1 indicator is active on the very first input processing cycle. This is analogous to the start anchor (^) in regex. The STE with the indicator R means the STE will generate a report event if it matches the input symbol during the cycle in which it is active. This is analogous to the string matching the regular expression.

Table 1. Regex Functions vs. Automata Functions
Regex Automata
Characters/character classes STEs programmed with characters/character classes
Beginning anchor STE start-of-data attribute
Unanchored expressions STE all-input attribute
Tail anchor End-of-data (EOD) signal
Repetitions Loopback connections, possibly with counters
Match-any-character (.) Match-any-character (*)
Quantifiers Can be implemented with repeated STE chains or counters
Modifiers Certain modifiers are supported
Match Reporting element