Developing Applications for the Automata Processor

Developing applications for the Automata Processor involves two workflows:

  1. The creation of automata that will be loaded onto the AP hardware.

    This development cycle follows a model similar to FPGA development.

  2. The creation of application software to manage the input data written to the AP hardware and to process events received from the AP hardware. This is a traditional software engineering workflow using the Runtime API. Refer to the Runtime Programmers Reference for information regarding use of this API.

Creating Automata

The SDK provides three methods to create automata. The first two methods use a compile tool provided with the SDK to transform a human-readable representation into the output automaton. The third method provides APIs that allow any customer-specific tool to be integrated into the compile workflow.

  1. PCRE Rules
  2. AP Workbench
  3. ANML and PCRE Application Programming Interfaces

The compilation of automata follows the same workflow irrespective of the input language. The first step of compilation parses the input language and transforms it into a machine-readable form. In every case, the compiler generates an internal representation of a non-deterministic finite automaton (NFA). This internal model differs slightly from a traditional NFA to accommodate hardware components such as Boolean and counter elements.

The NFA is then optimized and split into smaller NFAs. These smaller NFAs are further modified to ensure that the design rules are satisfied, and are then placed and routed to fit on the AP fabric. The result is a relocatable binary image that can be loaded and executed on the AP hardware using the Runtime API.

Figure: Compilation of Automata
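To make the NFA model concrete, the following C sketch simulates a homogeneous NFA in which every state is a state transition element (STE) carrying its own 256-symbol character class. It is a simplified model under stated assumptions: Boolean and counter elements are omitted, and the names `Ste`, `ste_add_symbol`, and `run_nfa` are hypothetical, not part of the SDK. On each input symbol, an STE that is enabled (by a predecessor that was active on the previous symbol, or because it is an all-input start STE) and whose character class contains the symbol becomes active; it then enables its successors for the next symbol and reports if it is a reporting STE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STES 16
#define MAX_SUCC 4

/* Hypothetical model of one STE; not the SDK's internal representation. */
typedef struct {
    uint8_t symbols[32];     /* bit c set => the STE matches byte value c */
    bool    all_input_start; /* enabled on every input symbol */
    bool    report;          /* raises a report event when active */
    int     succ[MAX_SUCC];  /* indices of STEs this STE enables */
    int     nsucc;
} Ste;

void ste_add_symbol(Ste *s, uint8_t c)
{
    s->symbols[c >> 3] |= (uint8_t)(1u << (c & 7));
}

bool ste_matches(const Ste *s, uint8_t c)
{
    return (s->symbols[c >> 3] >> (c & 7)) & 1u;
}

/* Feeds `len` symbols through the automaton. Returns the number of report
 * events and stores the input offsets of the first `max` of them. */
size_t run_nfa(const Ste *stes, int n, const uint8_t *input, size_t len,
               size_t *offsets, size_t max)
{
    assert(n >= 0 && n <= MAX_STES);
    bool enabled[MAX_STES] = {false};
    size_t reports = 0;

    for (size_t pos = 0; pos < len; pos++) {
        bool next[MAX_STES] = {false};
        for (int i = 0; i < n; i++) {
            bool en = enabled[i] || stes[i].all_input_start;
            if (!en || !ste_matches(&stes[i], input[pos]))
                continue;
            /* STE i is active on this symbol. */
            if (stes[i].report) {
                if (reports < max)
                    offsets[reports] = pos;
                reports++;
            }
            for (int k = 0; k < stes[i].nsucc; k++)
                next[stes[i].succ[k]] = true; /* enabled for the next symbol */
        }
        memcpy(enabled, next, sizeof enabled);
    }
    return reports;
}
```

For example, an automaton for `ab` uses an all-input start STE matching `a` that enables a reporting STE matching `b`; on the input `xabab` it reports at offsets 2 and 4.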

Optimization

The objective of optimization is to minimize the number of hardware resources required to represent the automata and to ensure the automata edge connections can be feasibly routed using the available fabric on the AP hardware.

The initial stage of optimization computes a smaller NFA by merging equivalent states. By default, the optimizer merges states that are equivalent with respect to both their left and right languages; the programmer can change this behavior.
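A minimal sketch of right-side state merging, assuming a toy representation in which each state has an opaque symbol-class id, a start flag, a report code, and a successor bitmask (at most 32 states). The names `State`, `right_equivalent`, and `right_merge` are hypothetical and do not describe the SDK's optimizer. Two states are merged when they match the same symbol class, share start and report behavior, and enable exactly the same successors; edges into the removed state are redirected to the survivor, and the pass repeats until nothing changes. Left merging would be the mirror image, comparing predecessor sets instead of successor sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_STATES 32

/* Toy state record; not the SDK's internal representation. */
typedef struct {
    bool     alive;       /* false once merged into another state */
    uint32_t label;       /* opaque id of the state's symbol class */
    bool     start;       /* start state */
    int      report_code; /* 0 = non-reporting */
    uint32_t succ;        /* bit k set => edge to state k */
} State;

bool right_equivalent(const State *a, const State *b, bool ignore_report_code)
{
    bool reports_match = ignore_report_code
        ? (a->report_code != 0) == (b->report_code != 0)
        : a->report_code == b->report_code;
    return a->label == b->label && a->start == b->start &&
           a->succ == b->succ && reports_match;
}

/* Merges right-equivalent states until a fixed point is reached and returns
 * the number of states removed. With ignore_report_code, reporting states
 * merge regardless of code and the survivor keeps its own code. */
int right_merge(State *s, int n, bool ignore_report_code)
{
    assert(n >= 0 && n <= MAX_STATES);
    int removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!s[i].alive)
                continue;
            for (int j = i + 1; j < n; j++) {
                if (!s[j].alive || !right_equivalent(&s[i], &s[j], ignore_report_code))
                    continue;
                uint32_t bit_i = UINT32_C(1) << i, bit_j = UINT32_C(1) << j;
                for (int k = 0; k < n; k++) /* redirect edges j -> i */
                    if (s[k].succ & bit_j)
                        s[k].succ = (s[k].succ & ~bit_j) | bit_i;
                s[j].alive = false;
                removed++;
                changed = true;
            }
        }
    }
    return removed;
}
```

With `ignore_report_code` set, two reporting states with different report codes can merge and the survivor keeps only its own code, which illustrates why the most aggressive settings below can no longer tell which pattern caused a match.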

If using the command line tool, the following optimization settings are available:

  • --disable-left-minimization: Disable left minimization on the automaton.
  • --disable-right-minimization: Disable right minimization on the automaton.
  • --merge-expression: Enable a more aggressive right minimization but limit merging to states that share the same report code. The setting has no effect when combined with --disable-right-minimization.
  • --merge-all-expressions: Enable the most aggressive right minimization at the expense of not being able to determine which pattern caused the match. The setting has no effect when combined with --disable-right-minimization.
  • -Od: Disable automaton minimization; equivalent to setting both --disable-left-minimization and --disable-right-minimization.

If using the ANML or PCRE Application Programming Interfaces, the following compile options are available:

  • AP_OPT_DISABLE_MIN_RIGHT: Disable right minimization on the automaton.
  • AP_OPT_DISABLE_MIN_LEFT: Disable left minimization on the automaton.
  • AP_OPT_MERGE_IF_SAME_REPORT_CODE: Enable a more aggressive optimization for a single expression. The option has no effect when combined with AP_OPT_DISABLE_MIN_RIGHT.
  • AP_OPT_MERGE_ALL_REPORTS: Enable a more aggressive minimization of the STE usage at the expense of not being able to determine which pattern caused the match. The option has no effect when combined with AP_OPT_DISABLE_MIN_RIGHT.
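The documented interactions between these settings can be captured in a few lines of C. This sketch assumes the command-line flags correspond one-to-one to the API options as paired below (the pairing is inferred from the descriptions, not stated), and it defines local stand-in constants (`OPT_*`) because the real `AP_OPT_*` values come from the SDK headers and are not reproduced here. The helper names `flag_to_options` and `effective_options` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the AP_OPT_* options; values are illustrative only. */
enum {
    OPT_DISABLE_MIN_LEFT          = 1 << 0,
    OPT_DISABLE_MIN_RIGHT         = 1 << 1,
    OPT_MERGE_IF_SAME_REPORT_CODE = 1 << 2,
    OPT_MERGE_ALL_REPORTS         = 1 << 3,
};

/* Returns the option bits assumed to correspond to one command-line flag,
 * or 0 for an unrecognized flag. */
unsigned flag_to_options(const char *flag)
{
    assert(flag != NULL);
    if (strcmp(flag, "--disable-left-minimization") == 0)
        return OPT_DISABLE_MIN_LEFT;
    if (strcmp(flag, "--disable-right-minimization") == 0)
        return OPT_DISABLE_MIN_RIGHT;
    if (strcmp(flag, "--merge-expression") == 0)
        return OPT_MERGE_IF_SAME_REPORT_CODE;
    if (strcmp(flag, "--merge-all-expressions") == 0)
        return OPT_MERGE_ALL_REPORTS;
    if (strcmp(flag, "-Od") == 0) /* documented as both disable flags */
        return OPT_DISABLE_MIN_LEFT | OPT_DISABLE_MIN_RIGHT;
    return 0;
}

/* Clears the merge options when right minimization is disabled, because the
 * text states they have no effect in that case. */
unsigned effective_options(unsigned opts)
{
    if (opts & OPT_DISABLE_MIN_RIGHT)
        opts &= ~(unsigned)(OPT_MERGE_IF_SAME_REPORT_CODE | OPT_MERGE_ALL_REPORTS);
    return opts;
}
```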

Timing

The clock period of the Automata Processor is determined by the delay of the slowest path. The smallest clock period supported by the AP chip is 7.45 ns. Longer periods (slower clocks) are possible, and the actual clock period is often set to an integer multiple of this base period.

By default, the compiler attempts to compile automata so that they can run at the minimum clock period (maximum speed), and it automatically relaxes the timing constraints when necessary. A longer clock period accommodates more complex logic and more complex edge connections.

If using the command line tool, the timing is printed to the console when the verbose (-v) option is specified. You can determine the timing of a compiled FSM programmatically by calling AP_GetInfo().
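The rule that the clock period is an integer multiple of the 7.45 ns base period can be sketched as follows. The function names are hypothetical, times are kept in integer picoseconds to avoid floating-point rounding, and the throughput figure assumes the AP consumes one input symbol per clock cycle.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PERIOD_PS 7450u /* 7.45 ns minimum clock period */

/* Smallest integer multiple k >= 1 such that k * 7.45 ns covers the
 * slowest path delay, given in picoseconds. */
unsigned timing_multiple(uint32_t slowest_path_ps)
{
    if (slowest_path_ps <= BASE_PERIOD_PS)
        return 1;
    return (unsigned)((slowest_path_ps + BASE_PERIOD_PS - 1) / BASE_PERIOD_PS);
}

/* Input symbols per second at a clock period of k * 7.45 ns, ASSUMING one
 * symbol is consumed per clock cycle (integer division truncates). */
uint64_t symbols_per_second(unsigned k)
{
    assert(k >= 1);
    return UINT64_C(1000000000000) / ((uint64_t)k * BASE_PERIOD_PS);
}
```

For instance, a slowest path of 10 ns needs a multiple of 2 (a 14.9 ns period), which halves the symbol rate relative to the 1 x target.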

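The following tables give the number of destination elements reachable from a source element for each timing target. Rows are the source resource type driving the output; columns are the input type of the destination element.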

Reachable elements, timing target = 1 x 7.45 ns

Source (drive out)    Boolean   Counter IN   Counter RST     STE
Boolean                     0            0             0     256
Counter                     0            0             4     256
STE                        12           12             4    2304

Reachable elements, timing target = 2 x 7.45 ns

Source (drive out)    Boolean   Counter IN   Counter RST     STE
Boolean                    11            4             4    2304
Counter                    12            3            36    2304
STE                       108          108            36    6400

Reachable elements, timing target = 3 x 7.45 ns

Source (drive out)    Boolean   Counter IN   Counter RST     STE
Boolean                   107           36            36    6400
Counter                   108           35           100    6400
STE                      6400         6400           100   12544
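The tables can be used to check whether a required fan-out fits a timing target. This sketch copies the counts from the tables above into a lookup array and, treating each entry as the maximum number of destination elements of that type a source can reach (an interpretation of the tables), returns the smallest tabulated timing multiple that covers a requested fan-out. The enum and function names are hypothetical.

```c
#include <assert.h>

enum Src { SRC_BOOLEAN, SRC_COUNTER, SRC_STE, NUM_SRC };
enum Dst { DST_BOOLEAN, DST_COUNTER_IN, DST_COUNTER_RST, DST_STE, NUM_DST };

/* reach[t][src][dst] for timing targets (t + 1) x 7.45 ns. */
static const unsigned reach[3][NUM_SRC][NUM_DST] = {
    { {0, 0, 0, 256},       {0, 0, 4, 256},       {12, 12, 4, 2304} },
    { {11, 4, 4, 2304},     {12, 3, 36, 2304},    {108, 108, 36, 6400} },
    { {107, 36, 36, 6400},  {108, 35, 100, 6400}, {6400, 6400, 100, 12544} },
};

/* Returns the smallest timing multiple (1-3) whose entry covers `fanout`
 * destinations, or 0 if none of the tabulated targets does. */
unsigned min_timing_for_fanout(enum Src src, enum Dst dst, unsigned fanout)
{
    assert(src < NUM_SRC && dst < NUM_DST);
    for (unsigned t = 0; t < 3; t++)
        if (reach[t][src][dst] >= fanout)
            return t + 1;
    return 0;
}
```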

Placement and Routing

The AP hardware consists of repeated Boolean, counter, and STE elements and a connectivity matrix that connects them. Each physical hardware element has one output terminal and one or more input terminals. The connectivity matrix allows programmable routing of outputs to inputs.

During compilation a software model of the desired hardware configuration is created. This model includes the resource requirements, their configuration, and the I/O connectivity. Placement is the mapping from the resource elements to a physical location on the AP hardware.

The connectivity matrix allows I/O routing between physical resource locations. It is hierarchical: it consists of horizontal and vertical routing channels, and the I/O of each physical hardware resource connects into and out of these channels. The routing channels have capacity constraints, such as the number of physical I/O wires and the number of physical resources that connect to each channel. These constraints limit the ability to place connected resources near each other or on the same channel.

Due to the complexity of the problem, placement and routing are separated into distinct processes. The objective of placement is to find an optimal mapping that uses the minimum number of physical hardware resources and can be feasibly routed. The objective of routing is to allocate edges to routing channels.
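The interplay between placement and routing can be illustrated with a deliberately simplified model, which is an assumption for illustration and not the AP's actual routing architecture: blocks of elements sit in a single row, each block holds a limited number of elements, routing inside a block is free, and every edge between two different blocks consumes one wire in each channel segment it crosses. The hypothetical function `placement_routable` checks a candidate placement against both constraints.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

typedef struct { int from, to; } Edge; /* element indices */

/* block_of[e] is the block (0..nblocks-1) element e is placed in. Returns
 * true if no block holds more than block_cap elements and no boundary
 * between adjacent blocks needs more than wires_per_segment wires. */
bool placement_routable(const int *block_of, int nelems,
                        const Edge *edges, int nedges,
                        int nblocks, int block_cap, int wires_per_segment)
{
    assert(nblocks > 0 && nblocks <= MAX_BLOCKS);
    int used[MAX_BLOCKS] = {0};
    int demand[MAX_BLOCKS] = {0}; /* demand[s]: wires across boundary s | s+1 */

    for (int e = 0; e < nelems; e++)
        if (++used[block_of[e]] > block_cap)
            return false; /* placement overfills a block */

    for (int i = 0; i < nedges; i++) {
        int a = block_of[edges[i].from], b = block_of[edges[i].to];
        int lo = a < b ? a : b, hi = a < b ? b : a;
        for (int s = lo; s < hi; s++) /* same-block edges cross nothing */
            if (++demand[s] > wires_per_segment)
                return false; /* a channel segment runs out of wires */
    }
    return true;
}
```

With four elements chained 0 -> 1 -> 2 -> 3, two blocks of capacity 2, and one wire per segment, placing {0, 1} and {2, 3} together needs a single crossing and routes, while alternating the elements across blocks needs three crossings and does not, even though both placements use the same resources.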

Counter Target Range

Counters count symbol recognition events. Each counter has an associated target count; when the count reaches the target, the counter activates elements in a manner that depends on its mode. A counter target can be in the range 1 to 4194304 (2^22), inclusive. Refer to the ANML Users Guide for a complete description of the three counter modes.
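A small sketch of the counter target range check, together with a simplified count-to-target model. The macro, type, and function names are hypothetical; the sketch assumes a reset input takes priority over the count input in the same cycle, and it does not model the three counter modes, simply holding the count once the target is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COUNTER_TARGET_MIN 1u
#define COUNTER_TARGET_MAX 4194304u /* 2^22, upper bound from the text */

bool counter_target_valid(uint32_t target)
{
    return target >= COUNTER_TARGET_MIN && target <= COUNTER_TARGET_MAX;
}

typedef struct {
    uint32_t target;
    uint32_t count;
} Counter;

/* Advances the counter by one cycle. Returns true only on the cycle the
 * count reaches the target; reset clears the count. */
bool counter_step(Counter *c, bool count_in, bool reset_in)
{
    assert(counter_target_valid(c->target));
    if (reset_in) {
        c->count = 0;
        return false;
    }
    if (count_in && c->count < c->target) {
        c->count++;
        return c->count == c->target;
    }
    return false;
}
```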

Programming Environment

The SDK is composed of four C programming interfaces, described briefly below:

  1. PCRE API
  2. ANML API
    • A set of functions for programming with ANML.
    • header file: ap_anml.h
    • library: libapcompile
    • package: devel
  3. String Matching API
  4. Runtime API
    • A set of functions for communicating with Automata Processor hardware.
    • header files: ap_exec.h, ap_load.h
    • library: libapexec
    • package: base

Each programming interface has bindings to Python and Java.