PCRE Programmers Reference

Building the Expression Database

The PCRE API lets the programmer parse regular expressions into a machine-readable expression database. An expression database is accessed through the opaque reference type ap_exprdb_t. Expressions are added to the database by calling AP_AddExpression(), and an expression can be modified as it is added by applying expression modifiers.

The expression database is compiled into an automaton by calling AP_Compile() and saved to disk by calling AP_Save(). The disk file can be loaded into the Automata Processor and executed against data to perform a search.

During compilation, each expression is prefixed according to the AP_MOD_MULTILINE modifier and whether the expression has a left anchor. The prefix is chosen by the following rules.

  • Multiline without left anchor: /.*/
  • Multiline with left anchor: /(.*\n)?/
  • Not-Multiline without left anchor: /.*/
  • Not-Multiline with left anchor: No prefix.

A multiline expression is specified by the AP_MOD_MULTILINE modifier. Its behavior is equivalent to that of the PCRE multiline modifier (/m).
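The four prefix rules above can be written as a small lookup. The sketch below is illustrative only and is not part of the SDK: the names expression_prefix and emulate are hypothetical, and the emulation uses Python's re module rather than the Automata Processor. It assumes that the dot in the prefix also matches newlines, and that the expression body is passed with any leading ^ already removed.

```python
import re


def expression_prefix(multiline: bool, left_anchored: bool) -> str:
    """Return the prefix given by the four rules above."""
    if not left_anchored:
        # Same prefix whether or not the expression is multiline.
        return ".*"
    # Left anchored: skip whole lines if multiline, otherwise add nothing.
    return "(.*\\n)?" if multiline else ""


def emulate(body: str, data: str, multiline: bool, left_anchored: bool) -> bool:
    """Emulate a prefixed expression with Python's re module.

    Assumption: the dot in the prefix matches newlines, so the prefix is
    wrapped in a scoped DOTALL group. `body` is the expression without ^.
    """
    prefix = expression_prefix(multiline, left_anchored)
    pattern = "(?s:" + prefix + ")" + body if prefix else body
    # re.match anchors at the start of the data, so only the prefix
    # decides where the body is allowed to begin.
    return re.match(pattern, data) is not None
```

Under these assumptions, emulate("abc", "xyz\nabc", True, True) succeeds because the multiline prefix consumes the first line, while the same call with multiline set to False fails because a left-anchored, non-multiline expression gets no prefix and must match at the start of the data.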

Compiling Regular Expressions

The default compilation process applies optimizations that minimize STE resource usage within the Automata Processor while preserving the ability to identify each expression that can cause a match. The process can be controlled by passing compiler options to AP_Compile().

The compiler option AP_OPT_MERGE_IF_SAME_REPORT_CODE enables an aggressive optimization that removes redundancy by merging expressions. This is typically undesirable, since it removes the ability to determine which expression caused a match. It may be useful to applications that have large expression databases and only require match granularity at the automaton level. An application such as web filtering may use this option if each category (games, adult, etc.) is assigned to its own automaton.
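The trade-off described above can be illustrated without the SDK. The sketch below is an analogy built on Python's re module, not the compiler's actual merge algorithm: every name in it (CATEGORIES, merge_category, classify) and all of the sample patterns are hypothetical. Each category's expressions are merged into a single alternation, so a match reports the category but no longer identifies the expression that fired.

```python
import re

# Hypothetical category data; names and patterns are illustrative only.
CATEGORIES = {
    "games": ["poker", "chess\\d*"],
    "news": ["headline", "breaking"],
}


def merge_category(patterns):
    """Merge every pattern of one category into a single alternation."""
    return re.compile("|".join("(?:" + p + ")" for p in patterns))


# One merged matcher per category, analogous to one automaton per category.
MERGED = {name: merge_category(pats) for name, pats in CATEGORIES.items()}


def classify(text):
    """Return the matching categories, sorted by name.

    Only the category is reported; which individual expression matched
    is no longer recoverable after the merge.
    """
    return sorted(name for name, rx in MERGED.items() if rx.search(text))
```

For a web filter this is usually enough: the application needs to know that a page falls into the games category, not which of the games expressions matched.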

The compiler option AP_OPT_DISABLE_COUNTERS disables the use of hardware counters within the Automata Processor. Normally the compilation process attempts to implement quantifications with hardware counters to minimize STE resource usage. This option disables that optimization.

The example below shows how to build an expression database and compile it into an automaton. For simplicity, the example does not check error returns, which a real application would need to do.

Example: Create an automaton for expressions /a(b|c)*[de]/i and /x.*yz/i.

#include <micron/ap/ap_compile.h>

ap_automaton_t CreateAutomaton(void)
{
    ap_exprdb_t db;
    ap_automaton_t amton;

    // Create an expression database

    // Add an expression with PCRE delimiters and set the i PCRE modifier (caseless).
    // The expression identifier is set to 1
    AP_AddExpression(db, NULL, "/a(b|c)*[de]/i", 1, 0, AP_GRAMMAR_PCRE_DELIMITED);

    // Add an expression without PCRE delimiters and set to caseless matching
    // The expression identifier is set to 1

    // Compile the expression database into an automaton
    AP_Compile(db, &amton, 0, NULL, 0, 0);

    // Clean up

    return amton;
}

from micronap.sdk import *

def CreateAutomaton():
    '''Create an expression database and compile it into an automaton.

    Returns:
        A micronap.sdk.common.Automaton instance.

    Raises:
        micronap.sdk.common.ApError
    '''
    db = ExpressionDB()

    # Add an expression with PCRE delimiters (default) and set the i PCRE modifier (caseless).
    # The expression identifier is set to 1
    db.AddExpression('/a(b|c)*[de]/i', 1, 0)

    # Add an expression without PCRE delimiters and set to caseless matching
    # The expression identifier is set to 1
    db.AddExpression('x.*yz', 1, ExprMods.AP_MOD_CASELESS, 0)

    # Compile the expression database into an automaton
    A = db.Compile()
    return A

import com.micron.ap.*;

public class CreateAutomatonExample {
    /**
     * Create an expression database and compile it into an automaton.
     * @return A com.micron.ap.Automaton instance.
     * @throws com.micron.ap.ApException
     */
    public static Automaton CreateAutomaton() throws ApException {
        ExpressionDB db = new ExpressionDB();

        /* Add an expression with PCRE delimiters (default) and set the i PCRE modifier (caseless).
         * The expression identifier is set to 1
         */
        db.addExpression("/a(b|c)*[de]/i", 1);

        /* Add an expression without PCRE delimiters and set to caseless matching
         * The expression identifier is set to 1
         */
        db.addExpression("x.*yz", 1, ExprMods.AP_MOD_CASELESS, 0);

        /* Compile the expression database into an automaton */
        Automaton A = db.compile(0);
        return A;
    }
}