org.apache.oro.text.regex
Class Perl5Compiler

java.lang.Object
  |
  +--org.apache.oro.text.regex.Perl5Compiler
All Implemented Interfaces:
PatternCompiler

public final class Perl5Compiler
extends Object
implements PatternCompiler

Safe: The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. It generates Perl5Pattern instances upon compilation to be used in conjunction with a Perl5Matcher instance. Please see the user's guide for more information about Perl5 regular expressions.

Version:
$Id: Perl5Compiler.java,v 1.5 2001/12/02 06:01:41 markm Exp $
Author:
Daniel F. Savarese
See Also:
PatternCompiler, MalformedPatternException, Perl5Pattern, Perl5Matcher

Field Summary
private static char __CASE_INSENSITIVE
           
private  int __cost
           
private static char __EXTENDED
           
private static char __GLOBAL
           
private static String __HEX_DIGIT
           
private  CharStringPointer __input
           
private static char __KEEP
           
private static String __META_CHARS
           
private  char[] __modifierFlags
           
private static char __MULTILINE
           
private static int __NONNULL
           
private  int __numParentheses
           
private  char[] __program
           
private  int __programSize
           
private static char __READ_ONLY
           
private  boolean __sawBackreference
           
private static int __SIMPLE
           
private static char __SINGLELINE
           
private static int __SPSTART
           
private static int __TRYAGAIN
           
private static int __WORSTCASE
           
static int CASE_INSENSITIVE_MASK
          Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.
static int DEFAULT_MASK
          Enabled: The default mask for the compile methods.
static int EXTENDED_MASK
          Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should be treated as a Perl5 extended pattern (i.e., a pattern using the /x modifier).
static int MULTILINE_MASK
          Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines.
static int READ_ONLY_MASK
          Enabled: A mask passed as an option to the compile methods to indicate that the resulting Perl5Pattern should be treated as a read only data structure by Perl5Matcher, making it safe to share a single Perl5Pattern instance among multiple threads without needing synchronization.
static int SINGLELINE_MASK
          Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as being a single line.
 
Constructor Summary
Perl5Compiler()
          Enabled:
 
Method Summary
private  int __emitArgNode(char operator, char arg)
           
private  void __emitCode(char code)
           
private  int __emitNode(char operator)
           
private  char __getNextChar()
           
private static boolean __isComplexRepetitionOp(char[] ch, int offset)
           
private static boolean __isSimpleRepetitionOp(char ch)
           
private  int __parseAlternation(int[] retFlags)
           
private  int __parseAtom(int[] retFlags)
           
private  int __parseBranch(int[] retFlags)
           
private  int __parseCharacterClass()
           
private  int __parseExpression(boolean isParenthesized, int[] hintFlags)
           
private static int __parseHex(char[] str, int offset, int maxLength, int[] scanned)
           
private static int __parseOctal(char[] str, int offset, int maxLength, int[] scanned)
           
private static boolean __parseRepetition(char[] str, int offset)
           
private  void __programAddOperatorTail(int current, int value)
           
private  void __programAddTail(int current, int value)
           
private  void __programInsertOperator(char operator, int operand)
           
private  void __setCharacterClassBits(char[] bits, int offset, char deflt, char ch)
           
private static void __setModifierFlag(char[] flags, char ch)
           
 Pattern compile(char[] pattern)
          Enabled: Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
 Pattern compile(char[] pattern, int options)
          Enabled: Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching.
 Pattern compile(String pattern)
          Enabled: Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
 Pattern compile(String pattern, int options)
          Enabled: Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching.
static String quotemeta(char[] expression)
          Enabled: Given a character string, returns a Perl5 expression that interprets each character of the original string literally.
static String quotemeta(String expression)
          Enabled: Given a character string, returns a Perl5 expression that interprets each character of the original string literally.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

__WORSTCASE

private static final int __WORSTCASE

__NONNULL

private static final int __NONNULL

__SIMPLE

private static final int __SIMPLE

__SPSTART

private static final int __SPSTART

__TRYAGAIN

private static final int __TRYAGAIN

__CASE_INSENSITIVE

private static final char __CASE_INSENSITIVE

__GLOBAL

private static final char __GLOBAL

__KEEP

private static final char __KEEP

__MULTILINE

private static final char __MULTILINE

__SINGLELINE

private static final char __SINGLELINE

__EXTENDED

private static final char __EXTENDED

__READ_ONLY

private static final char __READ_ONLY

__META_CHARS

private static final String __META_CHARS

__HEX_DIGIT

private static final String __HEX_DIGIT

__input

private CharStringPointer __input

__sawBackreference

private boolean __sawBackreference

__modifierFlags

private char[] __modifierFlags

__numParentheses

private int __numParentheses

__programSize

private int __programSize

__cost

private int __cost

__program

private char[] __program

DEFAULT_MASK

public static final int DEFAULT_MASK
Enabled: The default mask for the compile methods. It is equal to 0. The default behavior is for a regular expression to be case sensitive and to not specify whether it is multiline or singleline. When neither MULTILINE_MASK nor SINGLELINE_MASK is set, the ^, $, and . metacharacters are interpreted according to the value of isMultiline() in Perl5Matcher. The default behavior of Perl5Matcher is to treat the Perl5Pattern as though MULTILINE_MASK were enabled. If isMultiline() returns false, then the pattern is treated as though SINGLELINE_MASK were set. However, compiling a pattern with MULTILINE_MASK or SINGLELINE_MASK will ALWAYS override whatever behavior is specified by setMultiline() in Perl5Matcher.
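The precedence rule described above can be sketched as a small decision function. This is only an illustration of the rule as stated in this description, not the library's code: the class name, the helper effectiveMultiline, and the bit values chosen for the two masks are assumptions (this page states only that DEFAULT_MASK is equal to 0).

```java
final class MultilineResolutionSketch {
    // Only DEFAULT_MASK == 0 comes from the documentation.
    static final int DEFAULT_MASK = 0;
    // Hypothetical bit positions, chosen for illustration only.
    static final int MULTILINE_MASK = 1 << 3;
    static final int SINGLELINE_MASK = 1 << 4;

    /**
     * Returns true when ^, $ and . should follow multiline semantics.
     * A mask given at compile time wins; otherwise the matcher's
     * setting (the value isMultiline() would return) decides.
     */
    static boolean effectiveMultiline(int options, boolean matcherIsMultiline) {
        if ((options & MULTILINE_MASK) != 0) {
            return true;   // compile-time mask overrides the matcher
        }
        if ((options & SINGLELINE_MASK) != 0) {
            return false;  // compile-time mask overrides the matcher
        }
        return matcherIsMultiline; // no mask: defer to the matcher
    }

    public static void main(String[] args) {
        System.out.println(effectiveMultiline(DEFAULT_MASK, true));    // true
        System.out.println(effectiveMultiline(DEFAULT_MASK, false));   // false
        System.out.println(effectiveMultiline(SINGLELINE_MASK, true)); // false
        System.out.println(effectiveMultiline(MULTILINE_MASK, false)); // true
    }
}
```

The sketch checks MULTILINE_MASK first only because some order is needed; the two masks should not be combined, so that ordering carries no meaning.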


CASE_INSENSITIVE_MASK

public static final int CASE_INSENSITIVE_MASK
Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.


MULTILINE_MASK

public static final int MULTILINE_MASK
Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines. This option affects the interpretation of the ^ and $ metacharacters. When this mask is used, the ^ metacharacter matches at the beginning of every line, and the $ metacharacter matches at the end of every line. Additionally, the . metacharacter will not match newlines when an expression is compiled with MULTILINE_MASK, which is its default behavior. SINGLELINE_MASK and MULTILINE_MASK should not be used together.
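The effect on ^ and $ can be tried out with the JDK's own regex engine. The sketch below uses java.util.regex and its Pattern.MULTILINE flag purely as a stand-in so that it runs without this library on the classpath; it is not the Perl5Compiler API, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultilineAnchorsSketch {
    /** Counts how many times the regex is found in the input. */
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "alpha\nbeta\ngamma";
        // With line-oriented anchors, every line is a separate match.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, input)); // 3
        // Without them, ^ and $ refer to the whole input, which contains
        // newlines that \w cannot cross, so nothing matches.
        System.out.println(countMatches("^\\w+$", 0, input)); // 0
    }
}
```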


SINGLELINE_MASK

public static final int SINGLELINE_MASK
Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as being a single line. This option affects the interpretation of the ^ and $ metacharacters. When this mask is used, the ^ metacharacter matches at the beginning of the input, and the $ metacharacter matches at the end of the input. The ^ and $ metacharacters will not match at the beginning and end of lines occurring between the beginning and end of the input. Additionally, the . metacharacter will match newlines when an expression is compiled with SINGLELINE_MASK, unlike its default behavior. SINGLELINE_MASK and MULTILINE_MASK should not be used together.
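A runnable illustration of the "dot matches newline" behavior follows. It again uses java.util.regex as a stand-in, with Pattern.DOTALL playing the role that SINGLELINE_MASK plays here; the class and method names are hypothetical and nothing in it calls Perl5Compiler.

```java
import java.util.regex.Pattern;

final class SinglelineDotSketch {
    /** Returns true when the regex matches the entire input. */
    static boolean wholeInputMatches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "start\nmiddle\nend";
        // The dot may cross newlines, so the anchors span the whole input.
        System.out.println(wholeInputMatches("^start.*end$", Pattern.DOTALL, input)); // true
        // By default the dot stops at a newline, so the match fails.
        System.out.println(wholeInputMatches("^start.*end$", 0, input)); // false
    }
}
```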


EXTENDED_MASK

public static final int EXTENDED_MASK
Enabled: A mask passed as an option to the compile methods to indicate a compiled regular expression should be treated as a Perl5 extended pattern (i.e., a pattern using the /x modifier). This option tells the compiler to ignore whitespace that is not backslashed or within a character class. It also tells the compiler to treat the # character as a metacharacter introducing a comment as in Perl. In other words, the # character will comment out any text in the regular expression between it and the next newline. The intent of this option is to allow you to divide your patterns into more readable parts. It is provided to maintain compatibility with Perl5 regular expressions, although it will not often make sense to use it in Java.
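To show what an extended pattern looks like when written as a Java string, here is a sketch that relies on java.util.regex and its Pattern.COMMENTS flag as a stand-in for the /x behavior described above. It does not use Perl5Compiler, and the class name, the constant, and the sample pattern are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class ExtendedPatternSketch {
    // Each piece of the pattern sits on its own line with a comment.
    // The trailing \n ends each comment, as the /x rules require.
    static final String YEAR_MONTH =
        "(\\d{4})   # four-digit year\n" +
        "-          # literal separator\n" +
        "(\\d{2})   # two-digit month\n";

    /** Returns true when the whole input matches YEAR_MONTH under the given flags. */
    static boolean matches(String input, int flags) {
        return Pattern.compile(YEAR_MONTH, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Whitespace and comments are ignored in extended mode.
        System.out.println(matches("2001-12", Pattern.COMMENTS)); // true
        // Without it, the spaces and # text are matched literally.
        System.out.println(matches("2001-12", 0)); // false
    }
}
```

The need to embed explicit \n characters in the string literal is one reason this style is less convenient in Java than in Perl.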


READ_ONLY_MASK

public static final int READ_ONLY_MASK
Enabled: A mask passed as an option to the compile methods to indicate that the resulting Perl5Pattern should be treated as a read only data structure by Perl5Matcher, making it safe to share a single Perl5Pattern instance among multiple threads without needing synchronization. Without this option, Perl5Matcher reserves the right to store heuristic or other information in Perl5Pattern that might accelerate future matches. When you use this option, Perl5Matcher will not store or modify any information in a Perl5Pattern. Use this option when you want to share a Perl5Pattern instance among multiple threads using different Perl5Matcher instances.
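The arrangement this option is meant for, one shared pattern with a separate matcher per thread, can be sketched with the JDK's regex classes. This is an analogy only: java.util.regex.Pattern is documented as safe for concurrent use while Matcher is not, which mirrors the shape of the arrangement but says nothing about how Perl5Pattern or Perl5Matcher work internally. All names in the sketch are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SharedPatternSketch {
    // One compiled pattern shared by every thread.
    private static final Pattern SHARED = Pattern.compile("\\d+");

    /** Counts runs of digits using a matcher private to the caller. */
    static int countNumbers(String input) {
        Matcher matcher = SHARED.matcher(input); // never shared between threads
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Processes each input on its own thread and collects the counts. */
    static int[] countInParallel(String[] inputs) {
        int[] results = new int[inputs.length];
        Thread[] workers = new Thread[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            final int index = i;
            workers[i] = new Thread(() -> results[index] = countNumbers(inputs[index]));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes each worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        int[] counts = countInParallel(new String[] {"a1b22", "333", "none"});
        System.out.println(Arrays.toString(counts)); // [2, 1, 0]
    }
}
```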

Constructor Detail

Perl5Compiler

public Perl5Compiler()
Enabled:

Method Detail

quotemeta

public static final String quotemeta(char[] expression)
Enabled: Given a character string, returns a Perl5 expression that interprets each character of the original string literally. In other words, all special metacharacters are quoted/escaped. This method is useful for converting user input meant for literal interpretation into a safe regular expression representing the literal input.

In effect, this method is the analog of the Perl5 quotemeta() built-in function.

Parameters:
expression - The expression to convert.
Returns:
A String containing a Perl5 regular expression corresponding to a literal interpretation of the pattern.

quotemeta

public static final String quotemeta(String expression)
Enabled: Given a character string, returns a Perl5 expression that interprets each character of the original string literally. In other words, all special metacharacters are quoted/escaped. This method is useful for converting user input meant for literal interpretation into a safe regular expression representing the literal input.

In effect, this method is the analog of the Perl5 quotemeta() built-in function.

Parameters:
expression - The expression to convert.
Returns:
A String containing a Perl5 regular expression corresponding to a literal interpretation of the pattern.
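The idea behind quoting can be sketched as follows. This is a minimal illustration of Perl-style quoting for ASCII input, in which a backslash is placed before every character that is not a letter, digit, or underscore; it is not the implementation of this method, and the class and helper names are hypothetical. The round-trip check uses java.util.regex only so that the block runs on its own.

```java
import java.util.regex.Pattern;

final class QuoteMetaSketch {
    /** Escapes every character other than ASCII letters, digits and '_'. */
    static String quoteMeta(String expression) {
        StringBuilder quoted = new StringBuilder(2 * expression.length());
        for (int i = 0; i < expression.length(); i++) {
            char c = expression.charAt(i);
            boolean wordChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            if (!wordChar) {
                quoted.append('\\'); // make the next character literal
            }
            quoted.append(c);
        }
        return quoted.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteMeta("a.b*c")); // a\.b\*c
        // The quoted text matches only the literal original.
        System.out.println(Pattern.matches(quoteMeta("1+1=2?"), "1+1=2?")); // true
        System.out.println(Pattern.matches(quoteMeta("a.c"), "abc"));       // false
    }
}
```

Quoting this way is what makes user-supplied text safe to embed in a larger expression: the "." in "a.c" no longer matches an arbitrary character.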

__isSimpleRepetitionOp

private static boolean __isSimpleRepetitionOp(char ch)

__isComplexRepetitionOp

private static boolean __isComplexRepetitionOp(char[] ch,
                                               int offset)

__parseRepetition

private static boolean __parseRepetition(char[] str,
                                         int offset)

__parseHex

private static int __parseHex(char[] str,
                              int offset,
                              int maxLength,
                              int[] scanned)

__parseOctal

private static int __parseOctal(char[] str,
                                int offset,
                                int maxLength,
                                int[] scanned)

__setModifierFlag

private static void __setModifierFlag(char[] flags,
                                      char ch)

__emitCode

private void __emitCode(char code)

__emitNode

private int __emitNode(char operator)

__emitArgNode

private int __emitArgNode(char operator,
                          char arg)

__programInsertOperator

private void __programInsertOperator(char operator,
                                     int operand)

__programAddTail

private void __programAddTail(int current,
                              int value)

__programAddOperatorTail

private void __programAddOperatorTail(int current,
                                      int value)

__getNextChar

private char __getNextChar()

__parseAlternation

private int __parseAlternation(int[] retFlags)
                        throws MalformedPatternException
Throws:
MalformedPatternException

__parseAtom

private int __parseAtom(int[] retFlags)
                 throws MalformedPatternException
Throws:
MalformedPatternException

__setCharacterClassBits

private void __setCharacterClassBits(char[] bits,
                                     int offset,
                                     char deflt,
                                     char ch)

__parseCharacterClass

private int __parseCharacterClass()
                           throws MalformedPatternException
Throws:
MalformedPatternException

__parseBranch

private int __parseBranch(int[] retFlags)
                   throws MalformedPatternException
Throws:
MalformedPatternException

__parseExpression

private int __parseExpression(boolean isParenthesized,
                              int[] hintFlags)
                       throws MalformedPatternException
Throws:
MalformedPatternException

compile

public Pattern compile(char[] pattern,
                       int options)
                throws MalformedPatternException
Enabled: Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching. Please see the user's guide for more information about Perl5 regular expressions.

Specified by:
compile in interface PatternCompiler
Parameters:
pattern - A Perl5 regular expression to compile.
options - A set of flags giving the compiler instructions on how to treat the regular expression. The flags are a bitwise OR of any number of the five MASK constants. For example:
 regex =
   compiler.compile(pattern,
                    Perl5Compiler.CASE_INSENSITIVE_MASK |
                    Perl5Compiler.MULTILINE_MASK);
                 
This says to compile the pattern so that it treats input as consisting of multiple lines and to perform matches in a case insensitive manner.
Returns:
A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
Throws:
MalformedPatternException

compile

public Pattern compile(char[] pattern)
                throws MalformedPatternException
Enabled: Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);

Specified by:
compile in interface PatternCompiler
Parameters:
pattern - A regular expression to compile.
Returns:
A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
Throws:
MalformedPatternException

compile

public Pattern compile(String pattern)
                throws MalformedPatternException
Enabled: Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);

Specified by:
compile in interface PatternCompiler
Parameters:
pattern - A regular expression to compile.
Returns:
A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
Throws:
MalformedPatternException

compile

public Pattern compile(String pattern,
                       int options)
                throws MalformedPatternException
Enabled: Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching. Please see the user's guide for more information about Perl5 regular expressions.

Specified by:
compile in interface PatternCompiler
Parameters:
pattern - A Perl5 regular expression to compile.
options - A set of flags giving the compiler instructions on how to treat the regular expression. The flags are a bitwise OR of any number of the five MASK constants. For example:
 regex =
   compiler.compile("^\\w+\\d+$",
                    Perl5Compiler.CASE_INSENSITIVE_MASK |
                    Perl5Compiler.MULTILINE_MASK);
                 
This says to compile the pattern so that it treats input as consisting of multiple lines and to perform matches in a case insensitive manner.
Returns:
A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
Throws:
MalformedPatternException
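
The two ideas in this method description, combining option flags with a bitwise OR and handling a pattern that fails to compile, can be tried in a self-contained form. The sketch below uses java.util.regex as a stand-in so that it runs without this library: Pattern.CASE_INSENSITIVE and Pattern.MULTILINE stand in for the corresponding MASK constants, and PatternSyntaxException stands in for MalformedPatternException. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CompileOptionsSketch {
    // Flags are independent bits, so they are combined with bitwise OR.
    static final int OPTIONS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;

    /** Counts lines of the input that match the sample pattern from the text. */
    static int countMatchingLines(String input) {
        Matcher matcher = Pattern.compile("^\\w+\\d+$", OPTIONS).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Returns false instead of propagating the compile-time exception. */
    static boolean isWellFormed(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.g. an unclosed group
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatchingLines("abc1\nXYZ22\n???")); // 2
        System.out.println(isWellFormed("^\\w+\\d+$"));             // true
        System.out.println(isWellFormed("("));                      // false
    }
}
```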

