org.quasiliteral.syntax
Class BaseLexer

java.lang.Object
  |
  +--org.quasiliteral.syntax.BaseLexer
All Implemented Interfaces:
LexerFace, Marker, PassByProxy
Direct Known Subclasses:
ELexer, TermLexer

public abstract class BaseLexer
extends Object
implements LexerFace

Untamed: To be replaced with a lexer based on Antlr.

Abstracts out common elements of lexers usable within the E quasi-parsing framework.

Author:
Mark S. Miller

Field Summary
protected  AstroBuilder myBuilder
           
protected  char myChar
          the candidate character, or EOFCHAR for end-of-file.
protected  int myContinueCount
          Should the next line get extra indentation as a continuation line?
protected  boolean myDelayedNextChar
          Is there a nextChar() that's been delayed?
private  short myEolTok
           
private  short myEotluTok
           
protected  Indenter myIndenter
          Keeps track of indentation level
private  LineFeeder myInput
          contains all lines after the current line
 char[] myLData
          Enabled: the string part of the current line (myLTwine), but as a char array for speed
 Twine myLTwine
          Enabled: the current line, or null at end-of-file
protected  boolean myNoTabsFlag
          Should tabs be rejected as valid whitespace?
protected  int myOptStartPos
          Where on the current line does the current token start? If the token starts before the current line, or if there is no current token, this is -1.
protected  Twine myOptStartText
          Accumulates all text of the current token from lines before the current line, or null if no current token or if the current token starts on the current line.
private  boolean myPartialFlag
           
protected  int myPos
          position in current line of candidate character
protected  boolean myQuasiFlag
          Should doubled '@' and '$' in literals be collapsed to singles?
 
Fields inherited from interface org.quasiliteral.syntax.LexerFace
EOFCHAR, EOFTOK
 
Fields inherited from interface org.erights.e.elib.serial.PassByProxy
HONORARY, HONORED_NAMES
 
Constructor Summary
protected BaseLexer(LineFeeder optLineFeeder, short eolTok, short eotluTok, boolean partialFlag, boolean quasiFlag, boolean noTabsFlag, AstroBuilder builder)
           
 
Method Summary
protected  int charConstant()
          Used to eat the encoding of a single character as it would appear inside a quoted character or string constant.
private  int charConstantInternal()
          Leaves myChar at the last character instead of the next one.
protected  Astro charLiteral()
          Assumes that myChar is the first single quote.
protected  Astro closeBracket()
           
 Astro composite(short tagCode, Object data, SourceSpan optSpan)
          Enabled:
private  boolean digits(int radix)
          If myChar satisfies isDigitStart(char, int) in base radix, then eat a sequence of characters that satisfy isDigitPart(char, int) in base radix.
protected  Astro docComment(short tagCode)
          Assumes the initial '/**' has already been eaten.
protected  SourceSpan endSpan()
           
protected  Twine endToken()
           
protected abstract  Astro getNextToken()
           
private  boolean isDigitPart(char c, int radix)
          Is c either a digit in base radix or an '_'?
private  boolean isDigitStart(char c, int radix)
          Is c a digit in base radix?
 boolean isEndOfFile()
          Enabled:
static boolean isJavaIdPart(char c)
          Enabled: Like java.lang.Character#isJavaIdentifierPart(char) but rejects EOFCHAR, which happens to be a '\0', which isJavaIdentifierPart accepts as an "ignorable control character".
static boolean isJavaIdStart(char c)
          Enabled: Like java.lang.Character#isJavaIdentifierStart(char)
protected  boolean isRestBlank(int start)
          Starting at start, is the rest of the current line "blank"?
protected  boolean isWhite(int start, int bound)
          Are all the characters on the current line from start inclusive to bound exclusive whitespace characters?
protected  Astro leafEOL()
          Outputs either an EOL (End Of Line) token or, if we're at top level, an EOTLU (End Of Top Level Unit) token.
protected  Astro leafTag(short tagCode, SourceSpan optSpan)
           
 void needMore(String msg)
          Enabled:
protected  void nextChar()
           
private  void nextLine()
           
 Astro nextToken()
          Enabled:
 Astro[] nextTopLevelUnit()
          Enabled:
protected  Astro numberLiteral()
          Note that E never calls this with a leading minus sign, preferring instead to treat the minus as an operator.
protected  Astro openBracket(char closer)
           
protected  Astro openBracket(short tagCode, Twine openner, char closer)
           
protected  char peekChar()
          XXX Get rid of peekChar/0 or make it work
protected  boolean peekChar(char c)
          Is the next character c?
 void reset()
          Enabled:
 void setSource(Twine newSource)
          Enabled:
protected  void skipLine()
          Skip the rest of this line.
protected  void skipWhiteSpace()
          Skips whitespace characters except for newlines.
protected  void startToken()
           
protected  void stopToken()
          Cancels a started token
protected  Astro stringLiteral()
          Assumes that myChar is the first double quote.
 void syntaxError(String msg)
          Enabled:
 String toString()
          Suppressed:
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

myInput

private LineFeeder myInput
contains all lines after the current line


myLTwine

public Twine myLTwine
Enabled: the current line, or null at end-of-file


myLData

public char[] myLData
Enabled: the string part of the current line (myLTwine), but as a char array for speed


myPos

protected int myPos
position in current line of candidate character


myChar

protected char myChar
the candidate character, or EOFCHAR for end-of-file.


myOptStartPos

protected int myOptStartPos
Where on the current line does the current token start? If the token starts before the current line, or if there is no current token, this is -1.


myOptStartText

protected Twine myOptStartText
Accumulates all text of the current token from lines before the current line, or null if no current token or if the current token starts on the current line. The EOL token itself is not considered to be a line spanning token.
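
One way to read the interplay of myOptStartPos and myOptStartText is sketched below. This is a hypothetical stand-in, not the BaseLexer source: the class TokenSpanSketch, its setLine and advance methods, and the use of String in place of Twine are all assumptions made for illustration.

```java
// Hypothetical sketch; not the BaseLexer source. String stands in for Twine.
final class TokenSpanSketch {
    private String line = "";           // current line (role of myLTwine)
    private int pos = 0;                // candidate position (role of myPos)
    private int optStartPos = -1;       // -1: no token, or token began on an earlier line
    private String optStartText = null; // token text from earlier lines, or null

    // Move to a new line, banking the text of an open token first.
    void setLine(String newLine) {
        if (optStartPos >= 0) {
            // The token started on the line we are leaving.
            optStartText = line.substring(optStartPos);
            optStartPos = -1;
        } else if (optStartText != null) {
            // The token started even earlier; the whole old line belongs to it.
            optStartText += line;
        }
        line = newLine;
        pos = 0;
    }

    void advance(int n) { pos += n; }

    void startToken() {
        optStartPos = pos;
        optStartText = null;
    }

    // Returns the token text and clears the token state.
    String endToken() {
        String result;
        if (optStartPos >= 0) {
            result = line.substring(optStartPos, pos);
        } else if (optStartText != null) {
            result = optStartText + line.substring(0, pos);
        } else {
            throw new IllegalStateException("no current token");
        }
        optStartPos = -1;
        optStartText = null;
        return result;
    }

    public static void main(String[] args) {
        TokenSpanSketch s = new TokenSpanSketch();
        s.setLine("x := \"ab\n");
        s.advance(5);          // candidate is now the opening double quote
        s.startToken();
        s.advance(4);          // consume the rest of the first line
        s.setLine("cd\" + 1\n");
        s.advance(3);          // consume through the closing double quote
        System.out.println(s.endToken());
    }
}
```

In this reading, a token that stays on one line never touches optStartText, and a token that crosses a line boundary switches from the position field to the accumulated-text field at the moment the line changes.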


myDelayedNextChar

protected boolean myDelayedNextChar
Is there a nextChar() that's been delayed?


myEolTok

private final short myEolTok

myEotluTok

private final short myEotluTok

myPartialFlag

private final boolean myPartialFlag

myQuasiFlag

protected final boolean myQuasiFlag
Should doubled '@' and '$' in literals be collapsed to singles?


myNoTabsFlag

protected final boolean myNoTabsFlag
Should tabs be rejected as valid whitespace?


myIndenter

protected Indenter myIndenter
Keeps track of indentation level


myContinueCount

protected int myContinueCount
Should the next line get extra indentation as a continuation line?

If not, -1. If so, the number of spaces to indent.


myBuilder

protected final AstroBuilder myBuilder
Constructor Detail

BaseLexer

protected BaseLexer(LineFeeder optLineFeeder,
                    short eolTok,
                    short eotluTok,
                    boolean partialFlag,
                    boolean quasiFlag,
                    boolean noTabsFlag,
                    AstroBuilder builder)
             throws IOException
Parameters:
optLineFeeder -
eolTok - "End Of Line"
eotluTok - "End Of Top Level Unit"
partialFlag -
quasiFlag -
noTabsFlag -
builder -
Throws:
IOException
Method Detail

toString

public String toString()
Suppressed:

Overrides:
toString in class Object
Returns:
a string representation of the object.

setSource

public void setSource(Twine newSource)
Enabled:

Specified by:
setSource in interface LexerFace

reset

public void reset()
Enabled:

Specified by:
reset in interface LexerFace

nextLine

private void nextLine()
               throws IOException
Throws:
IOException

nextChar

protected void nextChar()
                 throws IOException
Throws:
IOException

nextTopLevelUnit

public Astro[] nextTopLevelUnit()
                         throws IOException,
                                SyntaxException
Enabled:

Specified by:
nextTopLevelUnit in interface LexerFace
Throws:
IOException
SyntaxException

nextToken

public Astro nextToken()
                throws IOException,
                       SyntaxException
Enabled:

Specified by:
nextToken in interface LexerFace
Throws:
IOException
SyntaxException

syntaxError

public void syntaxError(String msg)
                 throws SyntaxException
Enabled:

Specified by:
syntaxError in interface LexerFace
Throws:
SyntaxException

needMore

public void needMore(String msg)
              throws NeedMoreException,
                     SyntaxException
Enabled:

Specified by:
needMore in interface LexerFace
Throws:
NeedMoreException
SyntaxException

isDigitStart

private boolean isDigitStart(char c,
                             int radix)
Is c a digit in base radix?


isDigitPart

private boolean isDigitPart(char c,
                            int radix)
Is c either a digit in base radix or an '_'?


digits

private boolean digits(int radix)
                throws IOException
If myChar satisfies isDigitStart(char, int) in base radix, then eat a sequence of characters that satisfy isDigitPart(char, int) in base radix.
     <digits(n)> ::= <digitStart(n)> <digitPart(n)>*
 

Throws:
IOException
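
The digit rules above can be sketched as a small standalone scanner. This is an illustration only, not the BaseLexer source: the class DigitsSketch is hypothetical, it scans a CharSequence by index instead of advancing myChar, and it assumes Character.digit is an acceptable test for "a digit in base radix" (that method also accepts non-ASCII Unicode digits, which the real lexer may not).

```java
// Hypothetical sketch of the <digits(n)> rule; not the BaseLexer source.
final class DigitsSketch {
    // Is c a digit in base radix? (Assumes Character.digit semantics.)
    static boolean isDigitStart(char c, int radix) {
        return Character.digit(c, radix) != -1;
    }

    // Is c either a digit in base radix or an '_'?
    static boolean isDigitPart(char c, int radix) {
        return c == '_' || isDigitStart(c, radix);
    }

    // Returns the index just past the digit run that starts at 'from',
    // or 'from' itself when no digit run starts there.
    static int digits(CharSequence line, int from, int radix) {
        if (from >= line.length() || !isDigitStart(line.charAt(from), radix)) {
            return from;
        }
        int pos = from + 1;
        while (pos < line.length() && isDigitPart(line.charAt(pos), radix)) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(digits("1_000;", 0, 10)); // prints 5
        System.out.println(digits("_12", 0, 10));    // prints 0: '_' cannot start a run
    }
}
```

The split between the two predicates is what keeps a leading underscore from being read as a number while still allowing underscores as separators inside one.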

skipWhiteSpace

protected void skipWhiteSpace()
                       throws IOException
Skips whitespace characters except for newlines.

Throws:
IOException

isWhite

protected boolean isWhite(int start,
                          int bound)
Are all the characters on the current line from start inclusive to bound exclusive whitespace characters?


isRestBlank

protected boolean isRestBlank(int start)
Starting at start, is the rest of the current line "blank"?

This just defaults to isWhite(start,myLData.length), but should be overridden by subclasses to also consider the remainder of a line blank if the only thing it contains is a rest-of-line comment (such as a "#" or "//" comment in E).
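
A minimal sketch of the default and of the kind of override described above follows. It is not taken from BaseLexer or ELexer: the class RestBlankSketch is hypothetical, the line data is passed as a parameter instead of being read from myLData, and Character.isWhitespace is an assumed definition of "whitespace" (the real lexer may treat tabs differently when myNoTabsFlag is set).

```java
// Hypothetical sketch; not the BaseLexer or ELexer source.
final class RestBlankSketch {
    // Are all characters in [start, bound) whitespace?
    static boolean isWhite(char[] lineData, int start, int bound) {
        for (int i = start; i < bound; i++) {
            if (!Character.isWhitespace(lineData[i])) {
                return false;
            }
        }
        return true;
    }

    // The documented default: blank means whitespace up to the end of the line.
    static boolean isRestBlank(char[] lineData, int start) {
        return isWhite(lineData, start, lineData.length);
    }

    // A subclass-style variant: a trailing "#" or "//" comment also counts as blank.
    static boolean isRestBlankWithComments(char[] lineData, int start) {
        int i = start;
        while (i < lineData.length && Character.isWhitespace(lineData[i])) {
            i++;
        }
        if (i == lineData.length) {
            return true;                      // nothing but whitespace
        }
        if (lineData[i] == '#') {
            return true;                      // "#" rest-of-line comment
        }
        return i + 1 < lineData.length
                && lineData[i] == '/' && lineData[i + 1] == '/'; // "//" comment
    }

    public static void main(String[] args) {
        char[] line = "x  # note\n".toCharArray();
        System.out.println(isRestBlank(line, 1));             // prints false
        System.out.println(isRestBlankWithComments(line, 1)); // prints true
    }
}
```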


isJavaIdStart

public static boolean isJavaIdStart(char c)
Enabled: Like java.lang.Character#isJavaIdentifierStart(char)


isJavaIdPart

public static boolean isJavaIdPart(char c)
Enabled: Like java.lang.Character#isJavaIdentifierPart(char) but rejects EOFCHAR, which happens to be a '\0', which isJavaIdentifierPart accepts as an "ignorable control character".
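
The EOFCHAR caveat can be checked directly against java.lang.Character. The sketch below is a hypothetical reimplementation for illustration, not the BaseLexer source; the class JavaIdSketch and its local EOFCHAR constant are assumptions, with the constant's value taken from the description above.

```java
// Hypothetical sketch; not the BaseLexer source.
final class JavaIdSketch {
    // Assumed to mirror LexerFace.EOFCHAR, which the description says is '\0'.
    static final char EOFCHAR = '\0';

    // Like Character.isJavaIdentifierStart(char).
    static boolean isJavaIdStart(char c) {
        return Character.isJavaIdentifierStart(c);
    }

    // Like Character.isJavaIdentifierPart(char), but rejects EOFCHAR.
    static boolean isJavaIdPart(char c) {
        return c != EOFCHAR && Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        // '\0' is an "ignorable" identifier character as far as the JDK is concerned.
        System.out.println(Character.isJavaIdentifierPart('\0')); // prints true
        System.out.println(isJavaIdPart('\0'));                   // prints false
    }
}
```

Without the extra check, a loop of the form "while the candidate is an identifier part, advance" would not stop at end-of-file.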


leafTag

protected Astro leafTag(short tagCode,
                        SourceSpan optSpan)

leafEOL

protected Astro leafEOL()
Outputs either an EOL (End Of Line) token or, if we're at top level, an EOTLU (End Of Top Level Unit) token.


composite

public Astro composite(short tagCode,
                       Object data,
                       SourceSpan optSpan)
Enabled:

Specified by:
composite in interface LexerFace

getNextToken

protected abstract Astro getNextToken()
                               throws IOException,
                                      SyntaxException
Throws:
IOException
SyntaxException

openBracket

protected Astro openBracket(char closer)
                     throws IOException
Throws:
IOException

openBracket

protected Astro openBracket(short tagCode,
                            Twine openner,
                            char closer)

closeBracket

protected Astro closeBracket()
                      throws IOException
Throws:
IOException

charConstantInternal

private int charConstantInternal()
                          throws IOException,
                                 SyntaxException
Leaves myChar at the last character instead of the next one.

Returns:
Throws:
IOException
SyntaxException

charConstant

protected int charConstant()
                    throws IOException,
                           SyntaxException
Used to eat the encoding of a single character as it would appear inside a quoted character or string constant.

Backslash escapes are interpreted according to the Java standard, except that escaped character codes are not yet implemented.

If we're quasi-parsing, then any literal '$' or '@' characters must appear doubled, in which case a single such character will be included in the literal.

Returns:
encoded character, or -1 if a backslash-newline was seen, since this encodes no characters.
Throws:
IOException
SyntaxException
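
The decoding rules described for charConstant can be sketched as follows. This is a hypothetical illustration, not the BaseLexer source: the class CharConstantSketch, its decode helper, and the choice to throw IllegalArgumentException for an undoubled '$' or '@' (or for an unrecognized escape) are assumptions. The real lexer reads from its line buffer and reports errors through syntaxError.

```java
// Hypothetical sketch; not the BaseLexer source.
final class CharConstantSketch {
    private final String src;
    private final boolean quasi; // role of myQuasiFlag
    private int pos = 0;

    CharConstantSketch(String src, boolean quasi) {
        this.src = src;
        this.quasi = quasi;
    }

    // Decodes one character constant; returns -1 for backslash-newline.
    int charConstant() {
        char c = src.charAt(pos++);
        if (quasi && (c == '$' || c == '@')) {
            if (pos < src.length() && src.charAt(pos) == c) {
                pos++;          // collapse the doubled character to a single one
                return c;
            }
            // Assumption: an undoubled '$' or '@' is not a literal character here.
            throw new IllegalArgumentException("undoubled '" + c + "'");
        }
        if (c != '\\') {
            return c;
        }
        char e = src.charAt(pos++);
        switch (e) {
            case 'b':  return '\b';
            case 't':  return '\t';
            case 'n':  return '\n';
            case 'f':  return '\f';
            case 'r':  return '\r';
            case '"':  return '"';
            case '\'': return '\'';
            case '\\': return '\\';
            case '\n': return -1; // backslash-newline encodes no character
            default:
                // Escaped character codes are deliberately left out, as in the description.
                throw new IllegalArgumentException("unrecognized escape: \\" + e);
        }
    }

    // Decodes a whole literal body by repeated calls to charConstant().
    static String decode(String body, boolean quasi) {
        CharConstantSketch s = new CharConstantSketch(body, quasi);
        StringBuilder out = new StringBuilder();
        while (s.pos < body.length()) {
            int code = s.charConstant();
            if (code != -1) {
                out.append((char) code);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("cost: $$5", true)); // prints cost: $5
    }
}
```

Returning an int instead of a char is what leaves room for the -1 sentinel, so a caller building a string can skip a backslash-newline without emitting anything.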

charLiteral

protected Astro charLiteral()
                     throws IOException,
                            SyntaxException
Assumes that myChar is the first single quote.
 <charLiteral> ::= "'" <charConstant()> "'"
 

Throws:
IOException
SyntaxException

stringLiteral

protected Astro stringLiteral()
                       throws IOException,
                              SyntaxException
Assumes that myChar is the first double quote.
 <stringLiteral> ::= '"' <charConstant()>* '"'
 

Throws:
IOException
SyntaxException

docComment

protected Astro docComment(short tagCode)
                    throws IOException,
                           SyntaxException
Assumes the initial '/**' has already been eaten.

The docComment syntax is as documented in the Java Language Specification.

Throws:
IOException
SyntaxException

numberLiteral

protected Astro numberLiteral()
                       throws IOException,
                              SyntaxException
Note that E never calls this with a leading minus sign, preferring instead to treat the minus as an operator.
     <numberLiteral> ::= "-"? "0x" <digits(16)>
     |                   "-"? "0" <digitPart(8)>*
     |                   "-"? <digits(10)>
                             ("." <digits(10)>)?
                             (("e"|"E") "-"? <digits(10)>)?
 
A floating point number must have at least a "." or a ("e"|"E"). A leading "0" on a floating point number doesn't affect the base. A leading "0" on an integer means octal (base 8).

Throws:
IOException
SyntaxException
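
The grammar above can be approximated with regular expressions to see which alternative a given piece of text falls under. This is only a sketch under stated assumptions, not the BaseLexer source: the class NumberLiteralSketch and its Kind enum are hypothetical, hex digits are assumed to be accepted in either case, and the real method scans character by character and builds an Astro instead of classifying a String.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the BaseLexer source.
final class NumberLiteralSketch {
    enum Kind { HEX, OCTAL, DECIMAL, FLOAT, INVALID }

    // <digits(10)>: a digit followed by digits or underscores.
    private static final String D10 = "[0-9][0-9_]*";

    private static final Pattern HEX =
            Pattern.compile("-?0x[0-9a-fA-F][0-9a-fA-F_]*");
    private static final Pattern OCTAL =
            Pattern.compile("-?0[0-7_]*");
    private static final Pattern DECIMAL =
            Pattern.compile("-?" + D10);
    private static final Pattern DECIMAL_OR_FLOAT =
            Pattern.compile("-?" + D10 + "(\\." + D10 + ")?([eE]-?" + D10 + ")?");

    static Kind classify(String text) {
        if (HEX.matcher(text).matches()) return Kind.HEX;
        if (OCTAL.matcher(text).matches()) return Kind.OCTAL;
        if (DECIMAL.matcher(text).matches()) return Kind.DECIMAL;
        // Anything that gets this far and still matches must contain "." or an exponent.
        if (DECIMAL_OR_FLOAT.matcher(text).matches()) return Kind.FLOAT;
        return Kind.INVALID;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0x1F", "017", "42", "0.5", "1e-3", "1."}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Here "0.5" is classified as FLOAT even though it starts with "0", which matches the note that a leading "0" on a floating point number doesn't affect the base.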

peekChar

protected char peekChar()
XXX Get rid of peekChar/0 or make it work


peekChar

protected boolean peekChar(char c)
Is the next character c?


skipLine

protected void skipLine()
Skip the rest of this line.


startToken

protected void startToken()

stopToken

protected void stopToken()
Cancels a started token


endToken

protected Twine endToken()

endSpan

protected SourceSpan endSpan()

isEndOfFile

public boolean isEndOfFile()
Enabled:

Specified by:
isEndOfFile in interface LexerFace

