public class Token
extends java.lang.Object
Instances of this class are created by the classes implementing the Tokenizer interface. A Token describes a portion of text according to the settings given to the producing Tokenizer in the form of a TokenizerProperties object. Besides the token type, the token image itself, its position in the input stream, its line and column position and associated information can be obtained from the Token (provided the necessary parse flags are set in the tokenizer).
This class replaces the older de.susebox.java.util.Token, which is deprecated.
See Also:
Tokenizer, TokenizerProperties
Modifier and Type | Field and Description |
---|---|
static byte | BLOCK_COMMENT: Block comments are also a special form of a whitespace sequence. |
static byte | EOF: A token of the type EOF is used to indicate an end-of-file condition on the input stream of the tokenizer. |
static byte | KEYWORD: The token is a keyword registered with the used Tokenizer. |
static byte | LINE_COMMENT: Although a line comment is, in most cases, actually a whitespace sequence, it is often necessary to handle it separately. |
static byte | NORMAL: The token is nothing special (no keyword, no whitespace, etc.). |
static byte | PATTERN: The token matches a pattern. |
static byte | SEPARATOR: Separators are otherwise unremarkable characters. |
static byte | SPECIAL_SEQUENCE: Special sequences are characters or character combinations that have a certain meaning to the parsed language or dialect. |
static byte | STRING: The token is one of the quoted strings known to the Tokenizer. |
static byte | UNKNOWN: This is for the leftovers of the lexical analysis of a text. |
static byte | WHITESPACE: Whitespaces are portions of the text that contain one or more characters that separate the significant parts of the text. |
Constructor and Description |
---|
Token(): Default constructor. |
Token(int type): Constructs a token of a given type. |
Token(int type, java.lang.String image): Constructs a token of a given type with the given image. |
Token(int type, java.lang.String image, java.lang.Object companion): Constructs a token of a given type with the given image and a companion. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(java.lang.Object object): Implementation of the well-known method Object.equals(java.lang.Object). |
java.lang.Object | getCompanion(): Returns the associated information of the token. |
int | getEndColumn(): Returns the column number where the Token ends. |
int | getEndLine(): Returns the line number where the token ends. |
int | getEndPosition(): Returns the end position of this token. |
java.lang.String | getImage(): Returns the token image as a String. |
java.lang.String[] | getImageParts(): Image parts are substrings of a token image. |
int | getLength(): Returns the length of the token. |
int | getStartColumn(): Returns the column number where the Token starts. |
int | getStartLine(): Returns the line number where the Token starts. |
int | getStartPosition(): Returns the starting position of the token. |
int | getType(): Returns the type of the Token. |
static java.lang.String | getTypeName(int type): Returns a type name for display purposes. |
void | setCompanion(java.lang.Object companion): Some tokens may have associated information for the user of the Token. |
void | setEndColumn(int colno): For Tokenizers that count lines and columns, this method is used to set the column number where the end of the Token was found. The end column number is that of the first character that does NOT belong to the token. |
void | setEndLine(int lineno): For Tokenizers that count lines and columns, this method is used to set the line number where the end of the Token was found. |
void | setEndPosition(int endPosition): Sets the end position of the token relative to the start of the input stream. |
void | setImage(java.lang.String image): Sets the token image. |
void | setImageParts(java.lang.String[] imageParts): The counterpart to getImageParts(). |
void | setLength(int length): Sets the length of the token. |
void | setStartColumn(int colno): For Tokenizers that count lines and columns, this method is used to set the column number where the beginning of the Token was found. |
void | setStartLine(int lineno): For Tokenizers that count lines and columns, this method is used to set the line number where the beginning of the Token was found. |
void | setStartPosition(int startPosition): Sets the start position of the token relative to the start of the input stream. |
void | setType(int type): Sets the type property of the Token. |
java.lang.String | toString(): Implementation of the well-known method Object.toString(). |
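public static final byte NORMAL
The token is nothing special (no keyword, no whitespace, etc.).
public static final byte KEYWORD
The token is a keyword registered with the used Tokenizer.
public static final byte STRING
The token is one of the quoted strings known to the Tokenizer. In Java this would be, for instance, a "String" or a 'c' (character).
public static final byte PATTERN
The token matches a pattern.
public static final byte SPECIAL_SEQUENCE
Special sequences are characters or character combinations that have a certain meaning to the parsed language or dialect.
public static final byte SEPARATOR
Separators are otherwise unremarkable characters.
public static final byte WHITESPACE
Whitespaces are portions of the text that contain one or more characters that separate the significant parts of the text.
public static final byte LINE_COMMENT
Although a line comment is, in most cases, actually a whitespace sequence, it is often necessary to handle it separately.
public static final byte BLOCK_COMMENT
Block comments are also a special form of a whitespace sequence. See LINE_COMMENT for details.
public static final byte EOF
A token of the type EOF is used to indicate an end-of-file condition on the input stream of the tokenizer.
public static final byte UNKNOWN
This is for the leftovers of the lexical analysis of a text.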
public Token()
Default constructor.
public Token(int type)
Constructs a token of a given type.
Parameters:
type - token type, one of the class constants
public Token(int type, java.lang.String image)
Constructs a token of a given type with the given image.
Parameters:
type - token type, one of the class constants
image - the token image itself
public Token(int type, java.lang.String image, java.lang.Object companion)
Constructs a token of a given type with the given image and a companion.
Parameters:
type - token type, one of the class constants
image - the token image itself
companion - information associated with the token
public void setType(int type)
Sets the type property of the Token. This is one of the constants defined in this class.
Parameters:
type - the token type
See Also:
getType()
public int getType()
Returns the type of the Token. This is one of the constants defined in the Token class.
See Also:
setType(int)
public void setImage(java.lang.String image)
Sets the token image. Note that some Tokenizer implementations only fill in position and length information rather than setting the token image. This strategy can have a significant influence on parse performance and memory allocation.
Parameters:
image - the token image
See Also:
getImage()
public java.lang.String getImage()
Returns the token image as a String. The method returns null when called on an end-of-file token or if the Tokenizer producing this Token object is configured to return only position information (see TokenizerProperties#F_TOKEN_POS_ONLY).
Returns:
the token image as a String (null is possible)
See Also:
setImage(java.lang.String)
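When a Tokenizer is configured with TokenizerProperties#F_TOKEN_POS_ONLY, getImage() returns null and only the position and length are available. A caller that still holds the complete input text can recover the image itself. The following is a minimal sketch using only JDK classes; PositionOnlyImage and its imageOf method are hypothetical helpers, not part of JTopas, and the sketch assumes the positions count characters from the start of that same text.

```java
// Hypothetical helper, not part of JTopas: recovers the image of a
// position-only token from the original input text.
final class PositionOnlyImage {

    // Returns the characters from startPosition (inclusive) to
    // startPosition + length (exclusive), like String.substring(begin, end).
    static String imageOf(CharSequence input, int startPosition, int length) {
        int endPosition = startPosition + length;
        if (startPosition < 0 || length < 0 || endPosition > input.length()) {
            throw new IndexOutOfBoundsException("token range [" + startPosition + ", "
                    + endPosition + ") lies outside the input of length " + input.length());
        }
        return input.subSequence(startPosition, endPosition).toString();
    }

    public static void main(String[] args) {
        String input = "int answer = 42;";
        // A token "answer" starts at position 4 and is 6 characters long.
        System.out.println(imageOf(input, 4, 6)); // prints answer
    }
}
```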
public java.lang.String[] getImageParts()
Image parts are substrings of a token image. They are provided if the flag TokenizerProperties#F_RETURN_IMAGE_PARTS is set for the TokenizerProperties, the Tokenizer or the TokenizerProperty that "produced" the token. If that flag is not set, the returned array contains only the image itself (getImage()).
NORMAL, KEYWORD, SPECIAL_SEQUENCE, SEPARATOR: These tokens have one image part that is identical to the image itself (getImage()).
WHITESPACE: Whitespaces have one image part for each substring on a single line, without any line separators. For whitespace sequences without line separators there is one part that is identical to the image itself (getImage()). More generally, whitespaces have separatorCount + 1 image parts. For multi-line whitespaces some or all of these image parts can be empty.
STRING: One image part per line, containing the characters between (and excluding) the string start and end sequences and/or the line separators, equivalent to the handling of whitespaces. String escape sequences are resolved. For instance, the image part of the SQL string 'select ''hello'' from dual' is select 'hello' from dual. Multiline strings may have empty image parts (if empty lines are included in the string). The string "line1\n" has two image parts: "line1" and the empty string (since the string ends on a new line). The string "\nline2" also has two image parts: the empty string and "line2" (since the string starts on one line and ends on the next).
PATTERN: A pattern has image parts according to the groups defined in the regular expression of the pattern. The Pattern class speaks of "capturing groups", which are expressions in parentheses. Image parts are especially important for pattern tokens, where access to parts of the pattern is usually necessary. For instance, in Java, Unicode characters can be written in the form of the "\\u[0-9A-Fa-f]{4}" pattern. For further processing, the hexadecimal part must be accessed. By using the pattern "\\u([0-9A-Fa-f]{4})", a token containing the Unicode notation "\\u00AC" has the two image parts "\\u00AC" (capturing group 0) and "00AC" (capturing group 1).
LINE_COMMENT: Line comments have one image part that contains the substring after the line comment start sequence up to, and excluding, the line separator sequence.
BLOCK_COMMENT: Like whitespaces and strings, block comments have one image part per line they span. The first part is without the block comment start sequence, the last without the block comment end sequence. The line separator sequences are also not included in the parts.
EOF: The method returns an empty array.
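The image-part rules for whitespaces, strings and patterns can be approximated with plain JDK calls. The sketch below is a hypothetical illustration, not the JTopas implementation: ImagePartsSketch and its methods are invented names, only "\r\n", "\r" and "\n" are treated as line separators, and the string rule assumes SQL-style single quotes where a doubled quote is the only escape sequence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the image-part rules with plain JDK classes;
// this is not the JTopas implementation.
final class ImagePartsSketch {

    // Assumed line separators: "\r\n", "\r" or "\n".
    private static final String LINE_SEPARATOR = "\r\n|\r|\n";

    // WHITESPACE: one part per line without the separators. The limit -1 keeps
    // trailing empty parts, so n separators always give n + 1 parts.
    static String[] whitespaceParts(String image) {
        return image.split(LINE_SEPARATOR, -1);
    }

    // STRING, assuming SQL-style quoting: drop the enclosing quotes, resolve
    // each doubled quote to a single one, then split into lines.
    static String[] sqlStringParts(String image) {
        String body = image.substring(1, image.length() - 1).replace("''", "'");
        return body.split(LINE_SEPARATOR, -1);
    }

    // PATTERN: part i is capturing group i; group 0 is the whole match.
    static String[] patternParts(Pattern pattern, String image) {
        Matcher matcher = pattern.matcher(image);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("image does not match " + pattern);
        }
        String[] parts = new String[matcher.groupCount() + 1];
        for (int group = 0; group < parts.length; group++) {
            parts[group] = matcher.group(group);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceParts("  \n\t\n").length);                // prints 3
        System.out.println(sqlStringParts("'select ''hello'' from dual'")[0]); // prints select 'hello' from dual
        // Regex for a backslash, the letter u and four hex digits (group 1).
        Pattern unicodeNotation = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
        System.out.println(patternParts(unicodeNotation, "\\u00AC")[1]);        // prints 00AC
    }
}
```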
The method returns an array rather than an Enumeration or Iterator, since an array can be used more easily and contains only one element in many, if not most, cases.
Returns:
an array of image parts if TokenizerProperties#F_RETURN_IMAGE_PARTS is set, otherwise an array containing the image itself (getImage())
public void setImageParts(java.lang.String[] imageParts)
The counterpart to getImageParts(). It sets all image parts in one operation. The method accepts null and empty arrays.
Parameters:
imageParts - an array of image parts according to the token type, or null
public void setLength(int length)
Sets the length of the token. A Tokenizer may prefer, or may be configured, not to return a token image but only the position and length information. This can save a lot of time wherever only a subset of the found tokens is actually needed by the user.
This method can be used as an alternative to setEndPosition(int), depending on which information is at hand or easier to obtain for the Tokenizer producing this Token.
The length is also affected by setImage(java.lang.String) and setEndPosition(int).
Parameters:
length - the length of the token
See Also:
getLength(), setEndPosition(int)
public int getLength()
Returns the length of the token.
See Also:
setLength(int), getEndPosition()
public void setCompanion(java.lang.Object companion)
Some tokens may have associated information for the user of the Token. A common use is associating an integer constant with a special sequence or keyword, to be used in fast switch statements.
Parameters:
companion - the associated information for this token
public java.lang.Object getCompanion()
Returns the associated information of the token. The result may be null. See setCompanion(java.lang.Object) for details.
public void setStartPosition(int startPosition)
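Sets the start position of the token relative to the start of the input stream.
Parameters:
startPosition - the position where the token starts in the input stream
See Also:
getStartPosition(), setEndPosition(int)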
public int getStartPosition()
Returns the starting position of the token.
See Also:
setStartPosition(int), getEndPosition()
public void setEndPosition(int endPosition)
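Sets the end position of the token relative to the start of the input stream. The end position is the position of the first character that does not belong to the Token. This is the same principle as in the String.substring(int, int) method.
This method can be used as an alternative to setLength(int), depending on which information is at hand or easier to obtain for the Tokenizer producing this Token.
The method should be called after setStartPosition(int), since it affects the length of the token. Its effect is in turn eliminated by calls to setLength(int) and setImage(java.lang.String).
Parameters:
endPosition - the position where the token ends in the input stream
public int getEndPosition()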
Returns the end position of this token. The return value is only valid if setStartPosition(int) has been called together with one of the methods setImage(java.lang.String), setLength(int) or setEndPosition(int).
See Also:
setEndPosition(int), setStartPosition(int), getStartPosition()
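The relationship between the start position, the length and the end position can be modelled as follows. This is a hypothetical sketch, not the JTopas source: TokenPositionSketch is an invented class that assumes the end position is kept implicitly as start + length. Under that assumption setEndPosition(int) only makes sense after setStartPosition(int), and a later setLength(int) overrides it, which matches the rules above; the actual JTopas fields may differ.

```java
// Hypothetical model, not the JTopas source: the end position is derived as
// start + length, so setEndPosition(int) is meaningful only after the start
// position is known, and a later setLength(int) replaces its effect.
final class TokenPositionSketch {
    private int startPosition;
    private int length;

    void setStartPosition(int startPosition) { this.startPosition = startPosition; }
    int getStartPosition() { return startPosition; }

    void setLength(int length) { this.length = length; }
    int getLength() { return length; }

    // Stores the end position as a length relative to the current start.
    void setEndPosition(int endPosition) { this.length = endPosition - startPosition; }

    // Position of the first character after the token (exclusive end).
    int getEndPosition() { return startPosition + length; }

    public static void main(String[] args) {
        String input = "x := y + 1";
        TokenPositionSketch token = new TokenPositionSketch();
        token.setStartPosition(2);
        token.setEndPosition(4);                // the ":=" operator
        System.out.println(token.getLength()); // prints 2
        System.out.println(input.substring(token.getStartPosition(), token.getEndPosition())); // prints :=
    }
}
```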
public void setStartLine(int lineno)
For Tokenizers that count lines and columns, this method is used to set the line number where the beginning of the Token was found. Line numbers start with 0.
Parameters:
lineno - line number where the token begins
See Also:
getStartLine()
public int getStartLine()
Returns the line number where the Token starts. See also setStartLine(int) for details.
See Also:
setStartLine(int)
public void setStartColumn(int colno)
For Tokenizers that count lines and columns, this method is used to set the column number where the beginning of the Token was found. Column numbers start with 0.
Parameters:
colno - column number where the token begins
See Also:
getStartColumn()
public int getStartColumn()
Returns the column number where the Token starts. See setStartColumn(int) for details.
See Also:
setStartColumn(int)
public void setEndLine(int lineno)
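For Tokenizers that count lines and columns, this method is used to set the line number where the end of the Token was found. See setStartLine(int) for more. As with the end column, this is the line of the first character that does NOT belong to the token (compare String.substring(int, int)).
Parameters:
lineno - line number where the token ends
public int getEndLine()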
Returns the line number where the token ends. See setEndLine(int) for more. If a tokenizer doesn't count lines and columns, the returned value is -1.
See Also:
setEndLine(int)
public void setEndColumn(int colno)
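For Tokenizers that count lines and columns, this method is used to set the column number where the end of the Token was found. The end column number is that of the first character that does NOT belong to the token, following the same principle as String.substring(int, int).
Parameters:
colno - column number where the token ends
public int getEndColumn()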
Returns the column number where the Token ends. See setEndColumn(int) for more.
See Also:
setEndColumn(int)
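To make the zero-based line and column conventions concrete, the sketch below derives them from character positions. LineColumnSketch is a hypothetical helper, not a JTopas class. It assumes "\n" is the only line separator and, for the end values, uses the first position after the token, as described for the end column (assumed here to hold for the end line as well).

```java
// Hypothetical helper, not a JTopas class: computes zero-based line and column
// numbers for a character position, assuming '\n' is the only line separator.
final class LineColumnSketch {

    // Returns {line, column} of the character at the given position.
    static int[] lineAndColumn(CharSequence input, int position) {
        int line = 0;
        int column = 0;
        for (int i = 0; i < position; i++) {
            if (input.charAt(i) == '\n') {
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    public static void main(String[] args) {
        String input = "a = 1;\n/* two\nlines */ b";
        int start = input.indexOf("/*");
        int end = input.indexOf("*/") + 2;      // exclusive: first position after the comment
        int[] s = lineAndColumn(input, start);
        int[] e = lineAndColumn(input, end);
        System.out.println("start " + s[0] + ":" + s[1] + ", end " + e[0] + ":" + e[1]);
        // prints start 1:0, end 2:8
    }
}
```

A token that ends with a line separator, such as the string "line1\n" mentioned for image parts, therefore ends at column 0 of the following line under these assumptions.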
public boolean equals(java.lang.Object object)
Implementation of the well-known method Object.equals(java.lang.Object).
Note that two tokens are equal if all their members are equal. That means that tokens retrieved by two different Tokenizer instances can be equal.
Overrides:
equals in class java.lang.Object
Parameters:
object - the Object to compare
Returns:
true if the two tokens are equal, false otherwise
public java.lang.String toString()
Implementation of the well-known method Object.toString().
Overrides:
toString in class java.lang.Object
public static java.lang.String getTypeName(int type)
Returns a type name for display purposes.
Parameters:
type - one of the Token type constants