Token

java.lang.Object
- de.susebox.jtopas.Token

```
public class Token
extends java.lang.Object
```
Instances of this class are created by the classes implementing the Tokenizer interface. A Token describes a portion of text according to the settings given to the producing Tokenizer in the form of a TokenizerProperties object. Besides the token type, the token image itself, its position in the input stream, its line and column position, and associated information can be obtained from the Token (provided the necessary parse flags are set in the tokenizer).
This class replaces the older de.susebox.java.util.Token which is deprecated.

Author:

Heiko Blau

See Also:

Tokenizer, TokenizerProperties

Field Summary

Fields
Modifier and Type	Field and Description
`static byte`	`BLOCK_COMMENT` Block comments are also a special form of a whitespace sequence.
`static byte`	`EOF` A token of the type `EOF` is used to indicate an end-of-file condition on the input stream of the tokenizer.
`static byte`	`KEYWORD` The token is a keyword registered with the used `Tokenizer`.
`static byte`	`LINE_COMMENT` Although a line comment is - in most cases - actually a whitespace sequence, it is often necessary to handle it separately.
`static byte`	`NORMAL` The token is nothing special (no keyword, no whitespace, etc.).
`static byte`	`PATTERN` The token matches a pattern.
`static byte`	`SEPARATOR` Separators are otherwise not remarkable characters.
`static byte`	`SPECIAL_SEQUENCE` Special sequences are characters or character combinations that have a certain meaning to the parsed language or dialect.
`static byte`	`STRING` The token is one of the quoted strings known to the `Tokenizer`.
`static byte`	`UNKNOWN` This is for the leftovers of the lexical analysis of a text.
`static byte`	`WHITESPACE` Whitespaces are portions of the text that contain one or more characters that separate the significant parts of the text.

Constructor Summary

Constructors
Constructor and Description
`Token()` Default constructor.
`Token(int type)` Constructs a token of a given type.
`Token(int type, java.lang.String image)` Constructs a token of a given type with the given image.
`Token(int type, java.lang.String image, java.lang.Object companion)` Constructs a token of a given type with the given image and a companion.

Method Summary

Modifier and Type	Method and Description
`boolean`	`equals(java.lang.Object object)` Implementation of the well known method `Object.equals(java.lang.Object)`.
`java.lang.Object`	`getCompanion()` Obtaining the associated information of the token.
`int`	`getEndColumn()` Obtaining the column number where the `Token` ends.
`int`	`getEndLine()` Obtaining the line number where the token ends.
`int`	`getEndPosition()` Obtaining the end position of this token.
`java.lang.String`	`getImage()` Obtaining the token image as a `String`.
`java.lang.String[]`	`getImageParts()` Image parts are substrings of a token image.
`int`	`getLength()` Obtaining the length of the token.
`int`	`getStartColumn()` Obtaining the column number of the `Token` start.
`int`	`getStartLine()` Obtaining the line number where the `Token` starts.
`int`	`getStartPosition()` Obtaining the starting position of the token.
`int`	`getType()` Obtaining the type of the `Token`.
`static java.lang.String`	`getTypeName(int type)` Getting a type name for displaying.
`void`	`setCompanion(java.lang.Object companion)` Some tokens may have associated information for the user of the `Token`.
`void`	`setEndColumn(int colno)` In `Tokenizer`s that count lines and columns, this method is used to set the column number where the end of the `Token` was found. The end column number is the one of the first character that does *NOT* belong to the token.
`void`	`setEndLine(int lineno)` In `Tokenizer`s that count lines and columns, this method is used to set the line number where the end of the `Token` was found.
`void`	`setEndPosition(int endPosition)` Setting the end position of the token relative to the start of the input stream.
`void`	`setImage(java.lang.String image)` Setting the token image.
`void`	`setImageParts(java.lang.String[] imageParts)` The counterpart to `getImageParts()`.
`void`	`setLength(int length)` Setting the length of the token.
`void`	`setStartColumn(int colno)` In `Tokenizer`s that count lines and columns, this method is used to set the column number where the beginning of the `Token` was found.
`void`	`setStartLine(int lineno)` In `Tokenizer`s that count lines and columns, this method is used to set the line number where the beginning of the `Token` was found.
`void`	`setStartPosition(int startPosition)` Setting the start position of the token relative to the start of the input stream.
`void`	`setType(int type)` Setting the type property of the `Token`.
`java.lang.String`	`toString()` Implementation of the well known method `Object.toString()`.

Methods inherited from class java.lang.Object
getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - NORMAL
```
public static final byte NORMAL
```
    The token is nothing special (no keyword, no whitespace, etc.).
    
    See Also:
    
    Constant Field Values
  - KEYWORD
```
public static final byte KEYWORD
```
    The token is a keyword registered with the used Tokenizer.
    
    See Also:
    
    Constant Field Values
  - STRING
```
public static final byte STRING
```
    The token is one of the quoted strings known to the Tokenizer. In Java this would be, for instance, a "String" or a character literal such as 'c'.
    
    See Also:
    
    Constant Field Values
  - PATTERN
```
public static final byte PATTERN
```
    The token matches a pattern. This can be a number or identifier pattern, for instance.
    
    See Also:
    
    Constant Field Values
  - SPECIAL_SEQUENCE
```
public static final byte SPECIAL_SEQUENCE
```
    Special sequences are characters or character combinations that have a certain meaning to the parsed language or dialect. In computer languages we have for instance operators, end-of-statement characters etc. A companion might have been associated with a special sequence. It probably contains information important to the user of the Token.
    
    See Also:
    
    Constant Field Values
  - SEPARATOR
```
public static final byte SEPARATOR
```
    Separators are otherwise not remarkable characters. An opening parenthesis might be necessary for a syntactically correct text, but has no special meaning to the compiler, interpreter etc. after it has been detected.
    
    See Also:
    
    Constant Field Values
  - WHITESPACE
```
public static final byte WHITESPACE
```
    Whitespaces are portions of the text that contain one or more characters that separate the significant parts of the text. Generally, a sequence of whitespaces is equivalent to one single whitespace character. That is the difference to separators.
    
    See Also:
    
    Constant Field Values
  - LINE_COMMENT
```
public static final byte LINE_COMMENT
```
    Although a line comment is - in most cases - actually a whitespace sequence, it is often necessary to handle it separately. Syntax highlighting, for example, needs to recognize line comments.
    
    See Also:
    
    Constant Field Values
  - BLOCK_COMMENT
```
public static final byte BLOCK_COMMENT
```
    Block comments are also a special form of a whitespace sequence. See LINE_COMMENT for details.
    
    See Also:
    
    Constant Field Values
  - EOF
```
public static final byte EOF
```
    A token of the type EOF is used to indicate an end-of-file condition on the input stream of the tokenizer.
    
    See Also:
    
    Constant Field Values
  - UNKNOWN
```
public static final byte UNKNOWN
```
    This is for the leftovers of the lexical analysis of a text.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - Token
```
public Token()
```
    Default constructor.
  - Token
```
public Token(int type)
```
    Constructs a token of a given type. Only the type of the token is known but not its image or positions.
    
    Parameters:
    
    type - token type, one of the class constants.
  - Token
```
public Token(int type,
             java.lang.String image)
```
    Constructs a token of a given type with the given image. No position information is given.
    
    Parameters:
    
    type - token type, one of the class constants.
    
    image - the token image itself
  - Token
```
public Token(int type,
             java.lang.String image,
             java.lang.Object companion)
```
    Constructs a token of a given type with the given image and a companion. This constructor is most useful for keywords or special sequences.
    
    Parameters:
    
    type - token type, one of the class constants.
    
    image - the token image itself
    
    companion - information associated with the token
- Method Detail
  - setType
```
public void setType(int type)
```
    Setting the type property of the Token. This is one of the constants defined in this class.
    
    Parameters:
    
    type - the token type
    
    See Also:
    
    getType()
  - getType
```
public int getType()
```
    Obtaining the type of the Token. This is one of the constants defined in the Token class.
    
    Returns:
    
    the token type
    
    See Also:
    
    setType(int)
  - setImage
```
public void setImage(java.lang.String image)
```
    Setting the token image. Note that some Tokenizer implementations only fill in position and length information rather than setting the token image. This strategy might have a tremendous influence on parse performance and memory allocation.
    
    Parameters:
    
    image - the token image
    
    See Also:
    
    getImage()
  - getImage
```
public java.lang.String getImage()
```
    Obtaining the token image as a String. The method returns null when called on an end-of-file token or if the Tokenizer producing this Token object is configured to return only position information (see TokenizerProperties#F_TOKEN_POS_ONLY).
    
    Returns:
    
    the token image as a String (null is possible).
    
    See Also:
    
    setImage(java.lang.String)
  - getImageParts
```
public java.lang.String[] getImageParts()
```
    Image parts are substrings of a token image. The operation returns a meaningful result only if the flag TokenizerProperties#F_RETURN_IMAGE_PARTS is set for the TokenizerProperties, the Tokenizer or the TokenizerProperty that "produced" the token. If that flag is not set, the returned array contains only the image itself (getImage()).
    Number and contents of the image parts depend on the token type:
    - NORMAL, KEYWORD, SPECIAL_SEQUENCE, SEPARATOR: These tokens have one image part that is identical to the image itself (getImage()).
    - WHITESPACE: Whitespaces have one image part for each substring on a single line without any line separators. For whitespace sequences without line separators there will be one part that is identical to the image itself (getImage()). More generally, whitespaces have separatorCount + 1 image parts. For multi-line whitespaces some or all of these image parts can be empty.
    - STRING: One image part per line containing the characters between and excluding the string start and end sequences and/or the line separators, equivalent to the handling of whitespaces. The string escape sequences are resolved. For instance, the image part of the SQL string 'select ''hello'' from dual' is select 'hello' from dual. Multi-line strings may have empty image parts (if empty lines are included in the string). The string "line1\n" has two image parts: "line1" and the empty string (since the string ends on a new line). The string "\nline2" also has two image parts: the empty string and "line2" (since the string starts on one line and ends on the next).
    - PATTERN: A pattern has image parts according to the groups defined in the regular expression of the pattern. The Pattern class speaks of "capturing groups", which are expressions in parentheses. Image parts are especially important for pattern tokens, where access to parts of the match is usually necessary. For instance, Unicode characters can be written in Java in the form of the pattern "\\u[0-9A-Fa-f]{4}". For further processing the hexadecimal part must be accessed. By using the pattern "\\u([0-9A-Fa-f]{4})", a token containing the Unicode notation "\\u00AC" has the two image parts "\\u00AC" (capturing group 0) and "00AC" (capturing group 1).
    - LINE_COMMENT: Line comments have one image part that contains the substring after the line comment start sequence up to and excluding the line separator sequence.
    - BLOCK_COMMENT: Like whitespaces and strings, block comments have one image part per line they span. The first part excludes the block comment start sequence, the last excludes the block comment end sequence. The line separator sequences are also not included in the parts.
    - EOF: The method returns an empty array.
    The return value is an array of strings rather than an Enumeration or Iterator, since it can be used more easily and contains only one element in many if not most cases.
    
    Returns:
    
    an array of image parts according to the token type if the flag TokenizerProperties#F_RETURN_IMAGE_PARTS is set or containing the image itself otherwise (getImage()).
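    
    The PATTERN and WHITESPACE rules above can be illustrated with plain java.util.regex and String.split. This is only a sketch: ImagePartsSketch, patternParts and lineParts are hypothetical names that do not exist in JTopas, and the code approximates the documented results rather than reproducing the tokenizer's implementation.
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that mimics the documented image parts; not part of JTopas. */
final class ImagePartsSketch {

    /** One part per capturing group, with group 0 (the whole match) first. */
    static String[] patternParts(String regex, String image) {
        Matcher matcher = Pattern.compile(regex).matcher(image);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("image does not match the pattern");
        }
        String[] parts = new String[matcher.groupCount() + 1];
        for (int group = 0; group < parts.length; group++) {
            parts[group] = matcher.group(group);
        }
        return parts;
    }

    /** One part per line; the negative limit keeps trailing empty parts. */
    static String[] lineParts(String image) {
        return image.split("\r\n|\r|\n", -1);
    }

    public static void main(String[] args) {
        // A backslash, the letter u and four hex digits; the digits are capturing group 1.
        String[] unicode = patternParts("\\\\u([0-9A-Fa-f]{4})", "\\u00AC");
        System.out.println(unicode.length + " parts, hex digits: " + unicode[1]);

        // Two line separators give separatorCount + 1 = 3 parts; the middle one is empty.
        System.out.println(lineParts("  \n\n\t").length);
    }
}
```
    The negative limit passed to split matters here: without it, trailing empty parts such as the second part of "line1\n" would be dropped.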
  - setImageParts
```
public void setImageParts(java.lang.String[] imageParts)
```
    The counterpart to getImageParts(). It sets all image parts in one operation. The method accepts null and empty arrays.
    
    Parameters:
    
    imageParts - an array of image parts according to the token type or null
  - setLength
```
public void setLength(int length)
```
    Setting the length of the token. Some Tokenizer implementations may prefer, or may be configured, not to return a token image but only the position and length information. This may save a lot of time wherever only a subset of the found tokens is actually needed by the user.
    This method is an alternative to setEndPosition(int), depending on which information is at hand or easier to obtain for the Tokenizer producing this Token.
    Note that this method is implicitly called by setImage(java.lang.String) and setEndPosition(int).
    
    Parameters:
    
    length - the length of the token
    
    See Also:
    
    getLength(), setEndPosition(int)
  - getLength
```
public int getLength()
```
    Obtaining the length of the token. Note that some token types have a zero length (like EOF or UNKNOWN).
    
    Returns:
    
    the length of the token.
    
    See Also:
    
    setLength(int), getEndPosition()
  - setCompanion
```
public void setCompanion(java.lang.Object companion)
```
    Some tokens may have associated information for the user of the Token. A popular use is the association of an integer constant with a special sequence or keyword, to be used in fast switch statements.
    
    Parameters:
    
    companion - the associated information for this token
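    
    The switch-statement use case might look roughly like the following. It is a self-contained sketch that does not use the JTopas classes: CompanionSketch, the OP_ constants and the COMPANIONS map are hypothetical, and the map only stands in for whatever mechanism attaches a companion to a token.
```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the companion idea; none of these names come from JTopas. */
final class CompanionSketch {
    static final int OP_ADD = 1;
    static final int OP_SUBTRACT = 2;
    static final int OP_MULTIPLY = 3;

    // Assumed registration step: each special sequence image is paired with a companion object.
    static final Map<String, Object> COMPANIONS = new HashMap<>();
    static {
        COMPANIONS.put("+", OP_ADD);
        COMPANIONS.put("-", OP_SUBTRACT);
        COMPANIONS.put("*", OP_MULTIPLY);
    }

    /** Dispatches on the integer companion instead of comparing token images. */
    static int apply(Object companion, int left, int right) {
        if (!(companion instanceof Integer)) {
            throw new IllegalArgumentException("no integer companion attached");
        }
        switch (((Integer) companion).intValue()) {
            case OP_ADD:      return left + right;
            case OP_SUBTRACT: return left - right;
            case OP_MULTIPLY: return left * right;
            default: throw new IllegalArgumentException("unknown operator code");
        }
    }

    public static void main(String[] args) {
        // Stands in for the object a getCompanion() call would hand back.
        Object companion = COMPANIONS.get("*");
        System.out.println(apply(companion, 6, 7));
    }
}
```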
  - getCompanion
```
public java.lang.Object getCompanion()
```
    Obtaining the associated information of the token. Can be null. See setCompanion(java.lang.Object) for details.
    
    Returns:
    
    the associated information of this token
  - setStartPosition
```
public void setStartPosition(int startPosition)
```
    Setting the start position of the token relative to the start of the input stream. For instance, the first character in a file has the start position 0.
    
    Parameters:
    
    startPosition - the position where the token starts in the input stream.
    
    See Also:
    
    getStartPosition(), setEndPosition(int)
  - getStartPosition
```
public int getStartPosition()
```
    Obtaining the starting position of the token. If not set or not of interest, -1 is returned.
    
    Returns:
    
    start position of the token.
    
    See Also:
    
    setStartPosition(int), getEndPosition()
  - setEndPosition
```
public void setEndPosition(int endPosition)
```
    Setting the end position of the token relative to the start of the input stream. For instance, the first character in a file has the start position 0. The character at the given end position is NOT part of this Token. This is the same principle as in the String.substring(int, int) method.
    This method is an alternative to setLength(int) depending on which information is at hand or easier to obtain for the Tokenizer producing this Token.
    Note that this method MUST be called after setStartPosition(int) since it affects the length of the token. Its effect is in turn eliminated by calls to setLength(int) and setImage(java.lang.String).
    
    Parameters:
    
    endPosition - the position where the token ends in the input stream.
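    
    The half-open convention and the required call order can be sketched with a small stand-in class. SpanSketch is hypothetical and is not the JTopas Token; in particular, storing the end position indirectly as a length is an assumption made here to show why the start position has to be set first.
```java
/** Hypothetical stand-in for the position handling described above; not the JTopas Token. */
final class SpanSketch {
    private int startPosition = -1; // -1 marks "not set"
    private int length;

    void setStartPosition(int startPosition) {
        this.startPosition = startPosition;
    }

    void setLength(int length) {
        this.length = length;
    }

    // Assumption: the end is kept as a length, so the start must already be known.
    void setEndPosition(int endPosition) {
        this.length = endPosition - startPosition;
    }

    int getStartPosition() {
        return startPosition;
    }

    int getLength() {
        return length;
    }

    // The character at the returned index is not part of the span.
    int getEndPosition() {
        return startPosition + length;
    }

    /** Cuts the image out of the source text, exactly as String.substring(int, int) does. */
    String imageFrom(String source) {
        return source.substring(getStartPosition(), getEndPosition());
    }

    public static void main(String[] args) {
        String source = "int x = 42;";
        SpanSketch span = new SpanSketch();
        span.setStartPosition(8);
        span.setEndPosition(10); // index 10 holds ';', which is excluded
        System.out.println(span.imageFrom(source) + " has length " + span.getLength());
    }
}
```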
  - getEndPosition
```
public int getEndPosition()
```
    Obtaining the end position of this token. Note that the return value of this method is only valid if setStartPosition(int) has been called together with one of the methods setImage(java.lang.String), setLength(int) or setEndPosition(int).
    
    Returns:
    
    end position of the token.
    
    See Also:
    
    setEndPosition(int), setStartPosition(int), getStartPosition()
  - setStartLine
```
public void setStartLine(int lineno)
```
    In Tokenizers that count lines and columns, this method is used to set the line number where the beginning of the Token was found. Line numbers start with 0.
    
    Parameters:
    
    lineno - line number where the token begins
    
    See Also:
    
    getStartLine()
  - getStartLine
```
public int getStartLine()
```
    Obtaining the line number where the Token starts. See also setStartLine(int) for details.
    If a tokenizer doesn't count lines and columns, the returned value is -1.
    
    Returns:
    
    the line number where the token starts or -1, if no line counting is performed
    
    See Also:
    
    setStartLine(int)
  - setStartColumn
```
public void setStartColumn(int colno)
```
    In Tokenizers that count lines and columns, this method is used to set the column number where the beginning of the Token was found. Column numbers start with 0.
    
    Parameters:
    
    colno - column number where the token begins
    
    See Also:
    
    getStartColumn()
  - getStartColumn
```
public int getStartColumn()
```
    Obtaining the column number of the Token start. See setStartColumn(int) for details.
    If a tokenizer doesn't count lines and columns, the returned value is -1.
    
    Returns:
    
    the column number where the token starts or -1, if no line counting is performed
    
    See Also:
    
    setStartColumn(int)
  - setEndLine
```
public void setEndLine(int lineno)
```
    In Tokenizers that count lines and columns, this method is used to set the line number where the end of the Token was found. See setStartLine(int) for more.
    The end line number is the one where the first character was found that does NOT belong to the token. This approach was chosen in accordance with the endIndex parameter of String.substring(int, int).
    
    Parameters:
    
    lineno - line number where the token ends
  - getEndLine
```
public int getEndLine()
```
    Obtaining the line number where the token ends. See setEndLine(int) for more. If a tokenizer doesn't count lines and columns, the returned value is -1.
    
    Returns:
    
    line number where the token ends or -1, if no line counting is performed
    
    See Also:
    
    setEndLine(int)
  - setEndColumn
```
public void setEndColumn(int colno)
```
    In Tokenizers that count lines and columns, this method is used to set the column number where the end of the Token was found.
    The end column number is the one of the first character that does NOT belong to the token. This approach was chosen in accordance with the endIndex parameter of String.substring(int, int).
    
    Parameters:
    
    colno - column number where the token ends
  - getEndColumn
```
public int getEndColumn()
```
    Obtaining the column number where the Token ends. See setEndColumn(int) for more.
    If a tokenizer doesn't count lines and columns, the returned value is -1.
    
    Returns:
    
    column number where the token ends or -1, if no line counting is performed
    
    See Also:
    
    setEndColumn(int)
  - equals
```
public boolean equals(java.lang.Object object)
```
    Implementation of the well-known method Object.equals(java.lang.Object). Note that two tokens are equal if all of their members are equal. That means that tokens retrieved by two different Tokenizer instances can be equal.
    
    Overrides:
    
    equals in class java.lang.Object
    
    Parameters:
    
    object - the Object to compare
    
    Returns:
    
    true if the two tokens are equal, false otherwise
  - toString
```
public java.lang.String toString()
```
    Implementation of the well-known method Object.toString().
    
    Overrides:
    
    toString in class java.lang.Object
    
    Returns:
    
    string representation of this object
  - getTypeName
```
public static java.lang.String getTypeName(int type)
```
    Getting a type name for displaying. The method never fails, even if the given type is unknown.
    
    Parameters:
    
    type - one of the Token type constants
    
    Returns:
    
    a string representation of the given type constant
