Tokenizer

All Known Implementing Classes:

AbstractTokenizer, StandardTokenizer
```
public interface Tokenizer
```
The Tokenizer interface contains setup methods, parse operations and getter and setter methods for a tokenizer. A tokenizer splits a stream of input data into units such as whitespaces, comments and keywords. These units are the tokens represented by the Token class of the de.susebox.jtopas package.
A Tokenizer is configured using a TokenizerProperties object that contains declarations for whitespaces, separators, comments, keywords, special sequences and patterns. It is designed to enable a common approach for parsing texts such as program code or annotated documents like HTML.
To detect links in an HTML document, a tokenizer could be used like this (see StandardTokenizerProperties and StandardTokenizer for the classes mentioned here):
```
 Vector               links     = new Vector();
 FileReader           reader    = new FileReader("index.html");
 TokenizerProperties  props     = new StandardTokenizerProperties();
 Tokenizer            tokenizer = new StandardTokenizer();
 Token                token;

 props.setParseFlags(TokenizerProperties.F_NO_CASE);
 props.setSeparators("=");
 props.addString("\"", "\"", "\\");
 props.addBlockComment(">", "<");
 props.addKeyword("HREF");

 tokenizer.setTokenizerProperties(props);
 tokenizer.setSource(new ReaderSource(reader));

 try {
   while (tokenizer.hasMoreToken()) {
     token = tokenizer.nextToken();
     if (token.getType() == Token.KEYWORD) {
       tokenizer.nextToken();               // should be the '=' character
     links.addElement(tokenizer.nextImage());
     }
   }
 } finally {
   tokenizer.close();
   reader.close();
 }
```
This somewhat rough way to find links should work fine on syntactically correct HTML code. It finds common links as well as mail and ftp links. Note the block comment: it starts with the ">" character, which closes an HTML tag, and ends with the "<" character, which opens one. The effect is that all the real text is treated as a comment.
To extract the contents of an HTML file, one would write:
```
 StringBuffer         contents  = new StringBuffer(4096);
 FileReader           reader    = new FileReader("index.html");
 TokenizerProperties  props     = new StandardTokenizerProperties();
 Tokenizer            tokenizer = new StandardTokenizer();
 Token                token;

 props.setParseFlags(TokenizerProperties.F_NO_CASE);
 props.addBlockComment("<", ">");             // every HTML tag
 props.addBlockComment("<HEAD>", "</HEAD>");  // the whole HTML header
 props.addBlockComment("<!--", "-->");        // HTML comments
    
 tokenizer.setTokenizerProperties(props);
 tokenizer.setSource(new ReaderSource(reader));

 try {
   while (tokenizer.hasMoreToken()) {
     token = tokenizer.nextToken();
     if (token.getType() != Token.BLOCK_COMMENT) {
       contents.append(tokenizer.currentImage());
     }
   }
 } finally {
   tokenizer.close();
   reader.close();
 }
```
Here the block comment is the exact opposite of the first example: now all the HTML tags are skipped. Moreover, the HTML header is declared as a block comment as well, so the information in the header is skipped altogether.
Parsing (tokenizing) follows a well-defined priority scheme. See nextToken() for details.
NOTE: if a character sequence is registered for two categories of tokenizer properties (e.g. as the starting sequence of a line comment as well as a special sequence), the category with the highest priority wins (in this example, the sequence is interpreted as a line comment when it is found).
The tokenizer interface is clearly designed for "readable" data, such as ASCII or Unicode data. Parsing binary data has other characteristics that do not necessarily fit in a scheme of comments, keywords, strings, identifiers and operators.
Note that the interface has no methods that handle stream data sources. This is left to the implementations, which may have quite different data sources, e.g. InputStreamReader, database queries, string arrays etc. The interface TokenizerSource serves as an abstraction of such widely varying data sources.
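The idea of hiding the origin of the data behind one small interface can be sketched as follows. The names `CharSource`, `ReaderCharSource`, `StringArraySource` and `SourceSketch` are hypothetical stand-ins written for this illustration; they are not the JTopas `TokenizerSource` API, whose actual method signatures should be taken from its own documentation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a data source abstraction (not the JTopas API).
interface CharSource {
    // Reads up to maxChars characters into cbuf starting at offset;
    // returns the number of characters read, or -1 at the end of the data.
    int read(char[] cbuf, int offset, int maxChars);
}

// Adapts any java.io.Reader to the sketch interface.
class ReaderCharSource implements CharSource {
    private final Reader reader;

    ReaderCharSource(Reader reader) { this.reader = reader; }

    @Override
    public int read(char[] cbuf, int offset, int maxChars) {
        try {
            return reader.read(cbuf, offset, maxChars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Serves the elements of a string array one after another.
class StringArraySource implements CharSource {
    private final String[] parts;
    private int index = 0;  // current array element
    private int pos = 0;    // position inside the current element

    StringArraySource(String... parts) { this.parts = parts; }

    @Override
    public int read(char[] cbuf, int offset, int maxChars) {
        // Skip elements that are exhausted or empty.
        while (index < parts.length && pos >= parts[index].length()) {
            index++;
            pos = 0;
        }
        if (index >= parts.length) {
            return -1;
        }
        int n = Math.min(maxChars, parts[index].length() - pos);
        parts[index].getChars(pos, pos + n, cbuf, offset);
        pos += n;
        return n;
    }
}

class SourceSketch {
    // Drains a source completely, the way a tokenizer would refill its buffer.
    static String readAll(CharSource source) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4];
        int n;
        while ((n = source.read(buf, 0, buf.length)) >= 0) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringArraySource("HREF", "=", "\"x\"")));
        System.out.println(readAll(new ReaderCharSource(new StringReader("abc"))));
    }
}
```

The consumer (`readAll`) never learns whether the characters come from a reader or from an array, which is the property the paragraph above attributes to the source abstraction.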
The Tokenizer interface partly replaces the older de.susebox.java.util.Tokenizer interface which is deprecated.
Author:

Heiko Blau

See Also:

Token, TokenizerProperties

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `void` | `changeParseFlags(int flags, int mask)` Setting the control flags of the `TokenizerProperties`. |
| `void` | `close()` This method is necessary to release memory and remove object references if `Tokenizer` instances are frequently created for small tasks. |
| `java.lang.String` | `currentImage()` Convenience method to retrieve only the token image of the `Token` that would be returned by `currentToken()`. |
| `int` | `currentlyAvailable()` Retrieving the number of the currently available characters. |
| `Token` | `currentToken()` Retrieve the `Token` that was found by the last call to `nextToken()`. |
| `char` | `getChar(int pos)` Get a single character from the current text range. |
| `int` | `getColumnNumber()` If the flag `TokenizerProperties#F_COUNT_LINES` is set, this method will return the current column position starting with 0 in the input stream. |
| `KeywordHandler` | `getKeywordHandler()` Retrieving the current `KeywordHandler`. |
| `int` | `getLineNumber()` If the flag `TokenizerProperties#F_COUNT_LINES` is set, this method will return the line number starting with 0 in the input stream. |
| `int` | `getParseFlags()` Retrieving the parser control flags. |
| `PatternHandler` | `getPatternHandler()` Retrieving the current `PatternHandler`. |
| `int` | `getRangeStart()` This method returns the absolute offset in characters to the start of the parsed stream. |
| `int` | `getReadPosition()` Getting the current read offset. |
| `SeparatorHandler` | `getSeparatorHandler()` Retrieving the current `SeparatorHandler`. |
| `SequenceHandler` | `getSequenceHandler()` Retrieving the current `SequenceHandler`. |
| `TokenizerSource` | `getSource()` Retrieving the `TokenizerSource` of this `Tokenizer`. |
| `java.lang.String` | `getText(int start, int length)` Retrieve text from the currently available range. |
| `TokenizerProperties` | `getTokenizerProperties()` Retrieving the current tokenizer characteristics. |
| `WhitespaceHandler` | `getWhitespaceHandler()` Retrieving the current `WhitespaceHandler`. |
| `boolean` | `hasMoreToken()` Check if there are more tokens available. |
| `java.lang.String` | `nextImage()` This method is a convenience method. |
| `Token` | `nextToken()` Retrieving the next `Token`. |
| `int` | `readMore()` Try to read more data into the text buffer of the tokenizer. |
| `void` | `setKeywordHandler(KeywordHandler handler)` Setting a new `KeywordHandler` or removing any previously installed one. |
| `void` | `setPatternHandler(PatternHandler handler)` Setting a new `PatternHandler` or removing any previously installed one. |
| `void` | `setReadPositionAbsolute(int position)` This method sets the tokenizer's current read position to the given absolute read position. |
| `void` | `setReadPositionRelative(int offset)` This method moves the tokenizer's read position by the given number of characters forward (positive value) or backward (negative value), starting from the current read position. |
| `void` | `setSeparatorHandler(SeparatorHandler handler)` Setting a new `SeparatorHandler` or removing any previously installed `SeparatorHandler`. |
| `void` | `setSequenceHandler(SequenceHandler handler)` Setting a new `SequenceHandler` or removing any previously installed one. |
| `void` | `setSource(TokenizerSource source)` Setting the source of data. |
| `void` | `setTokenizerProperties(TokenizerProperties props)` Setting the tokenizer characteristics. |
| `void` | `setWhitespaceHandler(WhitespaceHandler handler)` Setting a new `WhitespaceHandler` or removing any previously installed one. |

- Method Detail
  - setSource
```
void setSource(TokenizerSource source)
```
    Setting the source of data. This method is usually called during setup of the Tokenizer but may also be invoked while tokenizing is in progress. It will reset the tokenizer's input buffer, line and column counters etc.
    Passing null is allowed. In that case, calls to hasMoreToken() will return false, while calling nextToken() will return an EOF token.
    
    Parameters:
    
    source - a TokenizerSource to read data from
    
    See Also:
    
    getSource()
  - getSource
```
TokenizerSource getSource()
```
    Retrieving the TokenizerSource of this Tokenizer. The method may return null if there is no TokenizerSource associated with this Tokenizer.
    
    Returns:
    
    the TokenizerSource associated with this Tokenizer
    
    See Also:
    
    setSource(de.susebox.jtopas.TokenizerSource)
  - setTokenizerProperties
```
void setTokenizerProperties(TokenizerProperties props)
                     throws java.lang.NullPointerException,
                            java.lang.IllegalArgumentException
```
    Setting the tokenizer characteristics. This operation is usually done before the parse process. A common place is a constructor of a Tokenizer implementation. If the tokenizer characteristics change during the parse process, they take effect with the next call of nextToken() or nextImage(). Usually, a Tokenizer implementation will also implement the TokenizerPropertyListener interface to be notified about property changes.
    Generally, the Tokenizer implementation should also implement the DataProvider interface or provide an inner class that implements the DataProvider interface, while the TokenizerProperties implementation should in turn implement the handler interfaces (WhitespaceHandler, SeparatorHandler, SequenceHandler, KeywordHandler and PatternHandler). These handler interfaces are collected in the DataMapper interface.
    Although implementing the mentioned interfaces is recommended, it is not mandatory. The exception is PatternHandler, which must be implemented by the TokenizerProperties implementation, since it is not possible for a Tokenizer to interpret a regular expression pattern with only the information provided through the TokenizerProperties interface.
    If a Tokenizer implementation chooses to use an exclusively tailored TokenizerProperties implementation, it should throw an IllegalArgumentException if it is not provided with an instance of that TokenizerProperties implementation.
    If null is passed to the method, it throws a NullPointerException.
    Parameters:
    
    props - the TokenizerProperties for this tokenizer
    
    Throws:
    
    java.lang.NullPointerException - if null is passed to the call
    
    java.lang.IllegalArgumentException - if the TokenizerProperties implementation of the parameter cannot be used with the implementation of this Tokenizer
    
    See Also:
    
    getTokenizerProperties()
  - getTokenizerProperties
```
TokenizerProperties getTokenizerProperties()
```
    Retrieving the current tokenizer characteristics. The method may return null if setTokenizerProperties(de.susebox.jtopas.TokenizerProperties) has not been called so far.
    
    Returns:
    
    the TokenizerProperties of this Tokenizer
    
    See Also:
    
    setTokenizerProperties(de.susebox.jtopas.TokenizerProperties)
  - changeParseFlags
```
void changeParseFlags(int flags,
                      int mask)
               throws TokenizerException
```
    Setting the control flags of the TokenizerProperties. Use a combination of the F_... flags declared in TokenizerProperties for the parameter. The mask parameter contains a bit mask of the F_... flags to change.
    The parse flags for a tokenizer can be set through the associated TokenizerProperties instance. These global settings take effect in all Tokenizer instances that use the same TokenizerProperties object. Flags related to the parsing process can also be set separately for each tokenizer during runtime. These are the dynamic flags:
    - TokenizerProperties#F_RETURN_WHITESPACES and its sub-flags
    - TokenizerProperties#F_TOKEN_POS_ONLY
    Other flags can also be set for each tokenizer separately, but only make sense if they are set before tokenizing starts:
    - TokenizerProperties#F_KEEP_DATA
    - TokenizerProperties#F_COUNT_LINES
    All remaining flags should only be used on the TokenizerProperties instance or on single TokenizerProperty objects; they influence all Tokenizer instances sharing the same TokenizerProperties object. For instance, using the flag TokenizerProperties#F_NO_CASE is an invalid operation on a Tokenizer. It affects the interpretation of keywords and sequences by the associated TokenizerProperties instance and, moreover, possibly the storage of these properties.
    This method throws a TokenizerException if a flag is passed that cannot be handled by the Tokenizer object itself.
    This method takes precedence over the TokenizerProperties.setParseFlags(int) method of the associated TokenizerProperties object. Even if the global settings of one of the dynamic flags (see above) change after a call to this method, the flags set separately for this tokenizer stay active.
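    The interplay of flags, mask and global settings can be pictured with plain bit arithmetic. The following is only a sketch under stated assumptions: the class `ParseFlagsSketch` is hypothetical, the numeric flag values are invented for the illustration (the real F_... constants are declared in TokenizerProperties), and an IllegalArgumentException stands in for the TokenizerException of the real method.

```java
// Hypothetical model of per-tokenizer flag overrides; not JTopas code.
class ParseFlagsSketch {
    // Assumed values, chosen only so that each flag is a distinct bit.
    static final int F_NO_CASE            = 0x01;
    static final int F_RETURN_WHITESPACES = 0x02;
    static final int F_TOKEN_POS_ONLY     = 0x04;
    static final int F_COUNT_LINES        = 0x08;

    // Flags this sketch allows to be changed on a single tokenizer.
    static final int LOCAL_FLAGS =
        F_RETURN_WHITESPACES | F_TOKEN_POS_ONLY | F_COUNT_LINES;

    private int globalFlags;  // flags of the shared properties object
    private int localFlags;   // values set for this tokenizer only
    private int localMask;    // bits that are overridden locally

    ParseFlagsSketch(int globalFlags) { this.globalFlags = globalFlags; }

    // Models a later change of the shared, global settings.
    void setGlobalFlags(int flags) { globalFlags = flags; }

    // Only the bits selected by mask change; they take the values given in flags.
    void changeParseFlags(int flags, int mask) {
        if ((mask & ~LOCAL_FLAGS) != 0) {
            throw new IllegalArgumentException("flag cannot be handled per tokenizer");
        }
        localFlags = (localFlags & ~mask) | (flags & mask);
        localMask |= mask;
    }

    // Locally overridden bits win; all other bits come from the global flags.
    int getParseFlags() {
        return (globalFlags & ~localMask) | (localFlags & localMask);
    }

    public static void main(String[] args) {
        ParseFlagsSketch t = new ParseFlagsSketch(F_NO_CASE | F_RETURN_WHITESPACES);
        t.changeParseFlags(0, F_RETURN_WHITESPACES);  // switch off for this tokenizer
        System.out.println(Integer.toBinaryString(t.getParseFlags()));
    }
}
```

    Keeping a separate mask of overridden bits is one simple way to let a local setting survive later changes of the global flags, as the paragraph above requires.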
    Parameters:
    
    flags - the parser control flags
    
    mask - the mask for the flags to set or unset
    
    Throws:
    
    TokenizerException - if one or more of the flags given cannot be honored
    
    See Also:
    
    getParseFlags()
  - getParseFlags
```
int getParseFlags()
```
    Retrieving the parser control flags. A bitmask containing the F_... constants is returned. This method returns both the flags that are set separately for this Tokenizer and the flags set for the associated TokenizerProperties object.
    
    Returns:
    
    the current parser control flags
    
    See Also:
    
    changeParseFlags(int, int)
  - setKeywordHandler
```
void setKeywordHandler(KeywordHandler handler)
```
    Setting a new KeywordHandler or removing any previously installed one. If null is passed (installed handler removed), no keyword support is available.
    Usually, the TokenizerProperties used by a Tokenizer implement the KeywordHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its KeywordHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
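    The default wiring described above (use the properties object as handler if it implements the handler interface, allow a replacement, allow removal) could look roughly like this. All types in the sketch (`KeywordLookup`, `PlainProps`, `KeywordProps`, `HandlerWiringSketch`) are hypothetical and only mimic the roles of KeywordHandler, TokenizerProperties and Tokenizer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plays the role of a keyword handler in this sketch.
interface KeywordLookup {
    boolean isKeyword(String image);
}

// Properties without keyword support.
class PlainProps { }

// Properties that double as keyword handler (case-insensitive here).
class KeywordProps extends PlainProps implements KeywordLookup {
    private final Set<String> keywords = new HashSet<>();

    void addKeyword(String keyword) {
        keywords.add(keyword.toUpperCase(Locale.ROOT));
    }

    @Override
    public boolean isKeyword(String image) {
        return keywords.contains(image.toUpperCase(Locale.ROOT));
    }
}

// Plays the role of the tokenizer.
class HandlerWiringSketch {
    private KeywordLookup keywordHandler;

    // Installs the properties object as handler if it can act as one.
    void setProperties(PlainProps props) {
        keywordHandler = (props instanceof KeywordLookup) ? (KeywordLookup) props : null;
    }

    // Installs a different handler, or removes the current one with null.
    void setKeywordHandler(KeywordLookup handler) { keywordHandler = handler; }

    KeywordLookup getKeywordHandler() { return keywordHandler; }

    // Without a handler there is no keyword support at all.
    boolean isKeyword(String image) {
        return keywordHandler != null && keywordHandler.isKeyword(image);
    }

    public static void main(String[] args) {
        KeywordProps props = new KeywordProps();
        props.addKeyword("HREF");
        HandlerWiringSketch tokenizer = new HandlerWiringSketch();
        tokenizer.setProperties(props);
        System.out.println(tokenizer.isKeyword("href"));
    }
}
```

    The same pattern would apply to the whitespace, separator, sequence and pattern handlers described below.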
    
    Parameters:
    
    handler - the (new) KeywordHandler to use or null to remove it
    
    See Also:
    
    getKeywordHandler(), TokenizerProperties.addKeyword(java.lang.String)
  - getKeywordHandler
```
KeywordHandler getKeywordHandler()
```
    Retrieving the current KeywordHandler. The method may return null if there isn't any handler installed.
    
    Returns:
    
    the currently active KeywordHandler or null, if keyword support is switched off
    
    See Also:
    
    setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler)
  - setWhitespaceHandler
```
void setWhitespaceHandler(WhitespaceHandler handler)
```
    Setting a new WhitespaceHandler or removing any previously installed one. If null is passed, the tokenizer will not recognize whitespaces.
    Usually, the TokenizerProperties used by a Tokenizer implement the WhitespaceHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its WhitespaceHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
    
    Parameters:
    
    handler - the (new) whitespace handler to use or null to switch off whitespace handling
    
    See Also:
    
    getWhitespaceHandler(), TokenizerProperties.setWhitespaces(java.lang.String)
  - getWhitespaceHandler
```
WhitespaceHandler getWhitespaceHandler()
```
    Retrieving the current WhitespaceHandler. The method may return null if whitespaces are not recognized.
    
    Returns:
    
    the currently active whitespace handler or null, if the base implementation is working
    
    See Also:
    
    setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler)
  - setSeparatorHandler
```
void setSeparatorHandler(SeparatorHandler handler)
```
    Setting a new SeparatorHandler or removing any previously installed SeparatorHandler. If null is passed, the tokenizer doesn't recognize separators.
    Usually, the TokenizerProperties used by a Tokenizer implement the SeparatorHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its SeparatorHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
    
    Parameters:
    
    handler - the (new) separator handler to use or null to remove it
    
    See Also:
    
    getSeparatorHandler(), TokenizerProperties.setSeparators(java.lang.String)
  - getSeparatorHandler
```
SeparatorHandler getSeparatorHandler()
```
    Retrieving the current SeparatorHandler. The method may return null if there isn't any handler installed.
    
    Returns:
    
    the currently active SeparatorHandler or null, if separators aren't recognized by the tokenizer
    
    See Also:
    
    setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler)
  - setSequenceHandler
```
void setSequenceHandler(SequenceHandler handler)
```
    Setting a new SequenceHandler or removing any previously installed one. If null is passed, the tokenizer will not recognize line and block comments, strings and special sequences.
    Usually, the TokenizerProperties used by a Tokenizer implement the SequenceHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its SequenceHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
    
    Parameters:
    
    handler - the (new) SequenceHandler to use or null to remove it
    
    See Also:
    
    getSequenceHandler(), TokenizerProperties.addSpecialSequence(java.lang.String), TokenizerProperties.addLineComment(java.lang.String), TokenizerProperties.addBlockComment(java.lang.String, java.lang.String), TokenizerProperties.addString(java.lang.String, java.lang.String, java.lang.String)
  - getSequenceHandler
```
SequenceHandler getSequenceHandler()
```
    Retrieving the current SequenceHandler. The method may return null if there isn't any handler installed.
    A SequenceHandler deals with line and block comments, strings and special sequences.
    
    Returns:
    
    the currently active SequenceHandler or null, if no handler is installed
    
    See Also:
    
    setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)
  - setPatternHandler
```
void setPatternHandler(PatternHandler handler)
```
    Setting a new PatternHandler or removing any previously installed one. If null is passed, patterns are not supported by the tokenizer (any longer).
    Usually, the TokenizerProperties used by a Tokenizer implement the PatternHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its PatternHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
    
    Parameters:
    
    handler - the (new) PatternHandler to use or null to remove it
    
    See Also:
    
    getPatternHandler(), TokenizerProperties.addPattern(java.lang.String)
  - getPatternHandler
```
PatternHandler getPatternHandler()
```
    Retrieving the current PatternHandler. The method may return null if there isn't any handler installed.
    
    Returns:
    
    the currently active PatternHandler or null, if patterns are not recognized by the tokenizer
    
    See Also:
    
    setPatternHandler(de.susebox.jtopas.spi.PatternHandler)
  - hasMoreToken
```
boolean hasMoreToken()
```
    Check if there are more tokens available. This method will return true until an end-of-file condition is encountered during a call to nextToken() or nextImage().
    That means that the EOF token is returned once; afterwards hasMoreToken will return false. It also implies that the method will return true at least once, even if the input data stream is empty.
    The method can be conveniently used in a while loop.
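    The "EOF is returned once" contract can be demonstrated with a deliberately tiny stand-in. `EofOnceSketch` is a hypothetical class that splits a string at whitespace; it is not a Tokenizer implementation, and the string `"<EOF>"` merely stands in for an EOF token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature tokenizer that only models the EOF contract.
class EofOnceSketch {
    static final String EOF = "<EOF>";  // marker standing in for an EOF token

    private final String[] words;
    private int next = 0;
    private boolean eofReturned = false;

    EofOnceSketch(String input) {
        String trimmed = input.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // True until the EOF marker has been handed out by nextToken().
    boolean hasMoreToken() { return !eofReturned; }

    // Returns the next word, or the EOF marker once the input is exhausted.
    String nextToken() {
        if (next < words.length) {
            return words[next++];
        }
        eofReturned = true;
        return EOF;
    }

    // The typical while loop; the EOF marker is the last collected element.
    static List<String> tokenize(String input) {
        EofOnceSketch tokenizer = new EofOnceSketch(input);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreToken()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a b"));
        System.out.println(tokenize(""));
    }
}
```

    Even for empty input the loop body runs once and sees the EOF marker, so code that must react to the end of the data can do so inside the loop.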
    
    Returns:
    
    true if a call to nextToken() or nextImage() will succeed, false otherwise
  - nextToken
```
Token nextToken()
         throws TokenizerException
```
    Retrieving the next Token. The method works in this order:
    1. Check for an end-of-file condition. If there is such a condition, return the EOF token.
    2. Try to collect a sequence of whitespaces. If such a sequence is found, return it if the flag F_RETURN_WHITESPACES is set, or skip these whitespaces otherwise.
    3. Check the next characters against all known patterns. A pattern is usually a regular expression that is used by Pattern, but implementations of PatternHandler may use other pattern syntaxes. Note that patterns are not recognized within "normal" text (see below for a more precise description).
    4. Check the next characters against all known line and block comments. If a line or block comment starting sequence matches, return the comment if the flag F_RETURN_WHITESPACES is set, or skip it otherwise. If comments are returned, they include their starting and ending sequences (newline in case of a line comment).
    5. Check the next characters against all known string starting sequences. If the beginning of a string is identified, return the string up to and including the closing sequence.
    6. Check the next characters against all known special sequences, looking for the longest possible match. If a special sequence is identified, return it.
    7. Check for ordinary separators. If one is found, return it.
    8. Check the next characters against all known keywords. If a keyword is identified, return it.
    9. Return the text portion up to the next whitespace, comment, special sequence or separator. Note that patterns are not recognized within "normal" text. A pattern match therefore always has a whitespace, comment, special sequence, separator or another pattern match in front of it, or starts at position 0 of the data.
    The method will return the EOF token as long as hasMoreToken() returns false. It will not return null in such conditions.
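    Two consequences of the order above can be shown in isolation: a sequence registered in several categories is claimed by the category that is checked first, and among special sequences the longest match wins. The class `PrioritySketch` and its enum are hypothetical and merely mirror the order of the list; they are not part of JTopas.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of the check order; not JTopas code.
class PrioritySketch {
    // Declared in the order of the checks described above:
    // an earlier constant has the higher priority.
    enum Category {
        WHITESPACE, PATTERN, COMMENT, STRING, SPECIAL_SEQUENCE, SEPARATOR, KEYWORD, NORMAL
    }

    // Picks the winning category for a sequence registered under several categories.
    static Category resolve(Set<Category> registered) {
        for (Category category : Category.values()) {
            if (registered.contains(category)) {
                return category;
            }
        }
        return Category.NORMAL;
    }

    // Returns the longest sequence that matches text at pos, or null if none does.
    static String longestMatch(String text, int pos, String... sequences) {
        String best = null;
        for (String sequence : sequences) {
            if (text.startsWith(sequence, pos)
                    && (best == null || sequence.length() > best.length())) {
                best = sequence;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "//" registered both as line comment start and as special sequence.
        System.out.println(resolve(EnumSet.of(Category.SPECIAL_SEQUENCE, Category.COMMENT)));
        System.out.println(longestMatch("a >>= b", 2, ">", ">>", ">>="));
    }
}
```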
    Returns:
    
    found Token including the EOF token
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
    
    See Also:
    
    nextImage()
  - nextImage
```
java.lang.String nextImage()
                    throws TokenizerException
```
    This method is a convenience method. It returns only the next token image without any information about its type or associated data. This is especially useful if the parse flags for this Tokenizer have the flag TokenizerProperties#F_TOKEN_POS_ONLY set, since this method returns a valid string even in that case.
    
    Returns:
    
    the token image of the next token
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
    
    See Also:
    
    nextToken(), currentImage()
  - currentToken
```
Token currentToken()
            throws TokenizerException
```
    Retrieve the Token that was found by the last call to nextToken() or nextImage().
    Since version 0.6.1 of JTopas, this method throws a TokenizerException rather than returning null if neither nextToken() nor nextImage() has been called before, or if setReadPositionRelative(int) or setReadPositionAbsolute(int) has been called after the last call to nextToken() or nextImage().
    
    Returns:
    
    the Token retrieved by the last call to nextToken().
    
    Throws:
    
    TokenizerException - if the tokenizer has no current token
    
    See Also:
    
    nextToken(), currentImage()
  - currentImage
```
java.lang.String currentImage()
                       throws TokenizerException
```
    Convenience method to retrieve only the token image of the Token that would be returned by currentToken(). This is especially useful if the parse flags for this Tokenizer have the flag TokenizerProperties#F_TOKEN_POS_ONLY set, since this method returns a valid string even in that case.
    Since version 0.6.1 of JTopas, this method throws a TokenizerException rather than returning null if neither nextToken() nor nextImage() has been called before, or if setReadPositionRelative(int) or setReadPositionAbsolute(int) has been called after the last call to nextToken() or nextImage().
    
    Returns:
    
    the token image of the current token
    
    Throws:
    
    TokenizerException - if the tokenizer has no current token
    
    See Also:
    
    currentToken(), nextImage()
  - getLineNumber
```
int getLineNumber()
```
    If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream. The implementation of the Tokenizer interface can decide which end-of-line sequences should be recognized. The most flexible approach is to process the following end-of-line sequences:
    - Carriage Return (ASCII 13, '\r'). This EOL is used on Apple Macintosh.
    - Linefeed (ASCII 10, '\n'). This is the UNIX EOL character.
    - Carriage Return + Linefeed ("\r\n"). This is used on MS Windows systems.
    Another legitimate and in many cases satisfactory way is to use the system property "line.separator".
    Displaying information about lines usually means adding 1 to the zero-based line number.
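    A minimal sketch of the flexible approach (accepting all three end-of-line sequences) is shown below. `LineCountSketch` is a hypothetical helper, not part of JTopas; it returns the zero-based line number reached after reading a text, counting "\r\n" as a single line end.

```java
// Hypothetical helper illustrating zero-based line counting; not JTopas code.
class LineCountSketch {
    // Zero-based line number of the position just behind the last character
    // of text. "\r", "\n" and "\r\n" each count as exactly one line end.
    static int lineNumberAtEnd(String text) {
        int line = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                line++;
            } else if (c == '\r') {
                line++;
                // Consume the linefeed of a CR+LF pair so it is not counted twice.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int line = lineNumberAtEnd("first\r\nsecond\rthird\nfourth");
        // Add 1 when displaying the zero-based number to a user.
        System.out.println("line " + (line + 1));
    }
}
```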
    Returns:
    
    the current line number starting with 0 or -1 if no line numbers are supplied (TokenizerProperties#F_COUNT_LINES is not set).
    
    See Also:
    
    getColumnNumber()
  - getColumnNumber
```
int getColumnNumber()
```
    If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream. Displaying information about columns usually means adding 1 to the zero-based column number.
    
    Returns:
    
    the current column position starting with 0, or -1 if no column numbers are supplied (TokenizerProperties#F_COUNT_LINES is not set)
    
    See Also:
    
    getLineNumber()
  - getRangeStart
```
int getRangeStart()
```
    This method returns the absolute offset in characters to the start of the parsed stream. Together with currentlyAvailable() it describes the currently available text "window".
    The positions returned by this method and by getReadPosition() are absolute rather than relative to a text buffer, which gives the tokenizer full control of how and when to refill its text buffer.
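    How absolute positions relate to a movable text window can be sketched with a small model. `TextWindowSketch` is hypothetical: it keeps the whole input in a string and exposes only a slice of it, which is enough to show the bounds that getRangeStart(), currentlyAvailable(), getText(int, int) and getChar(int) imply. A real implementation would refill its buffer from a TokenizerSource instead.

```java
// Hypothetical model of a text window addressed by absolute positions.
class TextWindowSketch {
    private final String data;    // complete input, stands in for the data source
    private int rangeStart = 0;   // absolute offset of the first buffered character
    private String buffer = "";   // the currently available characters

    TextWindowSketch(String data) { this.data = data; }

    // Moves the window to [start, start + size), clipped to the end of the data.
    void load(int start, int size) {
        rangeStart = start;
        buffer = data.substring(start, Math.min(data.length(), start + size));
    }

    int getRangeStart() { return rangeStart; }

    int currentlyAvailable() { return buffer.length(); }

    // start is absolute; the requested range must lie inside the window.
    String getText(int start, int length) {
        if (start < rangeStart || length < 0
                || start + length > rangeStart + buffer.length()) {
            throw new IndexOutOfBoundsException(
                "range " + start + "+" + length + " is outside the text window");
        }
        int relative = start - rangeStart;  // translate to a buffer index
        return buffer.substring(relative, relative + length);
    }

    char getChar(int pos) { return getText(pos, 1).charAt(0); }

    public static void main(String[] args) {
        TextWindowSketch window = new TextWindowSketch("0123456789");
        window.load(4, 3);
        System.out.println(window.getText(5, 2));
    }
}
```

    Because callers only ever use absolute positions, the window can be moved or refilled without invalidating positions they have stored.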
    
    Returns:
    
    the absolute offset of the current text window in characters from the start of the data source of the Tokenizer
  - getReadPosition
```
int getReadPosition()
```
    Getting the current read offset. This is the absolute position where the next call to nextToken() or nextImage() will start. It is therefore not the same as the position returned by Token.getStartPosition() of the current token (currentToken()).
    It is the starting position of the token returned by the next call to nextToken(), if that token is not a whitespace or if whitespaces are returned (TokenizerProperties#F_RETURN_WHITESPACES).
    The positions returned by this method and by getRangeStart() are absolute rather than relative to a text buffer, which gives the tokenizer full control of how and when to refill its text buffer.
    
    Returns:
    
    the absolute offset in characters from the start of the data source of the Tokenizer where reading will be continued
  - currentlyAvailable
```
int currentlyAvailable()
```
    Retrieving the number of the currently available characters. This includes both characters already parsed by the Tokenizer and characters still to be analyzed.
    
    Returns:
    
    number of currently available characters
  - getText
```
java.lang.String getText(int start,
                         int length)
                  throws java.lang.IndexOutOfBoundsException
```
    Retrieve text from the currently available range. The range described by the start and length parameters must lie between getRangeStart() and getRangeStart() + currentlyAvailable().
    Example:
        int     startPos = tokenizer.getReadPosition();
        String  source;

        while (tokenizer.hasMoreToken()) {
          Token token = tokenizer.nextToken();

          switch (token.getType()) {
          case Token.LINE_COMMENT:
          case Token.BLOCK_COMMENT:
            // text between the previous start position and this comment
            source   = tokenizer.getText(startPos, token.getStartPosition() - startPos);
            startPos = token.getStartPosition();
          }
        }
    Parameters:
    
    start - position where the text begins
    
    length - length of the text
    
    Returns:
    
    the text beginning at the given position with the given length
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the starting position or the length is out of the current text window
  - getChar
```
char getChar(int pos)
      throws java.lang.IndexOutOfBoundsException
```
    Get a single character from the current text range.
    
    Parameters:
    
    pos - position of the required character
    
    Returns:
    
    the character at the specified position
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter pos is not in the available text range (text window)
  - readMore
```
int readMore()
      throws TokenizerException
```
    Try to read more data into the text buffer of the tokenizer. This can be useful when a method needs to look ahead of the available data or a skip operation should be performed.
    The method returns the same value that an immediately following call to currentlyAvailable() would return.
    
    Returns:
    
    the number of characters now available
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while reading (IOExceptions for instance)
  - setReadPositionAbsolute
```
void setReadPositionAbsolute(int position)
                      throws java.lang.IndexOutOfBoundsException
```
    This method sets the tokenizer's current read position to the given absolute read position. It realizes one type of rewind / forward operation. The given position must be inside the interval from getRangeStart() to getRangeStart() + currentlyAvailable() - 1.
    The current read position is the end position of the current token. That means that the following assertion can be made:
```
    Token token1 = tokenizer.nextToken();
    tokenizer.setReadPositionAbsolute(tokenizer.getReadPosition() - token1.getLength());
    Token token2 = tokenizer.nextToken();
    assert(token1.equals(token2));
```
    Since JTopas version 0.6.1, the operation clears the current token. Therefore, currentImage() and currentToken() will throw a TokenizerException if called after setReadPositionAbsolute without a subsequent call to nextToken() or nextImage().
    Parameters:
    
    position - absolute position for the next parse operation
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter position is not in the available text range (text window)
    
    See Also:
    
    setReadPositionRelative(int)
  - setReadPositionRelative
```
void setReadPositionRelative(int offset)
                      throws java.lang.IndexOutOfBoundsException
```
    This method moves the tokenizer's read position by the given number of characters forward (positive value) or backward (negative value), starting from the current read position. It realizes one type of rewind / forward operation. The given offset must be greater than or equal to getRangeStart() - getReadPosition() and lower than currentlyAvailable() - getReadPosition().
    Since JTopas version 0.6.1, the operation clears the current token. Therefore, currentImage() and currentToken() will throw a TokenizerException if called after setReadPositionRelative without a subsequent call to nextToken() or nextImage().
    
    Parameters:
    
    offset - number of characters to move forward (positive offset) or backward (negative offset)
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter offset would move the read position out of the available text range (text window)
    
    See Also:
    
    setReadPositionAbsolute(int)
  - close
```
void close()
```
    This method is necessary to release memory and remove object references if Tokenizer instances are frequently created for small tasks. Generally, the method shouldn't throw any exceptions. It is also ok to call it more than once.
    It is an error to call any other method of the implementing class after close has been called.
