AbstractTokenizer

java.lang.Object
- de.susebox.jtopas.AbstractTokenizer

All Implemented Interfaces:

Tokenizer, TokenizerPropertyListener

Direct Known Subclasses:

StandardTokenizer
```
public abstract class AbstractTokenizer
extends java.lang.Object
implements Tokenizer, TokenizerPropertyListener
```
Base class for Tokenizer implementations. AbstractTokenizer separates the data analysis from the actual data provision. Although the class maintains read and write positions, the physical representation of the logical character buffer behind these positions concerns only the subclasses.

Author:

Heiko Blau

See Also:

Tokenizer, TokenizerProperties

Constructor Summary

Constructors
Constructor and Description
`AbstractTokenizer()` Default constructor that sets the tokenizer control flags as would be appropriate for C/C++ and Java.
`AbstractTokenizer(TokenizerProperties properties)` Constructing an `AbstractTokenizer` with a backing `TokenizerProperties` instance.

Method Summary

Modifier and Type	Method and Description
`void`	`addTokenizer(AbstractTokenizer tokenizer)` Adding an embedded tokenizer.
`void`	`changeParseFlags(int flags, int mask)` Setting the control flags of the `Tokenizer`.
`void`	`close()` Closing this tokenizer frees resources and deregisters from the associated `TokenizerProperties` object.
`java.lang.String`	`currentImage()` Convenience method to retrieve only the token image of the `Token` that would be returned by `currentToken()`.
`int`	`currentlyAvailable()` Retrieving the number of the currently available characters.
`Token`	`currentToken()` Retrieve the `Token` that was found by the last call to `nextToken()`.
`char`	`getChar(int pos)` Returns the character at the given position.
`int`	`getColumnNumber()` If the flag `TokenizerProperties#F_COUNT_LINES` is set, this method will return the current column position, starting with 0, in the input stream.
`int`	`getCurrentColumn()` Retrieve the current column.
`int`	`getCurrentLine()` Query the current row.
`KeywordHandler`	`getKeywordHandler()` Retrieving the current `KeywordHandler`.
`int`	`getLineNumber()` If the flag `TokenizerProperties#F_COUNT_LINES` is set, this method will return the line number starting with 0 in the input stream.
`int`	`getParseFlags()` Retrieving the parser control flags.
`PatternHandler`	`getPatternHandler()` Retrieving the current `PatternHandler`.
`int`	`getReadPosition()` Getting the current read offset.
`SeparatorHandler`	`getSeparatorHandler()` Retrieving the current `SeparatorHandler`.
`SequenceHandler`	`getSequenceHandler()` Retrieving the current `SequenceHandler`.
`TokenizerSource`	`getSource()` Retrieving the `TokenizerSource` of this `Tokenizer`.
`java.lang.String`	`getText(int start, int len)` Retrieve text from the currently available range.
`TokenizerProperties`	`getTokenizerProperties()` Retrieving the current tokenizer characteristics.
`WhitespaceHandler`	`getWhitespaceHandler()` Retrieving the current `WhitespaceHandler`.
`boolean`	`hasMoreToken()` Checking if there are more tokens available.
`java.lang.String`	`nextImage()` Convenience method that returns only the image of the next token.
`Token`	`nextToken()` Retrieving the next `Token`.
`void`	`propertyChanged(TokenizerPropertyEvent event)` Event handler method.
`int`	`readMore()` Try to read more data into the text buffer of the tokenizer.
`void`	`setKeywordHandler(KeywordHandler handler)` Setting a new `KeywordHandler` or removing any previously installed one.
`void`	`setPatternHandler(PatternHandler handler)` Setting a new `PatternHandler` or removing any previously installed one.
`void`	`setReadPositionAbsolute(int position)` This method sets the tokenizer's current read position to the given absolute read position.
`void`	`setReadPositionRelative(int offset)` This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position.
`void`	`setSeparatorHandler(SeparatorHandler handler)` Setting a new `SeparatorHandler` or removing any previously installed `SeparatorHandler`.
`void`	`setSequenceHandler(SequenceHandler handler)` Setting a new `SequenceHandler` or removing any previously installed one.
`void`	`setSource(java.io.Reader reader)` Convenience method to avoid the construction of a `TokenizerSource` from the most important data source `Reader`.
`void`	`setSource(TokenizerSource source)` Setting the source of data.
`void`	`setTokenizerProperties(TokenizerProperties props)` Setting the tokenizer characteristics.
`void`	`setWhitespaceHandler(WhitespaceHandler handler)` Setting a new `WhitespaceHandler` or removing any previously installed one.
`void`	`switchTo(AbstractTokenizer tokenizer)` Changing from one tokenizer to another.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface de.susebox.jtopas.Tokenizer
getRangeStart

- Constructor Detail
  - AbstractTokenizer
```
public AbstractTokenizer()
```
    Default constructor that sets the tokenizer control flags as would be appropriate for C/C++ and Java. Found token images are copied. Neither line nor column information is provided. Nested comments are not allowed.
    The tokenizer will use the TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS for whitespace and separator handling.
  - AbstractTokenizer
```
public AbstractTokenizer(TokenizerProperties properties)
```
    Constructing an AbstractTokenizer with a backing TokenizerProperties instance.
    
    Parameters:
    
    properties - a TokenizerProperties object containing the settings for the tokenizing process
- Method Detail
  - setSource
```
public void setSource(TokenizerSource source)
```
    Setting the source of data. This method is usually called during setup of the Tokenizer but may also be invoked while tokenizing is in progress. It will reset the tokenizer's input buffer, line and column counters etc.
    Subclasses should override this method to do their own actions on a data source change. Generally, this base method should be called first in the subclass implementation of setSource (equivalent to super calls in constructors of derived classes).
    
    Specified by:
    
    setSource in interface Tokenizer
    
    Parameters:
    
    source - a TokenizerSource to read data from
    
    See Also:
    
    getSource()
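    
    The override rule above can be sketched as follows. This is a minimal illustration with hypothetical stand-in classes (`BaseTokenizerSketch`, `CountingTokenizerSketch`) and a plain `Reader` in place of a `TokenizerSource`; it models only the reset behaviour described here, not the actual JTopas implementation.

```java
import java.io.Reader;

// Hypothetical stand-in for the base class; it models only the reset described above.
class BaseTokenizerSketch {
    protected Reader source;
    protected int readPosition;
    protected int lineNumber;
    protected int columnNumber;

    public void setSource(Reader newSource) {
        source = newSource;
        // A source change restarts the read position and the line/column counters.
        readPosition = 0;
        lineNumber = 0;
        columnNumber = 0;
    }
}

// Hypothetical subclass that keeps extra per-source state.
class CountingTokenizerSketch extends BaseTokenizerSketch {
    int tokensSeen = 7; // pretend some tokens were already read from the old source

    @Override
    public void setSource(Reader newSource) {
        super.setSource(newSource); // base reset first, like a super(...) call in a constructor
        tokensSeen = 0;             // then the subclass-specific reset
    }
}
```

    Calling the base method first means the subclass code always runs against freshly reset base state.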
  - setSource
```
public void setSource(java.io.Reader reader)
```
    Convenience method to avoid the construction of a TokenizerSource from the most important data source Reader.
    
    Parameters:
    
    reader - the Reader to get data from
  - getSource
```
public TokenizerSource getSource()
```
    Retrieving the TokenizerSource of this Tokenizer. The method may return null if there is no TokenizerSource associated with it.
    
    Specified by:
    
    getSource in interface Tokenizer
    
    Returns:
    
    the TokenizerSource associated with this Tokenizer
    
    See Also:
    
    setSource(de.susebox.jtopas.TokenizerSource)
  - setTokenizerProperties
```
public void setTokenizerProperties(TokenizerProperties props)
                            throws java.lang.NullPointerException
```
    Setting the tokenizer characteristics. See the method description in Tokenizer.
    
    Specified by:
    
    setTokenizerProperties in interface Tokenizer
    
    Parameters:
    
    props - the TokenizerProperties for this tokenizer
    
    Throws:
    
    java.lang.NullPointerException - if null is passed to the call
    
    See Also:
    
    getTokenizerProperties()
  - getTokenizerProperties
```
public TokenizerProperties getTokenizerProperties()
```
    Retrieving the current tokenizer characteristics. See the method description in Tokenizer.
    
    Specified by:
    
    getTokenizerProperties in interface Tokenizer
    
    Returns:
    
    the TokenizerProperties of this Tokenizer
    
    See Also:
    
    setTokenizerProperties(de.susebox.jtopas.TokenizerProperties)
  - changeParseFlags
```
public void changeParseFlags(int flags,
                             int mask)
                      throws TokenizerException
```
    Setting the control flags of the Tokenizer. See the method description in Tokenizer.
    
    Specified by:
    
    changeParseFlags in interface Tokenizer
    
    Parameters:
    
    flags - the parser control flags
    
    mask - the mask for the flags to set or unset
    
    Throws:
    
    TokenizerException - if one or more of the flags given cannot be honored
    
    See Also:
    
    getParseFlags()
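    
    One plausible reading of the flags/mask pair is sketched below: bits selected by `mask` take their value from `flags`, and all other bits keep their current value. The helper `ParseFlagsSketch` and its bit constants are hypothetical and are not the real `TokenizerProperties` constants; consult the description in Tokenizer for the authoritative semantics.

```java
// Hypothetical helper; it models only the flags/mask arithmetic, not the JTopas implementation.
final class ParseFlagsSketch {
    // Illustrative bit values, NOT the real TokenizerProperties constants.
    static final int FLAG_A = 0x01;
    static final int FLAG_B = 0x02;
    static final int FLAG_C = 0x04;

    /** Bits selected by mask take their value from flags; all other bits keep their current value. */
    static int change(int current, int flags, int mask) {
        return (current & ~mask) | (flags & mask);
    }
}
```

    Under this assumption, `change(FLAG_A | FLAG_C, FLAG_B, FLAG_B | FLAG_C)` sets `FLAG_B`, clears `FLAG_C` and leaves `FLAG_A` untouched.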
  - getParseFlags
```
public int getParseFlags()
```
    Retrieving the parser control flags. See the method description in Tokenizer.
    
    Specified by:
    
    getParseFlags in interface Tokenizer
    
    Returns:
    
    the current parser control flags
    
    See Also:
    
    changeParseFlags(int, int)
  - setKeywordHandler
```
public void setKeywordHandler(KeywordHandler handler)
```
    Setting a new KeywordHandler or removing any previously installed one. See the method description in Tokenizer.
    
    Specified by:
    
    setKeywordHandler in interface Tokenizer
    
    Parameters:
    
    handler - the (new) KeywordHandler to use or null to remove it
    
    See Also:
    
    Tokenizer.getKeywordHandler(), TokenizerProperties.addKeyword(java.lang.String)
  - getKeywordHandler
```
public KeywordHandler getKeywordHandler()
```
    Retrieving the current KeywordHandler. See the method description in Tokenizer.
    
    Specified by:
    
    getKeywordHandler in interface Tokenizer
    
    Returns:
    
    the currently active KeywordHandler or null, if keyword support is switched off
    
    See Also:
    
    Tokenizer.setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler)
  - setWhitespaceHandler
```
public void setWhitespaceHandler(WhitespaceHandler handler)
```
    Setting a new WhitespaceHandler or removing any previously installed one. See the method description in Tokenizer.
    
    Specified by:
    
    setWhitespaceHandler in interface Tokenizer
    
    Parameters:
    
    handler - the (new) whitespace handler to use or null to switch off whitespace handling
    
    See Also:
    
    getWhitespaceHandler()
  - getWhitespaceHandler
```
public WhitespaceHandler getWhitespaceHandler()
```
    Retrieving the current WhitespaceHandler. See the method description in Tokenizer.
    
    Specified by:
    
    getWhitespaceHandler in interface Tokenizer
    
    Returns:
    
    the currently active whitespace handler or null, if the base implementation is working
    
    See Also:
    
    Tokenizer.setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler)
  - setSeparatorHandler
```
public void setSeparatorHandler(SeparatorHandler handler)
```
    Setting a new SeparatorHandler or removing any previously installed SeparatorHandler. See the method description in Tokenizer.
    
    Specified by:
    
    setSeparatorHandler in interface Tokenizer
    
    Parameters:
    
    handler - the (new) separator handler to use or null to remove it
    
    See Also:
    
    getSeparatorHandler()
  - getSeparatorHandler
```
public SeparatorHandler getSeparatorHandler()
```
    Retrieving the current SeparatorHandler. See the method description in Tokenizer.
    
    Specified by:
    
    getSeparatorHandler in interface Tokenizer
    
    Returns:
    
    the currently active SeparatorHandler or null, if separators aren't recognized by the tokenizer
    
    See Also:
    
    setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler)
  - setSequenceHandler
```
public void setSequenceHandler(SequenceHandler handler)
```
    Setting a new SequenceHandler or removing any previously installed one. See the method description in Tokenizer.
    
    Specified by:
    
    setSequenceHandler in interface Tokenizer
    
    Parameters:
    
    handler - the (new) SequenceHandler to use or null to remove it
    
    See Also:
    
    Tokenizer.getSequenceHandler(), TokenizerProperties.addSpecialSequence(java.lang.String), TokenizerProperties.addLineComment(java.lang.String), TokenizerProperties.addBlockComment(java.lang.String, java.lang.String), TokenizerProperties.addString(java.lang.String, java.lang.String, java.lang.String)
  - getSequenceHandler
```
public SequenceHandler getSequenceHandler()
```
    Retrieving the current SequenceHandler. See the method description in Tokenizer.
    
    Specified by:
    
    getSequenceHandler in interface Tokenizer
    
    Returns:
    
    the currently active SequenceHandler or null, if the base implementation is working
    
    See Also:
    
    Tokenizer.setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)
  - setPatternHandler
```
public void setPatternHandler(PatternHandler handler)
```
    Setting a new PatternHandler or removing any previously installed one. See the method description in Tokenizer.
    
    Specified by:
    
    setPatternHandler in interface Tokenizer
    
    Parameters:
    
    handler - the (new) PatternHandler to use or null to remove it
    
    See Also:
    
    getPatternHandler()
  - getPatternHandler
```
public PatternHandler getPatternHandler()
```
    Retrieving the current PatternHandler. See the method description in Tokenizer.
    
    Specified by:
    
    getPatternHandler in interface Tokenizer
    
    Returns:
    
    the currently active PatternHandler or null, if patterns are not recognized by the tokenizer
    
    See Also:
    
    setPatternHandler(de.susebox.jtopas.spi.PatternHandler)
  - getCurrentLine
```
public int getCurrentLine()
```
    Query the current row. The method can only be used if the flag TokenizerProperties#F_COUNT_LINES has been set. Without this flag being set, the return value is undefined.
    Note that row counting starts with 0, while editors often use 1 for the first row.
    
    Returns:
    
    current row (starting with 0) or -1 if the flag TokenizerProperties#F_COUNT_LINES is not set
  - getCurrentColumn
```
public int getCurrentColumn()
```
    Retrieve the current column. The method can only be used if the flag F_COUNT_LINES has been set. Without this flag being set, the return value is undefined. Note that column counting starts with 0, while editors often use 1 for the first column in one row.
    
    Returns:
    
    current column number (starting with 0)
  - hasMoreToken
```
public boolean hasMoreToken()
```
    Checking if there are more tokens available. See the method description in Tokenizer.
    
    Specified by:
    
    hasMoreToken in interface Tokenizer
    
    Returns:
    
    true if a call to nextToken() or nextImage() will succeed, false otherwise
  - nextToken
```
public Token nextToken()
                throws TokenizerException
```
    Retrieving the next Token. See the method description in Tokenizer.
    
    Specified by:
    
    nextToken in interface Tokenizer
    
    Returns:
    
    found Token including the EOF token
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
    
    See Also:
    
    Tokenizer.nextImage()
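    
    The usual consumer loop built from hasMoreToken() and nextToken() can be sketched with a small stand-in. `TokenLoopSketch` is hypothetical: it splits on whitespace only, uses a plain string as EOF marker instead of a real `Token`, and assumes that hasMoreToken() stays true until the EOF token has been delivered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: splits on whitespace and reports a final EOF marker, nothing more.
final class TokenLoopSketch {
    static final String EOF = "<EOF>"; // illustrative marker, not the real Token type

    private final String[] words;
    private int next = 0;
    private boolean eofDelivered = false;

    TokenLoopSketch(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // True as long as nextToken() can still deliver something, the EOF marker included.
    boolean hasMoreToken() {
        return !eofDelivered;
    }

    String nextToken() {
        if (next < words.length) {
            return words[next++];
        }
        if (eofDelivered) {
            throw new IllegalStateException("no more tokens");
        }
        eofDelivered = true;
        return EOF;
    }

    // The consumer loop: pull tokens until hasMoreToken() reports false, skipping the EOF marker.
    static List<String> readAll(String text) {
        TokenLoopSketch tokenizer = new TokenLoopSketch(text);
        List<String> images = new ArrayList<>();
        while (tokenizer.hasMoreToken()) {
            String token = tokenizer.nextToken();
            if (!EOF.equals(token)) {
                images.add(token);
            }
        }
        return images;
    }
}
```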
  - nextImage
```
public java.lang.String nextImage()
                           throws TokenizerException
```
    This method is a convenience method. It returns only the next token image without any information about its type or other associated data. See the method description in Tokenizer.
    
    Specified by:
    
    nextImage in interface Tokenizer
    
    Returns:
    
    the token image of the next token
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
    
    See Also:
    
    currentImage()
  - currentToken
```
public Token currentToken()
                   throws TokenizerException
```
    Retrieve the Token that was found by the last call to nextToken(). See the method description in Tokenizer.
    
    Specified by:
    
    currentToken in interface Tokenizer
    
    Returns:
    
    the Token retrieved by the latest call to nextToken().
    
    Throws:
    
    TokenizerException - if the tokenizer has no current token
    
    See Also:
    
    Tokenizer.nextToken(), Tokenizer.currentImage()
  - currentImage
```
public java.lang.String currentImage()
                              throws TokenizerException
```
    Convenience method to retrieve only the token image of the Token that would be returned by currentToken(). See the method description in Tokenizer.
    
    Specified by:
    
    currentImage in interface Tokenizer
    
    Returns:
    
    the token image of the current token
    
    Throws:
    
    TokenizerException - if the tokenizer has no current token
    
    See Also:
    
    currentToken()
  - getLineNumber
```
public int getLineNumber()
```
    If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream. See the method description in Tokenizer.
    
    Specified by:
    
    getLineNumber in interface Tokenizer
    
    Returns:
    
    the current line number starting with 0 or -1 if no line numbers are supplied.
    
    See Also:
    
    getColumnNumber()
  - getColumnNumber
```
public int getColumnNumber()
```
    If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the current column position, starting with 0, in the input stream. See the method description in Tokenizer.
    
    Specified by:
    
    getColumnNumber in interface Tokenizer
    
    Returns:
    
    the current column position
    
    See Also:
    
    getLineNumber()
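    
    Because line and column numbers start with 0, they usually need an offset of one before being shown to users. The hypothetical helper below sketches that conversion; treating any negative value as "no position available" is an assumption based on the -1 return value documented for getLineNumber().

```java
// Hypothetical helper for presenting tokenizer positions in editor style.
final class PositionFormatSketch {
    /** Converts 0-based line/column values to the 1-based "line:column" form editors show. */
    static String toEditorPosition(int line, int column) {
        if (line < 0 || column < 0) {
            return "?"; // assumption: a negative value means no line information is supplied
        }
        return (line + 1) + ":" + (column + 1);
    }
}
```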
  - getReadPosition
```
public int getReadPosition()
```
    Getting the current read offset. See the method description in Tokenizer.
    
    Specified by:
    
    getReadPosition in interface Tokenizer
    
    Returns:
    
    the absolute offset in characters from the start of the data source of the Tokenizer where reading will be continued
    
    See Also:
    
    setReadPositionAbsolute(int), setReadPositionRelative(int)
  - currentlyAvailable
```
public int currentlyAvailable()
```
    Retrieving the number of the currently available characters. See the method description in Tokenizer.
    
    Specified by:
    
    currentlyAvailable in interface Tokenizer
    
    Returns:
    
    number of currently available characters
  - readMore
```
public int readMore()
             throws TokenizerException
```
    Try to read more data into the text buffer of the tokenizer. See the method description in Tokenizer.
    
    Specified by:
    
    readMore in interface Tokenizer
    
    Returns:
    
    the number of characters now available
    
    Throws:
    
    TokenizerException - generic exception (list) for all problems that may occur while reading (IOExceptions for instance)
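    
    The relationship between readMore() and currentlyAvailable() can be sketched as below. `ReadMoreSketch` is a hypothetical stand-in with an arbitrary chunk size; it assumes the return value is the total number of buffered characters, and it wraps I/O failures in `UncheckedIOException` where the real method reports them through TokenizerException.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a tokenizer's text buffer; the chunk size is arbitrary.
final class ReadMoreSketch {
    private final Reader source;
    private final StringBuilder buffer = new StringBuilder();
    private final char[] chunk = new char[4];

    ReadMoreSketch(Reader source) {
        this.source = source;
    }

    int currentlyAvailable() {
        return buffer.length();
    }

    /** Appends one more chunk, if any, and returns the number of characters now available. */
    int readMore() {
        try {
            int count = source.read(chunk, 0, chunk.length);
            if (count > 0) {
                buffer.append(chunk, 0, count);
            }
            return buffer.length();
        } catch (IOException e) {
            // Stand-in for the TokenizerException of the real method.
            throw new UncheckedIOException(e);
        }
    }
}
```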
  - getChar
```
public char getChar(int pos)
             throws java.lang.IndexOutOfBoundsException
```
    Returns the character at the given position. The method does not attempt to read more data.
    
    Specified by:
    
    getChar in interface Tokenizer
    
    Parameters:
    
    pos - get character on this position in the data stream
    
    Returns:
    
    the character at the given position
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter pos is not in the available text range (text window)
  - getText
```
public java.lang.String getText(int start,
                                int len)
                         throws java.lang.IndexOutOfBoundsException
```
    Retrieve text from the currently available range. See the method description in Tokenizer.
    
    Specified by:
    
    getText in interface Tokenizer
    
    Parameters:
    
    start - position where the text begins
    
    len - length of the text
    
    Returns:
    
    the text beginning at the given position with the given length
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the starting position or the length is out of the current text window
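    
    The bounds checks of getChar and getText can be pictured with a small text window. `TextWindowSketch` is hypothetical and assumes that positions are absolute offsets into the data stream, while only the range starting at `rangeStart` is held in memory.

```java
// Hypothetical text window: holds the characters [rangeStart, rangeStart + text.length()).
final class TextWindowSketch {
    private final int rangeStart;
    private final String text;

    TextWindowSketch(int rangeStart, String text) {
        this.rangeStart = rangeStart;
        this.text = text;
    }

    char getChar(int pos) {
        int index = pos - rangeStart; // translate the absolute position into a window index
        if (index < 0 || index >= text.length()) {
            throw new IndexOutOfBoundsException("position " + pos + " is outside the text window");
        }
        return text.charAt(index);
    }

    String getText(int start, int len) {
        int from = start - rangeStart;
        if (len < 0 || from < 0 || from > text.length() || len > text.length() - from) {
            throw new IndexOutOfBoundsException("range " + start + "+" + len + " is outside the text window");
        }
        return text.substring(from, from + len);
    }
}
```

    Neither method reads more data in this sketch; a caller that hits the end of the window would first have to enlarge it.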
  - setReadPositionAbsolute
```
public void setReadPositionAbsolute(int position)
                             throws java.lang.IndexOutOfBoundsException
```
    This method sets the tokenizer's current read position to the given absolute read position. See the method description in Tokenizer.
    When using this method with embedded tokenizers, the user is responsible for setting the read position in the currently used tokenizer. It will be propagated by the next call to switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this method has no effect on the other tokenizers sharing the same data source.
    
    Specified by:
    
    setReadPositionAbsolute in interface Tokenizer
    
    Parameters:
    
    position - absolute position for the next parse operation
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter position is not in the available text range (text window)
    
    See Also:
    
    setReadPositionRelative(int)
  - setReadPositionRelative
```
public void setReadPositionRelative(int offset)
                             throws java.lang.IndexOutOfBoundsException
```
    This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position. See the method description in Tokenizer.
    When using this method with embedded tokenizers, the user is responsible for setting the read position in the currently used tokenizer. It will be propagated by the next call to switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this method has no effect on the other tokenizers sharing the same data source.
    
    Specified by:
    
    setReadPositionRelative in interface Tokenizer
    
    Parameters:
    
    offset - number of characters to move forward (positive offset) or backward (negative offset)
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the parameter offset would move the read position out of the available text range (text window)
    
    See Also:
    
    setReadPositionAbsolute(int)
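    
    The two positioning methods differ only in their reference point, which the following sketch makes explicit. `ReadPositionSketch` is a hypothetical stand-in; whether the position directly behind the last available character is accepted is an assumption of this sketch.

```java
// Hypothetical stand-in that tracks a read position inside a text window.
final class ReadPositionSketch {
    private final int rangeStart; // absolute position of the first available character
    private final int available;  // number of characters in the window
    private int readPosition;

    ReadPositionSketch(int rangeStart, int available) {
        this.rangeStart = rangeStart;
        this.available = available;
        this.readPosition = rangeStart;
    }

    int getReadPosition() {
        return readPosition;
    }

    void setReadPositionAbsolute(int position) {
        // Assumption: the end of the window itself is a valid position ("everything consumed").
        if (position < rangeStart || position > rangeStart + available) {
            throw new IndexOutOfBoundsException("position " + position + " is outside the text window");
        }
        readPosition = position;
    }

    void setReadPositionRelative(int offset) {
        // A relative move is an absolute move measured from the current read position.
        setReadPositionAbsolute(readPosition + offset);
    }
}
```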
  - close
```
public void close()
```
    Closing this tokenizer frees resources and deregisters from the associated TokenizerProperties object.
    
    Specified by:
    
    close in interface Tokenizer
  - addTokenizer
```
public void addTokenizer(AbstractTokenizer tokenizer)
                  throws TokenizerException
```
    Adding an embedded tokenizer. Embedded tokenizers work on the same input buffer as their base tokenizer. One situation where embedded tokenizers could be applied is an HTML stream with cascading style sheet (CSS) and JavaScript parts.
    There are no internal means of switching from one tokenizer to another. This should be done by the caller using the method switchTo(de.susebox.jtopas.AbstractTokenizer).
    The TokenizerProperties#F_KEEP_DATA and TokenizerProperties#F_COUNT_LINES flags of the base tokenizer take effect also in the embedded tokenizers.
    Since it might be possible that the given tokenizer is a derivation of the AbstractTokenizer class, this method is synchronized on tokenizer.
    
    Parameters:
    
    tokenizer - an embedded tokenizer
    
    Throws:
    
    TokenizerException - if something goes wrong (not likely :-)
  - switchTo
```
public void switchTo(AbstractTokenizer tokenizer)
              throws TokenizerException
```
    Changing from one tokenizer to another. If the given tokenizer has not been added with addTokenizer(de.susebox.jtopas.AbstractTokenizer), an exception is thrown.
    The switchTo method does the necessary synchronisation between this and the given tokenizer. The user is therefore responsible for calling switchTo whenever a tokenizer change is necessary. It must be done this way:
```
AbstractTokenizer base = new MyTokenizer(...);
AbstractTokenizer embedded = new MyTokenizer(...);

// setting properties (comments, keywords etc.)
...

// embedding a tokenizer
base.addTokenizer(embedded);

// tokenizing with base
...
if (switch_condition) {
  base.switchTo(embedded);
}

// tokenizing with embedded
...
if (switch_condition) {
  embedded.switchTo(base);
}
```
    That way we avoid a more complex synchronisation between tokenizers whenever one of them parses the next data in the input stream. However, the danger of unsynchronized tokenizers remains, so take care.
    Since it might be possible that the given tokenizer is a derivation of the AbstractTokenizer class, this method is synchronized on tokenizer.
    
    Parameters:
    
    tokenizer - the tokenizer that should be used from now on
    
    Throws:
    
    TokenizerException
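    
    What switchTo has to synchronise can be illustrated with a reduced model. `EmbeddedTokenizerSketch` is hypothetical: it tracks only a read position, and it throws `IllegalStateException` where the real method throws TokenizerException.

```java
// Hypothetical stand-in; it models only the registration check and the hand-over of the read position.
final class EmbeddedTokenizerSketch {
    private EmbeddedTokenizerSketch base = this; // a tokenizer is its own base until it is embedded
    int readPosition;

    void addTokenizer(EmbeddedTokenizerSketch tokenizer) {
        // The embedded tokenizer joins the family of this tokenizer's base.
        tokenizer.base = this.base;
    }

    void switchTo(EmbeddedTokenizerSketch tokenizer) {
        if (tokenizer.base != this.base) {
            throw new IllegalStateException("tokenizer was not added with addTokenizer");
        }
        // Hand over the position so the other tokenizer continues where this one stopped.
        tokenizer.readPosition = this.readPosition;
    }
}
```

    Until switchTo is called, a position change in one tokenizer stays invisible to the other, which is why every change of tokenizer has to go through this method.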
  - propertyChanged
```
public void propertyChanged(TokenizerPropertyEvent event)
```
    Event handler method. The given TokenizerPropertyEvent parameter contains the necessary information about the property change. We chose a single method in favour of several more specialized methods, since the reactions to adding, removing and modifying tokenizer properties (flushing caches, rereading information etc.) are probably not very different.
    Note that a modification of the parse flags in the backing TokenizerProperties object removes all flags previously modified through changeParseFlags(int, int).
    
    Specified by:
    
    propertyChanged in interface TokenizerPropertyListener
    
    Parameters:
    
    event - the TokenizerPropertyEvent that describes the change
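    
    The design choice of a single handler method can be sketched as follows. `PropertyListenerSketch` and its `ChangeType` enum are hypothetical and do not mirror the real TokenizerPropertyEvent; the point is only that every kind of change triggers the same reaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical listener that caches derived data and discards it on any property change.
final class PropertyListenerSketch {
    enum ChangeType { ADDED, REMOVED, MODIFIED } // hypothetical event kinds

    private final Map<String, String> cache = new HashMap<>();
    ChangeType lastChange;

    void remember(String key, String value) {
        cache.put(key, value);
    }

    int cachedEntries() {
        return cache.size();
    }

    // One handler for every kind of change: the reaction is the same in each case.
    void propertyChanged(ChangeType type) {
        lastChange = type;
        cache.clear(); // cached data may be stale now and is rebuilt on demand
    }
}
```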
