public abstract class AbstractTokenizer extends java.lang.Object implements Tokenizer, TokenizerPropertyListener
Base class for Tokenizer implementations. AbstractTokenizer
separates the data analysis from the actual data provision. Although the class
maintains read and write positions the physical representation of the logical
character buffer behind these positions concerns only the subclasses.
See Also: Tokenizer, TokenizerProperties

| Constructor and Description |
|---|
| AbstractTokenizer(): Default constructor that sets the tokenizer control flags as it would be appropriate for C/C++ and Java. |
| AbstractTokenizer(TokenizerProperties properties): Constructing an AbstractTokenizer with a backing TokenizerProperties instance. |

| Modifier and Type | Method and Description |
|---|---|
| void | addTokenizer(AbstractTokenizer tokenizer): Adding an embedded tokenizer. |
| void | changeParseFlags(int flags, int mask): Setting the control flags of the Tokenizer. |
| void | close(): Closing this tokenizer frees resources and deregisters from the associated TokenizerProperties object. |
| java.lang.String | currentImage(): Convenience method to retrieve only the token image of the Token that would be returned by currentToken(). |
| int | currentlyAvailable(): Retrieving the number of the currently available characters. |
| Token | currentToken(): Retrieve the Token that was found by the last call to nextToken(). |
| char | getChar(int pos): Returns the character at the given position. |
| int | getColumnNumber(): If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream. |
| int | getCurrentColumn(): Retrieve the current column. |
| int | getCurrentLine(): Query the current row. |
| KeywordHandler | getKeywordHandler(): Retrieving the current KeywordHandler. |
| int | getLineNumber(): If the flag TokenizerProperties#F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream. |
| int | getParseFlags(): Retrieving the parser control flags. |
| PatternHandler | getPatternHandler(): Retrieving the current PatternHandler. |
| int | getReadPosition(): Getting the current read offset. |
| SeparatorHandler | getSeparatorHandler(): Retrieving the current SeparatorHandler. |
| SequenceHandler | getSequenceHandler(): Retrieving the current SequenceHandler. |
| TokenizerSource | getSource(): Retrieving the TokenizerSource of this Tokenizer. |
| java.lang.String | getText(int start, int len): Retrieve text from the currently available range. |
| TokenizerProperties | getTokenizerProperties(): Retrieving the current tokenizer characteristics. |
| WhitespaceHandler | getWhitespaceHandler(): Retrieving the current WhitespaceHandler. |
| boolean | hasMoreToken(): Checking if there are more tokens available. |
| java.lang.String | nextImage(): This method is a convenience method. |
| Token | nextToken(): Retrieving the next Token. |
| void | propertyChanged(TokenizerPropertyEvent event): Event handler method. |
| int | readMore(): Try to read more data into the text buffer of the tokenizer. |
| void | setKeywordHandler(KeywordHandler handler): Setting a new KeywordHandler or removing any previously installed one. |
| void | setPatternHandler(PatternHandler handler): Setting a new PatternHandler or removing any previously installed one. |
| void | setReadPositionAbsolute(int position): This method sets the tokenizer's current read position to the given absolute read position. |
| void | setReadPositionRelative(int offset): This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position. |
| void | setSeparatorHandler(SeparatorHandler handler): Setting a new SeparatorHandler or removing any previously installed SeparatorHandler. |
| void | setSequenceHandler(SequenceHandler handler): Setting a new SequenceHandler or removing any previously installed one. |
| void | setSource(java.io.Reader reader): Convenience method to avoid the construction of a TokenizerSource from the most important data source Reader. |
| void | setSource(TokenizerSource source): Setting the source of data. |
| void | setTokenizerProperties(TokenizerProperties props): Setting the tokenizer characteristics. |
| void | setWhitespaceHandler(WhitespaceHandler handler): Setting a new WhitespaceHandler or removing any previously installed one. |
| void | switchTo(AbstractTokenizer tokenizer): Changing from one tokenizer to another. |

Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Tokenizer: getRangeStart

public AbstractTokenizer()
Default constructor that sets the tokenizer control flags as it would be
appropriate for C/C++ and Java. It uses TokenizerProperties.DEFAULT_WHITESPACES
and TokenizerProperties.DEFAULT_SEPARATORS for whitespace and
separator handling.

public AbstractTokenizer(TokenizerProperties properties)
Constructing an AbstractTokenizer with a backing TokenizerProperties
instance.
Parameters: properties - a TokenizerProperties object containing the
settings for the tokenizing process

public void setSource(TokenizerSource source)
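Setting the source of data. This method can be called during setup of the
Tokenizer but may also be invoked while the tokenizing
is in progress. It will reset the tokenizer's input buffer, line and column
counters etc.
Subclasses that override this method should call the base implementation of
setSource (equivalent to super calls in
constructors of derived classes).
Specified by: setSource in interface Tokenizer
Parameters: source - a TokenizerSource to read data from
See Also: getSource()

public void setSource(java.io.Reader reader)
Convenience method to avoid the construction of a TokenizerSource
from the most important data source Reader.
Parameters: reader - the Reader to get data from

public TokenizerSource getSource()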
Retrieving the TokenizerSource of this Tokenizer. The
method may return null if there is no TokenizerSource
associated with it.
Specified by: getSource in interface Tokenizer
Returns: the TokenizerSource associated with this Tokenizer
See Also: setSource(de.susebox.jtopas.TokenizerSource)

public void setTokenizerProperties(TokenizerProperties props) throws java.lang.NullPointerException
Setting the tokenizer characteristics. See the method description in Tokenizer.
Specified by: setTokenizerProperties in interface Tokenizer
Parameters: props - the TokenizerProperties for this tokenizer
Throws: java.lang.NullPointerException - if null is passed to the call
See Also: getTokenizerProperties()

public TokenizerProperties getTokenizerProperties()
Retrieving the current tokenizer characteristics. See the method description in Tokenizer.
Specified by: getTokenizerProperties in interface Tokenizer
Returns: the TokenizerProperties of this Tokenizer
See Also: setTokenizerProperties(de.susebox.jtopas.TokenizerProperties)

public void changeParseFlags(int flags, int mask) throws TokenizerException
Setting the control flags of the Tokenizer. See the method
description in Tokenizer.
Specified by: changeParseFlags in interface Tokenizer
Parameters: flags - the parser control flags
mask - the mask for the flags to set or unset
Throws: TokenizerException - if one or more of the flags given cannot be honored
See Also: getParseFlags()

public int getParseFlags()
Retrieving the parser control flags. See the method description in Tokenizer.
Specified by: getParseFlags in interface Tokenizer
See Also: changeParseFlags(int, int)

public void setKeywordHandler(KeywordHandler handler)
Setting a new KeywordHandler or removing any
previously installed one. See the method description in Tokenizer.
Specified by: setKeywordHandler in interface Tokenizer
Parameters: handler - the (new) KeywordHandler to use or null to remove it
See Also: Tokenizer.getKeywordHandler(), TokenizerProperties.addKeyword(java.lang.String)

public KeywordHandler getKeywordHandler()
Retrieving the current KeywordHandler. See the
method description in Tokenizer.
Specified by: getKeywordHandler in interface Tokenizer
Returns: the current KeywordHandler or null, if
keyword support is switched off
See Also: Tokenizer.setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler)

public void setWhitespaceHandler(WhitespaceHandler handler)
Setting a new WhitespaceHandler or removing any
previously installed one. See the method description in Tokenizer.
Specified by: setWhitespaceHandler in interface Tokenizer
Parameters: handler - the (new) whitespace handler to use or null to
switch off whitespace handling
See Also: getWhitespaceHandler()

public WhitespaceHandler getWhitespaceHandler()
Retrieving the current WhitespaceHandler. See
the method description in Tokenizer.
Specified by: getWhitespaceHandler in interface Tokenizer
See Also: Tokenizer.setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler)

public void setSeparatorHandler(SeparatorHandler handler)
Setting a new SeparatorHandler or removing any
previously installed SeparatorHandler. See the method description
in Tokenizer.
Specified by: setSeparatorHandler in interface Tokenizer
Parameters: handler - the (new) separator handler to use or null to
remove it
See Also: getSeparatorHandler()

public SeparatorHandler getSeparatorHandler()
Retrieving the current SeparatorHandler. See
the method description in Tokenizer.
Specified by: getSeparatorHandler in interface Tokenizer
Returns: the current SeparatorHandler or null,
if separators aren't recognized by the tokenizer
See Also: setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)

public void setSequenceHandler(SequenceHandler handler)
Setting a new SequenceHandler or removing any
previously installed one. See the method description in Tokenizer.
Specified by: setSequenceHandler in interface Tokenizer
Parameters: handler - the (new) SequenceHandler to use or null to remove it
See Also: Tokenizer.getSequenceHandler(),
TokenizerProperties.addSpecialSequence(java.lang.String),
TokenizerProperties.addLineComment(java.lang.String),
TokenizerProperties.addBlockComment(java.lang.String, java.lang.String),
TokenizerProperties.addString(java.lang.String, java.lang.String, java.lang.String)

public SequenceHandler getSequenceHandler()
Retrieving the current SequenceHandler. See the method description
in Tokenizer.
Specified by: getSequenceHandler in interface Tokenizer
Returns: the current SequenceHandler or null, if the base
implementation is working
See Also: Tokenizer.setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)

public void setPatternHandler(PatternHandler handler)
Setting a new PatternHandler or removing any
previously installed one. See the method description in Tokenizer.
Specified by: setPatternHandler in interface Tokenizer
Parameters: handler - the (new) PatternHandler to
use or null to remove it
See Also: getPatternHandler()

public PatternHandler getPatternHandler()
Retrieving the current PatternHandler. See the
method description in Tokenizer.
Specified by: getPatternHandler in interface Tokenizer
Returns: the current PatternHandler
or null, if patterns are not recognized by the tokenizer
See Also: setPatternHandler(de.susebox.jtopas.spi.PatternHandler)

public int getCurrentLine()
Query the current row. The result is only meaningful if the flag TokenizerProperties#F_COUNT_LINES
has been set. Without this flag being set, the return value is undefined.
Returns: the current row; defined only if TokenizerProperties#F_COUNT_LINES is set

public int getCurrentColumn()
Retrieve the current column. The result is only meaningful if the flag F_COUNT_LINES
has been set.
Without this flag being set, the return value is undefined.
Note that column counting starts with 0, while editors often use 1 for the first
column in one row.

public boolean hasMoreToken()
Checking if there are more tokens available. See the method description in Tokenizer.
Specified by: hasMoreToken in interface Tokenizer
Returns: true if a call to nextToken() or nextImage()
will succeed, false otherwise

public Token nextToken() throws TokenizerException
Retrieving the next Token.
Specified by: nextToken in interface Tokenizer
Returns: the Token found, including the EOF token
Throws: TokenizerException - generic exception (list) for all problems that may occur while parsing
(IOExceptions for instance)
See Also: Tokenizer.nextImage()

public java.lang.String nextImage() throws TokenizerException
This method is a convenience method. See the method description in Tokenizer.
Specified by: nextImage in interface Tokenizer
Throws: TokenizerException - generic exception (list) for all problems that may occur while parsing
(IOExceptions for instance)
See Also: currentImage()

public Token currentToken() throws TokenizerException
Retrieve the Token that was found by the last call to nextToken().
See the method description in Tokenizer.
Specified by: currentToken in interface Tokenizer
Returns: the Token retrieved by the last call to nextToken().
Throws: TokenizerException - if the tokenizer has no current token
See Also: Tokenizer.nextToken(), Tokenizer.currentImage()

public java.lang.String currentImage() throws TokenizerException
Convenience method to retrieve only the token image of the Token that
would be returned by currentToken(). See the method description in
Tokenizer.
Specified by: currentImage in interface Tokenizer
Throws: TokenizerException - if the tokenizer has no current token
See Also: currentToken()

public int getLineNumber()
If the flag TokenizerProperties#F_COUNT_LINES is set, this method will
return the line number starting with 0 in the input stream. See the method
description in Tokenizer.
Specified by: getLineNumber in interface Tokenizer
See Also: getColumnNumber()

public int getColumnNumber()
If the flag TokenizerProperties#F_COUNT_LINES is set, this method will
return the current column position starting with 0 in the input stream. See
the method description in Tokenizer.
Specified by: getColumnNumber in interface Tokenizer
See Also: getLineNumber()

public int getReadPosition()
Getting the current read offset. See the method description in Tokenizer.
Specified by: getReadPosition in interface Tokenizer
See Also: setReadPositionAbsolute(int), setReadPositionRelative(int)

public int currentlyAvailable()
Retrieving the number of the currently available characters. See the method description in Tokenizer.
Specified by: currentlyAvailable in interface Tokenizer

public int readMore() throws TokenizerException
Try to read more data into the text buffer of the tokenizer. See the method description in Tokenizer.
Specified by: readMore in interface Tokenizer
Throws: TokenizerException - generic exception (list) for all problems that
may occur while reading (IOExceptions for instance)

public char getChar(int pos) throws java.lang.IndexOutOfBoundsException
Returns the character at the given position.

public java.lang.String getText(int start, int len) throws java.lang.IndexOutOfBoundsException
Retrieve text from the currently available range. See the method description in Tokenizer.
Specified by: getText in interface Tokenizer
Parameters: start - position where the text begins
len - length of the text
Throws: java.lang.IndexOutOfBoundsException - if the starting position or the length
is out of the current text window

public void setReadPositionAbsolute(int position) throws java.lang.IndexOutOfBoundsException
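This method sets the tokenizer's current read position to the given absolute
read position. See the method description in Tokenizer.
With embedded tokenizers, the new read position is passed on by the next call to
switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this
method has no effect on the other tokenizers sharing the same data source.
Specified by: setReadPositionAbsolute in interface Tokenizer
Parameters: position - absolute position for the next parse operation
Throws: java.lang.IndexOutOfBoundsException - if the parameter position is
not in the available text range (text window)
See Also: setReadPositionRelative(int)

public void setReadPositionRelative(int offset) throws java.lang.IndexOutOfBoundsException
This method moves the tokenizer's read position the given number of characters
forward (positive value) or backward (negative value), starting from the current
read position. See the method description in Tokenizer.
With embedded tokenizers, the new read position is passed on by the next call to
switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this
method has no effect on the other tokenizers sharing the same data source.
Specified by: setReadPositionRelative in interface Tokenizer
Parameters: offset - number of characters to move forward (positive offset) or
backward (negative offset)
Throws: java.lang.IndexOutOfBoundsException - if the parameter offset would
move the read position out of the available text range (text window)
See Also: setReadPositionAbsolute(int)

public void close()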
Closing this tokenizer frees resources and deregisters from the associated
TokenizerProperties object.

public void addTokenizer(AbstractTokenizer tokenizer) throws TokenizerException
Adding an embedded tokenizer. Switching to the embedded tokenizer is done with
switchTo(de.susebox.jtopas.AbstractTokenizer).
The TokenizerProperties#F_KEEP_DATA and TokenizerProperties#F_COUNT_LINES
flags of the base tokenizer take effect also in the embedded tokenizers.
Since tokenizer is a
derivation of the AbstractTokenizer class, this method is
synchronized on tokenizer.
Parameters: tokenizer - an embedded tokenizer
Throws: TokenizerException - if something goes wrong (not likely :-)

public void switchTo(AbstractTokenizer tokenizer) throws TokenizerException
Changing from one tokenizer to another. If the given tokenizer has not been added with
addTokenizer(de.susebox.jtopas.AbstractTokenizer), an exception is thrown.
The switchTo method does the necessary synchronisation between
this and the given tokenizer. The user is therefore responsible
for calling switchTo whenever a tokenizer change is necessary. It
must be done this way:
AbstractTokenizer base = new MyTokenizer(...);
AbstractTokenizer embedded = new MyTokenizer(...);
// setting properties (comments, keywords etc.)
...
// embedding a tokenizer
base.addTokenizer(embedded);
// tokenizing with base
...
if (switch_condition) {
  // hand over to the embedded tokenizer
  base.switchTo(embedded);
}
// tokenizing with embedded
...
if (switch_condition) {
  // hand back to the base tokenizer
  embedded.switchTo(base);
}
That way we avoid a more complex synchronisation between tokenizers whenever
one of them parses the next data in the input stream. However, the danger
of unsynchronized tokenizers remains, so take care.
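
The handover rule can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: SketchTokenizer, readChars and the shared String buffer are hypothetical names invented for the illustration and are not part of JTopas. The model shows that each tokenizer keeps its own read position and that the position is carried over only when switchTo is called.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for tokenizers sharing one data source.
// It is NOT the JTopas implementation, only a model of the handover rule.
final class SketchTokenizer {
    private final String buffer;                           // shared input data
    private Set<SketchTokenizer> group = new HashSet<>();  // tokenizers on the same source
    private int readPosition;                              // private to each tokenizer

    SketchTokenizer(String buffer) {
        this.buffer = buffer;
        group.add(this);
    }

    // Registers an embedded tokenizer so that switchTo accepts it later.
    void addTokenizer(SketchTokenizer embedded) {
        group.add(embedded);
        embedded.group = group;
    }

    // Consumes count characters; only this tokenizer's position moves.
    String readChars(int count) {
        String text = buffer.substring(readPosition, readPosition + count);
        readPosition += count;
        return text;
    }

    int getReadPosition() {
        return readPosition;
    }

    // Hands the current read position over to the other tokenizer.
    void switchTo(SketchTokenizer other) {
        if (!group.contains(other)) {
            throw new IllegalStateException("tokenizer was not added with addTokenizer");
        }
        other.readPosition = readPosition;
    }
}
```

In this model, reading three characters with the base tokenizer leaves the embedded tokenizer at position 0 until base.switchTo(embedded) is called. That is the unsynchronized state the text warns about.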
Since tokenizer is a
derivation of the AbstractTokenizer class, this method is
synchronized on tokenizer.
Parameters: tokenizer - the tokenizer that should be used from now on
Throws: TokenizerException

public void propertyChanged(TokenizerPropertyEvent event)
Event handler method. The given TokenizerPropertyEvent parameter
contains the necessary information about the property change. We choose
one single method in favour of various more specialized methods since the
reactions on adding, removing and modifying tokenizer properties are often
the same (flushing caches, rereading information etc.).
Note that a change of the parse flags in the TokenizerProperties
object removes all flags previously modified through changeParseFlags(int, int).
Specified by: propertyChanged in interface TokenizerPropertyListener
Parameters: event - the TokenizerPropertyEvent that describes the change
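
How local flag changes interact with the backing properties can be sketched as follows. This is an illustrative model, not JTopas code: ParseFlagModel and onPropertyFlagsChanged are hypothetical names, and the bit arithmetic assumes the usual convention that only bits selected by mask are set or unset.

```java
// Hypothetical model of tokenizer-local flag overrides; NOT the JTopas implementation.
final class ParseFlagModel {
    private int propertyFlags;  // flags held by the backing properties object
    private int localFlags;     // values set through changeParseFlags
    private int localMask;      // bits that are overridden locally

    ParseFlagModel(int propertyFlags) {
        this.propertyFlags = propertyFlags;
    }

    // Sets or unsets only the bits selected by mask (assumed convention).
    void changeParseFlags(int flags, int mask) {
        localFlags = (localFlags & ~mask) | (flags & mask);
        localMask |= mask;
    }

    // Local overrides win over the flags of the properties object.
    int getParseFlags() {
        return (propertyFlags & ~localMask) | (localFlags & localMask);
    }

    // Models the reaction to a flag change in the properties object:
    // all local overrides are dropped.
    void onPropertyFlagsChanged(int newFlags) {
        propertyFlags = newFlags;
        localFlags = 0;
        localMask = 0;
    }
}
```

Under these assumptions, a tokenizer that needs its local overrides to persist would have to reapply them with changeParseFlags after each flag change in the properties object.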