public abstract class AbstractTokenizer extends java.lang.Object implements Tokenizer, TokenizerPropertyListener
Base class for Tokenizer implementations. AbstractTokenizer separates the
data analysis from the actual data provision. Although the class maintains
read and write positions, the physical representation of the logical
character buffer behind these positions concerns only the subclasses.
See Also: Tokenizer, TokenizerProperties
Constructor and Description |
---|
AbstractTokenizer()
Default constructor that sets the tokenizer control flags as would be
appropriate for C/C++ and Java.
|
AbstractTokenizer(TokenizerProperties properties)
Constructs an
AbstractTokenizer with a backing TokenizerProperties
instance. |
Modifier and Type | Method and Description |
---|---|
void |
addTokenizer(AbstractTokenizer tokenizer)
Adding an embedded tokenizer.
|
void |
changeParseFlags(int flags,
int mask)
Setting the control flags of the
Tokenizer . |
void |
close()
Closing this tokenizer frees resources and deregisters from the
associated
TokenizerProperties object. |
java.lang.String |
currentImage()
Convenience method to retrieve only the token image of the
Token that
would be returned by currentToken() . |
int |
currentlyAvailable()
Retrieving the number of the currently available characters.
|
Token |
currentToken()
Retrieve the
Token that was found by the last call to nextToken() . |
char |
getChar(int pos)
Returns the character at the given position.
|
int |
getColumnNumber()
If the flag
TokenizerProperties#F_COUNT_LINES is set, this method will
return the current column position, starting with 0, in the input stream. |
int |
getCurrentColumn()
Retrieve the current column.
|
int |
getCurrentLine()
Query the current row.
|
KeywordHandler |
getKeywordHandler()
Retrieving the current
KeywordHandler . |
int |
getLineNumber()
If the flag
TokenizerProperties#F_COUNT_LINES is set, this method will
return the line number starting with 0 in the input stream. |
int |
getParseFlags()
Retrieving the parser control flags.
|
PatternHandler |
getPatternHandler()
Retrieving the current
PatternHandler . |
int |
getReadPosition()
Getting the current read offset.
|
SeparatorHandler |
getSeparatorHandler()
Retrieving the current
SeparatorHandler . |
SequenceHandler |
getSequenceHandler()
Retrieving the current
SequenceHandler . |
TokenizerSource |
getSource()
Retrieving the
TokenizerSource of this Tokenizer . |
java.lang.String |
getText(int start,
int len)
Retrieve text from the currently available range.
|
TokenizerProperties |
getTokenizerProperties()
Retrieving the current tokenizer characteristics.
|
WhitespaceHandler |
getWhitespaceHandler()
Retrieving the current
WhitespaceHandler . |
boolean |
hasMoreToken()
Checking if there are more tokens available.
|
java.lang.String |
nextImage()
This method is a convenience method.
|
Token |
nextToken()
Retrieving the next
Token . |
void |
propertyChanged(TokenizerPropertyEvent event)
Event handler method.
|
int |
readMore()
Try to read more data into the text buffer of the tokenizer.
|
void |
setKeywordHandler(KeywordHandler handler)
Setting a new
KeywordHandler or removing any
previously installed one. |
void |
setPatternHandler(PatternHandler handler)
Setting a new
PatternHandler or removing any
previously installed one. |
void |
setReadPositionAbsolute(int position)
This method sets the tokenizer's current read position to the given absolute
read position.
|
void |
setReadPositionRelative(int offset)
This method moves the tokenizer's read position the given number of characters
forward (positive value) or backward (negative value), starting from the current
read position.
|
void |
setSeparatorHandler(SeparatorHandler handler)
Setting a new
SeparatorHandler or removing any
previously installed SeparatorHandler . |
void |
setSequenceHandler(SequenceHandler handler)
Setting a new
SequenceHandler or removing any
previously installed one. |
void |
setSource(java.io.Reader reader)
Convenience method to avoid the construction of a
TokenizerSource
from the most important data source Reader . |
void |
setSource(TokenizerSource source)
Setting the source of data.
|
void |
setTokenizerProperties(TokenizerProperties props)
Setting the tokenizer characteristics.
|
void |
setWhitespaceHandler(WhitespaceHandler handler)
Setting a new
WhitespaceHandler or removing any
previously installed one. |
void |
switchTo(AbstractTokenizer tokenizer)
Changing from one tokenizer to another.
|
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getRangeStart
public AbstractTokenizer()
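Default constructor that sets the tokenizer control flags as would be
appropriate for C/C++ and Java. The tokenizer uses
TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS
for whitespace and separator handling.
public AbstractTokenizer(TokenizerProperties properties)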
Constructs an AbstractTokenizer with a backing TokenizerProperties instance.
Parameters: properties - a TokenizerProperties object containing the settings for the tokenizing process
public void setSource(TokenizerSource source)
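Setting the source of data. This is typically done while setting up the
Tokenizer but may also be invoked while tokenizing is in progress. It resets
the tokenizer's input buffer, line and column counters etc.
Subclasses that override setSource should call this base implementation
(equivalent to super calls in constructors of derived classes).
Specified by: setSource in interface Tokenizer
Parameters: source - a TokenizerSource to read data from
See Also: getSource()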
public void setSource(java.io.Reader reader)
Convenience method to avoid the construction of a TokenizerSource
from the most important data source, Reader.
Parameters: reader - the Reader to get data from
public TokenizerSource getSource()
Retrieving the TokenizerSource of this Tokenizer. The method may return
null if there is no TokenizerSource associated with it.
Specified by: getSource in interface Tokenizer
Returns: the TokenizerSource associated with this Tokenizer
See Also: setSource(de.susebox.jtopas.TokenizerSource)
public void setTokenizerProperties(TokenizerProperties props) throws java.lang.NullPointerException
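Setting the tokenizer characteristics. See the method description in Tokenizer.
Specified by: setTokenizerProperties in interface Tokenizer
Parameters: props - the TokenizerProperties for this tokenizer
Throws: java.lang.NullPointerException - if null is passed to the call
See Also: getTokenizerProperties()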
public TokenizerProperties getTokenizerProperties()
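Retrieving the current tokenizer characteristics. See the method description in Tokenizer.
Specified by: getTokenizerProperties in interface Tokenizer
Returns: the TokenizerProperties of this Tokenizer
See Also: setTokenizerProperties(de.susebox.jtopas.TokenizerProperties)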
public void changeParseFlags(int flags, int mask) throws TokenizerException
Setting the control flags of the Tokenizer. See the method description in Tokenizer.
Specified by: changeParseFlags in interface Tokenizer
Parameters: flags - the parser control flags
mask - the mask for the flags to set or unset
Throws: TokenizerException - if one or more of the flags given cannot be honored
See Also: getParseFlags()
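The flags and mask parameters can be read as a conventional bit-mask update: the mask selects which flag bits are touched, and flags supplies their new values. The following is a minimal sketch under that assumption, not the JTopas implementation. The class ParseFlagsSketch and the constants F_FLAG_A and F_FLAG_B are hypothetical and do not correspond to real TokenizerProperties flags.

```java
// Hypothetical model of a flags/mask update; not the JTopas implementation.
final class ParseFlagsSketch {
    // Illustrative flag bits only; the real constants live in TokenizerProperties.
    static final int F_FLAG_A = 0x01;
    static final int F_FLAG_B = 0x02;

    private int flags;

    ParseFlagsSketch(int initialFlags) {
        this.flags = initialFlags;
    }

    // Assumption: only the bits selected by mask change; they take their value from newFlags.
    void changeParseFlags(int newFlags, int mask) {
        flags = (flags & ~mask) | (newFlags & mask);
    }

    int getParseFlags() {
        return flags;
    }

    public static void main(String[] args) {
        ParseFlagsSketch tokenizer = new ParseFlagsSketch(F_FLAG_A);
        tokenizer.changeParseFlags(F_FLAG_B, F_FLAG_B); // set B, leave A untouched
        tokenizer.changeParseFlags(0, F_FLAG_A);        // clear A, leave B untouched
        System.out.println(Integer.toBinaryString(tokenizer.getParseFlags()));
    }
}
```

Passing the same value for flags and mask sets a flag; passing 0 for flags with the flag in the mask clears it. Whether a real tokenizer rejects a given combination with a TokenizerException is described in the Tokenizer interface.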
public int getParseFlags()
Retrieving the parser control flags. See the method description in Tokenizer.
Specified by: getParseFlags in interface Tokenizer
See Also: changeParseFlags(int, int)
public void setKeywordHandler(KeywordHandler handler)
Setting a new KeywordHandler or removing any previously installed one.
See the method description in Tokenizer.
Specified by: setKeywordHandler in interface Tokenizer
Parameters: handler - the (new) KeywordHandler to use or null to remove it
See Also: Tokenizer.getKeywordHandler(), TokenizerProperties.addKeyword(java.lang.String)
public KeywordHandler getKeywordHandler()
Retrieving the current KeywordHandler. See the method description in Tokenizer.
Specified by: getKeywordHandler in interface Tokenizer
Returns: the current KeywordHandler or null, if keyword support is switched off
See Also: Tokenizer.setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler)
public void setWhitespaceHandler(WhitespaceHandler handler)
Setting a new WhitespaceHandler or removing any previously installed one.
See the method description in Tokenizer.
Specified by: setWhitespaceHandler in interface Tokenizer
Parameters: handler - the (new) whitespace handler to use or null to switch off whitespace handling
See Also: getWhitespaceHandler()
public WhitespaceHandler getWhitespaceHandler()
Retrieving the current WhitespaceHandler. See the method description in Tokenizer.
Specified by: getWhitespaceHandler in interface Tokenizer
See Also: Tokenizer.setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler)
public void setSeparatorHandler(SeparatorHandler handler)
Setting a new SeparatorHandler or removing any previously installed SeparatorHandler.
See the method description in Tokenizer.
Specified by: setSeparatorHandler in interface Tokenizer
Parameters: handler - the (new) separator handler to use or null to remove it
See Also: getSeparatorHandler()
public SeparatorHandler getSeparatorHandler()
Retrieving the current SeparatorHandler. See the method description in Tokenizer.
Specified by: getSeparatorHandler in interface Tokenizer
Returns: the current SeparatorHandler or null, if separators aren't recognized by the tokenizer
See Also: setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)
public void setSequenceHandler(SequenceHandler handler)
Setting a new SequenceHandler or removing any previously installed one.
See the method description in Tokenizer.
Specified by: setSequenceHandler in interface Tokenizer
Parameters: handler - the (new) SequenceHandler to use or null to remove it
See Also: Tokenizer.getSequenceHandler(), TokenizerProperties.addSpecialSequence(java.lang.String), TokenizerProperties.addLineComment(java.lang.String), TokenizerProperties.addBlockComment(java.lang.String, java.lang.String), TokenizerProperties.addString(java.lang.String, java.lang.String, java.lang.String)
public SequenceHandler getSequenceHandler()
Retrieving the current SequenceHandler. See the method description in Tokenizer.
Specified by: getSequenceHandler in interface Tokenizer
Returns: the current SequenceHandler or null, if the base implementation is working
See Also: Tokenizer.setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler)
public void setPatternHandler(PatternHandler handler)
Setting a new PatternHandler or removing any previously installed one.
See the method description in Tokenizer.
Specified by: setPatternHandler in interface Tokenizer
Parameters: handler - the (new) PatternHandler to use or null to remove it
See Also: getPatternHandler()
public PatternHandler getPatternHandler()
Retrieving the current PatternHandler. See the method description in Tokenizer.
Specified by: getPatternHandler in interface Tokenizer
Returns: the current PatternHandler or null, if patterns are not recognized by the tokenizer
See Also: setPatternHandler(de.susebox.jtopas.spi.PatternHandler)
public int getCurrentLine()
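Query the current row. The result is only meaningful if the flag
TokenizerProperties#F_COUNT_LINES has been set. Without this flag being set, the return value is undefined.
Returns: the current row, provided the flag TokenizerProperties#F_COUNT_LINES is set
public int getCurrentColumn()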
Retrieve the current column. The result is only meaningful if the flag
F_COUNT_LINES has been set.
Without this flag being set, the return value is undefined.
Note that column counting starts with 0, while editors often use 1 for the first
column in a row.
public boolean hasMoreToken()
Checking if there are more tokens available. See the method description in Tokenizer.
Specified by: hasMoreToken in interface Tokenizer
Returns: true if a call to nextToken() or nextImage() will succeed, false otherwise
public Token nextToken() throws TokenizerException
Retrieving the next Token.
Specified by: nextToken in interface Tokenizer
Returns: the Token found, including the EOF token
Throws: TokenizerException - generic exception (list) for all problems that may occur while parsing
(IOExceptions for instance)
See Also: Tokenizer.nextImage()
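Since nextToken() also returns the EOF token, a driver loop built on hasMoreToken() sees that token as its last element. The following self-contained sketch mimics only this contract with a toy whitespace splitter. TokenLoopSketch, its EOF marker string and the readAll helper are hypothetical names, and the assumption that hasMoreToken() turns false only after the EOF token has been delivered should be checked against the Tokenizer interface description.

```java
// Hypothetical stand-in for a tokenizer; it only mimics the hasMoreToken()/nextToken() contract.
final class TokenLoopSketch {
    static final String EOF = "<EOF>"; // illustrative marker standing in for the EOF token

    private final String[] words;
    private int next = 0;
    private boolean eofReturned = false;

    TokenLoopSketch(String text) {
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // True as long as nextToken() can still deliver something, the EOF marker included.
    boolean hasMoreToken() {
        return !eofReturned;
    }

    // Returns each word in turn, then the EOF marker exactly once.
    String nextToken() {
        if (eofReturned) {
            throw new IllegalStateException("no more tokens");
        }
        if (next < words.length) {
            return words[next++];
        }
        eofReturned = true;
        return EOF;
    }

    // Typical driver loop: it ends once the EOF marker has been consumed.
    static java.util.List<String> readAll(String text) {
        TokenLoopSketch tokenizer = new TokenLoopSketch(text);
        java.util.List<String> images = new java.util.ArrayList<>();
        while (tokenizer.hasMoreToken()) {
            images.add(tokenizer.nextToken());
        }
        return images;
    }

    public static void main(String[] args) {
        System.out.println(readAll("int x = 1"));
    }
}
```

Code that must not process the EOF token as ordinary input therefore has to check for it inside the loop body rather than rely on the loop condition alone.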
public java.lang.String nextImage() throws TokenizerException
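This method is a convenience method. See the method description in Tokenizer.
Specified by: nextImage in interface Tokenizer
Throws: TokenizerException - generic exception (list) for all problems that may occur while parsing
(IOExceptions for instance)
See Also: currentImage()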
public Token currentToken() throws TokenizerException
Retrieve the Token that was found by the last call to nextToken().
See the method description in Tokenizer.
Specified by: currentToken in interface Tokenizer
Returns: the Token retrieved by the latest call to nextToken()
Throws: TokenizerException - if the tokenizer has no current token
See Also: Tokenizer.nextToken(), Tokenizer.currentImage()
public java.lang.String currentImage() throws TokenizerException
Convenience method to retrieve only the token image of the Token that
would be returned by currentToken(). See the method description in Tokenizer.
Specified by: currentImage in interface Tokenizer
Throws: TokenizerException - if the tokenizer has no current token
See Also: currentToken()
public int getLineNumber()
If the flag TokenizerProperties#F_COUNT_LINES is set, this method will
return the line number, starting with 0, in the input stream. See the method
description in Tokenizer.
Specified by: getLineNumber in interface Tokenizer
See Also: getColumnNumber()
public int getColumnNumber()
If the flag TokenizerProperties#F_COUNT_LINES is set, this method will
return the current column position, starting with 0, in the input stream. See
the method description in Tokenizer.
Specified by: getColumnNumber in interface Tokenizer
See Also: getLineNumber()
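Because both counters start with 0 while editors often number the first line and column as 1, messages meant for users usually need a small conversion. The helper below is a hypothetical sketch (PositionFormatSketch and toEditorPosition are not part of JTopas); in real code the arguments would come from getLineNumber() and getColumnNumber() with F_COUNT_LINES set.

```java
// Hypothetical helper: turns 0-based counters into the 1-based "line:column" form editors show.
final class PositionFormatSketch {
    static String toEditorPosition(int zeroBasedLine, int zeroBasedColumn) {
        if (zeroBasedLine < 0 || zeroBasedColumn < 0) {
            throw new IllegalArgumentException("line and column must not be negative");
        }
        return (zeroBasedLine + 1) + ":" + (zeroBasedColumn + 1);
    }

    public static void main(String[] args) {
        // The literal arguments stand in for tokenizer.getLineNumber() and tokenizer.getColumnNumber().
        System.out.println(toEditorPosition(0, 0));
    }
}
```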
public int getReadPosition()
Getting the current read offset. See the method description in Tokenizer.
Specified by: getReadPosition in interface Tokenizer
See Also: setReadPositionAbsolute(int), setReadPositionRelative(int)
public int currentlyAvailable()
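Retrieving the number of the currently available characters. See the method description in Tokenizer.
Specified by: currentlyAvailable in interface Tokenizer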
public int readMore() throws TokenizerException
Try to read more data into the text buffer of the tokenizer. See the method description in Tokenizer.
Specified by: readMore in interface Tokenizer
Throws: TokenizerException - generic exception (list) for all problems that
may occur while reading (IOExceptions for instance)
public char getChar(int pos) throws java.lang.IndexOutOfBoundsException
public java.lang.String getText(int start, int len) throws java.lang.IndexOutOfBoundsException
Retrieve text from the currently available range. See the method description in Tokenizer.
Specified by: getText in interface Tokenizer
Parameters: start - position where the text begins
len - length of the text
Throws: java.lang.IndexOutOfBoundsException - if the starting position or the length
is out of the current text window
public void setReadPositionAbsolute(int position) throws java.lang.IndexOutOfBoundsException
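This method sets the tokenizer's current read position to the given absolute
read position. See the method description in Tokenizer.
Tokenizers that share a data source are only synchronized by
switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this
method has no effect on the other tokenizers sharing the same data source.
Specified by: setReadPositionAbsolute in interface Tokenizer
Parameters: position - absolute position for the next parse operation
Throws: java.lang.IndexOutOfBoundsException - if the parameter position is
not in the available text range (text window)
See Also: setReadPositionRelative(int)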
public void setReadPositionRelative(int offset) throws java.lang.IndexOutOfBoundsException
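This method moves the tokenizer's read position the given number of characters
forward (positive value) or backward (negative value), starting from the current
read position. See the method description in Tokenizer.
Tokenizers that share a data source are only synchronized by
switchTo(de.susebox.jtopas.AbstractTokenizer). Until that point, a call to this
method has no effect on the other tokenizers sharing the same data source.
Specified by: setReadPositionRelative in interface Tokenizer
Parameters: offset - number of characters to move forward (positive offset) or
backward (negative offset)
Throws: java.lang.IndexOutOfBoundsException - if the parameter offset would
move the read position out of the available text range (text window)
See Also: setReadPositionAbsolute(int)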
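The relationship between absolute positions, relative offsets and the text window can be pictured with a small model. This is a sketch under stated assumptions, not JTopas code: ReadPositionSketch is a hypothetical class, positions are taken to be absolute offsets into the input, the window is taken to begin at a range start, and the position just past the last buffered character is assumed to be valid.

```java
// Hypothetical model of a text window with a movable read position; not JTopas code.
final class ReadPositionSketch {
    private final int rangeStart; // absolute position of the first buffered character
    private final String window;  // characters currently available
    private int readPosition;     // absolute read position

    ReadPositionSketch(int rangeStart, String window) {
        this.rangeStart = rangeStart;
        this.window = window;
        this.readPosition = rangeStart;
    }

    int getReadPosition() {
        return readPosition;
    }

    // Absolute positioning; assumption: the position just past the last character is allowed.
    void setReadPositionAbsolute(int position) {
        if (position < rangeStart || position > rangeStart + window.length()) {
            throw new IndexOutOfBoundsException("position " + position + " is outside the text window");
        }
        readPosition = position;
    }

    // Relative positioning expressed through the absolute variant.
    void setReadPositionRelative(int offset) {
        setReadPositionAbsolute(readPosition + offset);
    }

    // Text access by absolute start position and length, checked against the window.
    String getText(int start, int len) {
        int from = start - rangeStart;
        if (from < 0 || len < 0 || from + len > window.length()) {
            throw new IndexOutOfBoundsException("range is outside the text window");
        }
        return window.substring(from, from + len);
    }

    public static void main(String[] args) {
        ReadPositionSketch tokenizer = new ReadPositionSketch(100, "hello world");
        tokenizer.setReadPositionAbsolute(106);
        tokenizer.setReadPositionRelative(-6);
        System.out.println(tokenizer.getReadPosition() + " " + tokenizer.getText(106, 5));
    }
}
```

Routing the relative variant through the absolute one keeps the range check in a single place, which is one plausible way to guarantee that both methods fail on the same out-of-window positions.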
public void close()
Closing this tokenizer frees resources and deregisters from the associated
TokenizerProperties object.
public void addTokenizer(AbstractTokenizer tokenizer) throws TokenizerException
Adding an embedded tokenizer. Switching between tokenizers is done with
switchTo(de.susebox.jtopas.AbstractTokenizer).
The TokenizerProperties#F_KEEP_DATA and TokenizerProperties#F_COUNT_LINES
flags of the base tokenizer also take effect in the embedded tokenizers.
This method is synchronized on tokenizer, which is a derivation of the
AbstractTokenizer class.
Parameters: tokenizer - an embedded tokenizer
Throws: TokenizerException - if something goes wrong (not likely)
public void switchTo(AbstractTokenizer tokenizer) throws TokenizerException
Changing from one tokenizer to another. If the given tokenizer has not been
added with addTokenizer(de.susebox.jtopas.AbstractTokenizer), an exception is thrown.
The switchTo method does the necessary synchronisation between this and the
given tokenizer. The user is therefore responsible for calling switchTo
whenever a tokenizer change is necessary. It must be done this way:

    // declared as AbstractTokenizer because addTokenizer and switchTo are defined there
    AbstractTokenizer base     = new MyTokenizer(...);
    AbstractTokenizer embedded = new MyTokenizer(...);

    // setting properties (comments, keywords etc.)
    ...

    // embedding a tokenizer
    base.addTokenizer(embedded);

    // tokenizing with base
    ...
    if (switch_condition) {
      base.switchTo(embedded);
    }

    // tokenizing with embedded
    ...
    if (switch_condition) {
      embedded.switchTo(base);
    }

That way we avoid a more complex synchronisation between tokenizers whenever
one of them parses the next data in the input stream. However, the danger of
unsynchronized tokenizers remains, so take care.
This method is synchronized on tokenizer, which is a derivation of the
AbstractTokenizer class.
Parameters: tokenizer - the tokenizer that should be used from now on
Throws: TokenizerException
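The hand-over performed by switchTo can be pictured with a toy model in which each tokenizer keeps a private read position over shared input and switchTo copies that position to the target. This is a sketch under that assumption only; SwitchSketch, its read method and the group field are hypothetical, and the real method may synchronize more state than a single position.

```java
// Hypothetical model: tokenizers over shared input, each with a private read position.
final class SwitchSketch {
    private final String input;  // shared data, passed to every tokenizer of the group
    private int readPosition;    // private to each tokenizer until switchTo is called
    private SwitchSketch group;  // the base tokenizer this instance belongs to

    SwitchSketch(String input) {
        this.input = input;
        this.group = this;
    }

    // Registers an embedded tokenizer so that it may later be switched to.
    void addTokenizer(SwitchSketch other) {
        other.group = this.group;
    }

    // Hands the current read position over to the other tokenizer.
    void switchTo(SwitchSketch other) {
        if (other.group != this.group) {
            throw new IllegalStateException("tokenizer was not added with addTokenizer");
        }
        other.readPosition = this.readPosition;
    }

    // Toy parse step: consumes len characters from the shared input.
    String read(int len) {
        String text = input.substring(readPosition, readPosition + len);
        readPosition += len;
        return text;
    }

    public static void main(String[] args) {
        String data = "SELECT<b>bold</b>";
        SwitchSketch base = new SwitchSketch(data);
        SwitchSketch embedded = new SwitchSketch(data);
        base.addTokenizer(embedded);
        System.out.println(base.read(6));
        base.switchTo(embedded);
        System.out.println(embedded.read(3));
    }
}
```

Without the switchTo call, the embedded tokenizer in this model would start reading at position 0 again, which is the kind of unsynchronized state the warning above refers to.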
public void propertyChanged(TokenizerPropertyEvent event)
Event handler method. The given TokenizerPropertyEvent parameter contains the
necessary information about the property change. We choose one single method
in favour of various more specialized methods since the reactions on adding,
removing and modifying tokenizer properties are often the same (flushing
caches, rereading information etc.).
Note that a change of the parse flags in the TokenizerProperties object
removes all flags previously modified through changeParseFlags(int, int).
Specified by: propertyChanged in interface TokenizerPropertyListener
Parameters: event - the TokenizerPropertyEvent that describes the change
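The design choice of one handler method for all kinds of change can be illustrated with a small listener model. The sketch below is hypothetical throughout: PropertyListenerSketch, the ChangeKind enum and the cache are invented for illustration and do not reflect the actual TokenizerPropertyEvent API.

```java
// Hypothetical listener model; the event kinds and names are illustrative, not the JTopas API.
final class PropertyListenerSketch {
    enum ChangeKind { PROPERTY_ADDED, PROPERTY_REMOVED, PROPERTY_MODIFIED }

    private final java.util.Map<String, String> cache = new java.util.HashMap<>();
    private int flushCount = 0;

    void remember(String key, String value) {
        cache.put(key, value);
    }

    int cachedEntries() {
        return cache.size();
    }

    int flushCount() {
        return flushCount;
    }

    // One handler for every kind of change: the reaction is the same in each case.
    void propertyChanged(ChangeKind kind, String propertyName) {
        switch (kind) {
            case PROPERTY_ADDED:
            case PROPERTY_REMOVED:
            case PROPERTY_MODIFIED:
                cache.clear(); // drop derived data so that it is re-read on demand
                flushCount++;
                break;
            default:
                throw new IllegalArgumentException("unknown change kind: " + kind);
        }
    }

    public static void main(String[] args) {
        PropertyListenerSketch listener = new PropertyListenerSketch();
        listener.remember("keyword:if", "IF");
        listener.propertyChanged(ChangeKind.PROPERTY_ADDED, "keyword:else");
        System.out.println(listener.cachedEntries() + " " + listener.flushCount());
    }
}
```

With three separate callbacks the flush logic would have to be repeated or delegated three times; a single method keeps it in one place while the event object still carries the details for listeners that need them.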