public class StandardTokenizer extends AbstractTokenizer implements Tokenizer, TokenizerPropertyListener
This is the mainstream Tokenizer. It implements the Tokenizer interface in a straightforward way, without overly specialized parse optimizations.
Besides the Tokenizer interface, the class StandardTokenizer provides some basic features for cascading (nested) tokenizers. Consider the HTML pages commonly found on the web: most of them mix regular HTML, cascading style sheets (CSS) and embedded JavaScript. These languages use different syntaxes, so one needs various tokenizers on the same input stream.
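The cascading idea can be sketched without the library: an outer pass handles the HTML markup and hands the shared read position to an inner pass whenever it meets a script element. The classes `SharedInput` and `CascadeSketch` below are hypothetical stand-ins written for illustration; they do not reproduce how StandardTokenizer implements nested tokenizers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not part of JTopas): one input with a shared read position. */
final class SharedInput {
    final String text;
    int pos = 0;
    SharedInput(String text) { this.text = text; }
}

/** Illustrative sketch of two tokenizers working on the same input. */
final class CascadeSketch {

    /** Outer pass: emits tags and text, and delegates script bodies to scanScript. */
    static List<String> scanHtml(SharedInput in) {
        List<String> tokens = new ArrayList<>();
        while (in.pos < in.text.length()) {
            if (in.text.charAt(in.pos) == '<') {
                int end = in.text.indexOf('>', in.pos);
                if (end < 0) end = in.text.length() - 1;   // unterminated tag: take the rest
                String tag = in.text.substring(in.pos, end + 1);
                tokens.add("TAG:" + tag);
                in.pos = end + 1;
                if (tag.equals("<script>")) {
                    scanScript(in, tokens);                // nested pass continues at in.pos
                }
            } else {
                int end = in.text.indexOf('<', in.pos);
                if (end < 0) end = in.text.length();
                String text = in.text.substring(in.pos, end).trim();
                if (!text.isEmpty()) tokens.add("TEXT:" + text);
                in.pos = end;
            }
        }
        return tokens;
    }

    /** Inner pass: splits script source into words and one-character separators. */
    static void scanScript(SharedInput in, List<String> tokens) {
        int stop = in.text.indexOf("</script>", in.pos);
        if (stop < 0) stop = in.text.length();
        while (in.pos < stop) {
            char c = in.text.charAt(in.pos);
            if (Character.isWhitespace(c)) {
                in.pos++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = in.pos;
                while (in.pos < stop && Character.isLetterOrDigit(in.text.charAt(in.pos))) {
                    in.pos++;
                }
                tokens.add("JS_WORD:" + in.text.substring(start, in.pos));
            } else {
                tokens.add("JS_SEP:" + c);
                in.pos++;
            }
        }
    }
}
```

Both passes advance the same `pos` field, so the outer pass resumes exactly where the inner one stopped. That shared position is the property a cascade of tokenizers on one input stream depends on.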
This Tokenizer implementation is not synchronized. Take care when using it with multiple threads.
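One common way to share an unsynchronized object between threads is to guard every check-and-fetch pair with a single external lock. The sketch below shows that pattern with a hypothetical `UnsyncWordTokenizer`; it only borrows the method names `hasMoreToken` and `nextToken` and is not the JTopas class. Whether external locking is sufficient for a real StandardTokenizer depends on how its source and properties are shared, which this page does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical unsynchronized tokenizer (not the JTopas class). */
final class UnsyncWordTokenizer {
    private final String[] words;
    private int next = 0;
    UnsyncWordTokenizer(String text) { words = text.trim().split("\\s+"); }
    boolean hasMoreToken() { return next < words.length; }
    String nextToken() { return words[next++]; }
}

/** Illustrative sketch: several threads drain one tokenizer under an external lock. */
final class GuardedTokenizerSketch {

    static List<String> drainWithThreads(String text, int threadCount) {
        final UnsyncWordTokenizer tokenizer = new UnsyncWordTokenizer(text);
        final List<String> seen = Collections.synchronizedList(new ArrayList<String>());

        Runnable worker = () -> {
            while (true) {
                String token;
                synchronized (tokenizer) {
                    // The check and the fetch must happen under the same lock;
                    // otherwise another thread could take the last token in between.
                    if (!tokenizer.hasMoreToken()) {
                        return;
                    }
                    token = tokenizer.nextToken();
                }
                seen.add(token);   // work outside the lock
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(worker);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen;
    }
}
```

Each token is delivered to exactly one thread, but the order in which threads receive tokens is not deterministic. Confining the tokenizer to a single thread avoids the lock entirely.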
See Also: Tokenizer, TokenizerProperties
Constructor and Description |
---|
StandardTokenizer(): Default constructor that sets the tokenizer control flags as appropriate for C/C++ and Java. |
StandardTokenizer(TokenizerProperties properties): Constructs a StandardTokenizer with a backing TokenizerProperties instance. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closing this tokenizer frees resources. |
int | getRangeStart(): Returns the absolute offset in characters to the start of the parsed stream. |
void | setSource(TokenizerSource source): In addition to the common behaviour implemented in AbstractTokenizer.setSource, this method adjusts the state specific to the StandardTokenizer class. |
Methods inherited from class AbstractTokenizer:
addTokenizer, changeParseFlags, currentImage, currentlyAvailable, currentToken, getChar, getColumnNumber, getCurrentColumn, getCurrentLine, getKeywordHandler, getLineNumber, getParseFlags, getPatternHandler, getReadPosition, getSeparatorHandler, getSequenceHandler, getSource, getText, getTokenizerProperties, getWhitespaceHandler, hasMoreToken, nextImage, nextToken, propertyChanged, readMore, setKeywordHandler, setPatternHandler, setReadPositionAbsolute, setReadPositionRelative, setSeparatorHandler, setSequenceHandler, setSource, setTokenizerProperties, setWhitespaceHandler, switchTo
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Tokenizer:
changeParseFlags, currentImage, currentlyAvailable, currentToken, getChar, getColumnNumber, getKeywordHandler, getLineNumber, getParseFlags, getPatternHandler, getReadPosition, getSeparatorHandler, getSequenceHandler, getSource, getText, getTokenizerProperties, getWhitespaceHandler, hasMoreToken, nextImage, nextToken, readMore, setKeywordHandler, setPatternHandler, setReadPositionAbsolute, setReadPositionRelative, setSeparatorHandler, setSequenceHandler, setTokenizerProperties, setWhitespaceHandler
Methods inherited from interface TokenizerPropertyListener:
propertyChanged
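public StandardTokenizer()
Default constructor that sets the tokenizer control flags as appropriate for C/C++ and Java. It uses TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS for whitespace and separator handling.
public StandardTokenizer(TokenizerProperties properties)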
Constructs a StandardTokenizer with a backing TokenizerProperties instance.
Parameters:
properties - a TokenizerProperties object containing the settings for the tokenizing process
public int getRangeStart()
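This method returns the absolute offset in characters to the start of the parsed stream.
Specified by: getRangeStart in interface Tokenizer
See Also: AbstractTokenizer.getReadPosition()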
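The relationship between a range start and a read position can be illustrated with a small sliding window. The sketch assumes that the range start is the absolute offset of the first character still held in the buffer, and that absolute offsets are mapped to buffer indices by subtracting it. That reading is an assumption made for illustration; `TextWindowSketch` is a hypothetical class, not the JTopas implementation, and only mirrors the method names listed on this page.

```java
/** Hypothetical sliding text window; models offsets only (not the JTopas code). */
final class TextWindowSketch {
    private final StringBuilder buffer = new StringBuilder();
    private int rangeStart = 0;     // absolute offset of buffer.charAt(0)
    private int readPosition = 0;   // absolute offset of the next character to read

    /** Adds newly read characters to the end of the window. */
    void append(String chunk) {
        buffer.append(chunk);
    }

    /** Consumes n characters, clamped to what is currently buffered. */
    void advance(int n) {
        readPosition = Math.min(readPosition + n, rangeStart + buffer.length());
    }

    /** Drops everything before the read position, as a tokenizer might do to bound memory. */
    void compact() {
        buffer.delete(0, readPosition - rangeStart);
        rangeStart = readPosition;
    }

    int getRangeStart() { return rangeStart; }
    int getReadPosition() { return readPosition; }
    int currentlyAvailable() { return buffer.length(); }

    /** Translates an absolute offset into a buffer index by subtracting the range start. */
    String getText(int absoluteStart, int length) {
        int from = absoluteStart - rangeStart;
        return buffer.substring(from, from + length);
    }
}
```

After `append("int x = 42;")`, `advance(4)` and `compact()`, the window in this sketch starts at absolute offset 4 and holds 7 characters, so `getText(8, 2)` still addresses the characters `42` by their position in the original stream.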
public void setSource(TokenizerSource source)
In addition to the common behaviour implemented in AbstractTokenizer.setSource, this method adjusts the state specific to the StandardTokenizer class.
Specified by: setSource in interface Tokenizer
Overrides: setSource in class AbstractTokenizer
Parameters:
source - a TokenizerSource to read data from
See Also: AbstractTokenizer.getSource()
public void close()
Closing this tokenizer frees resources.
Specified by: close in interface Tokenizer
Overrides: close in class AbstractTokenizer
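The setSource and close descriptions suggest a lifecycle in which one tokenizer instance can be pointed at several sources in turn and is closed once at the end. The sketch below models that lifecycle with a hypothetical `LifecycleSketch` class reading from a `java.io.Reader`. Resetting the read position in setSource and rejecting use after close are assumptions of this sketch, not documented StandardTokenizer behaviour.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature tokenizer with a setSource/close lifecycle (not the JTopas code). */
final class LifecycleSketch {
    private Reader source;
    private int readPosition;   // per-source state that is reset on every source change
    private boolean closed;

    /** Attaches a new source and resets the per-source state. */
    void setSource(Reader newSource) {
        if (closed) {
            throw new IllegalStateException("tokenizer already closed");
        }
        source = newSource;
        readPosition = 0;
    }

    /** Reads whitespace-separated words until the current source is exhausted. */
    List<String> readAll() {
        if (source == null) {
            throw new IllegalStateException("no source set");
        }
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        try {
            int c;
            while ((c = source.read()) != -1) {
                readPosition++;
                if (Character.isWhitespace(c)) {
                    if (word.length() > 0) {
                        words.add(word.toString());
                        word.setLength(0);
                    }
                } else {
                    word.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (word.length() > 0) {
            words.add(word.toString());
        }
        return words;
    }

    int getReadPosition() { return readPosition; }

    /** Releases the reference to the source; the instance is unusable afterwards. */
    void close() {
        closed = true;
        source = null;
    }
}
```

In calling code, placing `close()` in a `finally` block keeps the release step from being skipped when tokenizing throws.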