public class StandardTokenizerProperties extends AbstractTokenizerProperties implements TokenizerProperties, DataMapper
The class StandardTokenizerProperties provides a simple implementation of the TokenizerProperties interface for use in most situations.
Note that this class takes advantage of JTopas features that require Java 1.4 or higher. It can still be used in older environments, but it cannot be compiled with JDK versions below 1.4.
See Also:
TokenizerProperties, Tokenizer
Nested classes/interfaces inherited from interface PatternHandler:
PatternHandler.Result
Modifier and Type | Field and Description |
---|---|
static int | CHARFLAG_SEPARATOR: character flag for separators |
static int | CHARFLAG_WHITESPACE: character flag for whitespaces |
static short | MAX_NONFREE_MATCHLEN: Maximum length of a non-free pattern match. |
Fields inherited from interface TokenizerProperties: DEFAULT_BLOCK_COMMENT_END, DEFAULT_BLOCK_COMMENT_START, DEFAULT_CHAR_END, DEFAULT_CHAR_ESCAPE, DEFAULT_CHAR_START, DEFAULT_LINE_COMMENT, DEFAULT_SEPARATORS, DEFAULT_STRING_END, DEFAULT_STRING_ESCAPE, DEFAULT_STRING_START, DEFAULT_WHITESPACES
Constructor and Description |
---|
StandardTokenizerProperties(): Default constructor that initializes an instance with the default whitespace and separator sets. |
StandardTokenizerProperties(int flags): This constructor takes the control flags to be used. |
StandardTokenizerProperties(int flags, java.lang.String whitespaces, java.lang.String separators): This constructor takes the control flags and the whitespace and separator sets to be used. |
Modifier and Type | Method and Description |
---|---|
int | countLeadingWhitespaces(DataProvider dataProvider): This method detects the number of whitespace characters at the start of the data range given through the DataProvider parameter. |
java.util.Iterator | getBlockComments(): This method returns an Iterator of TokenizerProperty objects for the block comments. |
java.util.Iterator | getKeywords(): This method returns an Iterator of TokenizerProperty objects for the keywords. |
java.util.Iterator | getLineComments(): This method returns an Iterator of TokenizerProperty objects for the line comments. |
java.util.Iterator | getPatterns(): This method returns an Iterator of TokenizerProperty objects for the patterns. |
java.util.Iterator | getProperties(): This method returns an Iterator of TokenizerProperty objects for all properties. |
java.lang.String | getSeparators(): Obtains the separator set of the Tokenizer. |
int | getSequenceMaxLength(): This method returns the length of the longest special sequence, comment or string prefix that is known to this SequenceHandler. |
java.util.Iterator | getSpecialSequences(): This method returns an Iterator of TokenizerProperty objects for the special sequences. |
java.util.Iterator | getStrings(): This method returns an Iterator of TokenizerProperty objects for the strings. |
TokenizerProperties | getTokenizerProperties(): The method retrieves the backing TokenizerProperties instance this DataMapper is working on. |
java.lang.String | getWhitespaces(): Obtains the whitespace character set. |
boolean | hasKeywords(): This method can be used by a Tokenizer implementation to detect quickly whether keyword matching must be performed at all. |
boolean | hasPattern(): This method can be used by a Tokenizer implementation to detect quickly whether pattern matching must be performed at all. |
boolean | hasSequenceCommentOrString(): This method can be used by a Tokenizer implementation to detect quickly whether special sequence checking must be performed at all. |
TokenizerProperty | isKeyword(DataProvider dataProvider): This method checks if the character range given through the DataProvider comprises a keyword. |
boolean | isSeparator(char testChar): This method checks whether the given character is a separator. |
boolean | isWhitespace(char testChar): This method checks whether the given character is a whitespace. |
PatternHandler.Result | matches(DataProvider dataProvider): This method checks if the start of the character range given through the DataProvider matches a pattern. |
boolean | newlineIsWhitespace(): If a Tokenizer performs line counting, it is often necessary to know whether newline characters are considered whitespace. |
void | setTokenizerProperties(TokenizerProperties props): Sets the backing TokenizerProperties instance this DataMapper is working with. |
TokenizerProperty | startsWithSequenceCommentOrString(DataProvider dataProvider): This method checks if a given range of data starts with a special sequence, a comment or a string. |
Methods inherited from class AbstractTokenizerProperties: addBlockComment, addBlockComment, addBlockComment, addBlockComment, addKeyword, addKeyword, addKeyword, addKeyword, addLineComment, addLineComment, addLineComment, addLineComment, addPattern, addPattern, addPattern, addPattern, addProperty, addSeparators, addSpecialSequence, addSpecialSequence, addSpecialSequence, addSpecialSequence, addString, addString, addString, addString, addTokenizerPropertyListener, addWhitespaces, blockCommentExists, getBlockComment, getBlockCommentCompanion, getKeyword, getKeywordCompanion, getLineComment, getLineCommentCompanion, getParseFlags, getPattern, getPatternCompanion, getSpecialSequence, getSpecialSequenceCompanion, getString, getStringCompanion, isFlagSet, isFlagSet, keywordExists, lineCommentExists, patternExists, propertyExists, removeBlockComment, removeKeyword, removeLineComment, removePattern, removeProperty, removeSeparators, removeSpecialSequence, removeString, removeTokenizerPropertyListener, removeWhitespaces, setParseFlags, setSeparators, setWhitespaces, specialSequenceExists, stringExists
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
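public static final short MAX_NONFREE_MATCHLEN
Maximum length of a non-free pattern match, that is, a match of a pattern that does not have the TokenizerProperties.F_FREE_PATTERN flag set. Number patterns are a common example.
public static final int CHARFLAG_WHITESPACE
Character flag for whitespaces.
public static final int CHARFLAG_SEPARATOR
Character flag for separators.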
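The two CHARFLAG_* constants suggest that characters are classified through a per-character flag table. The following standalone Java sketch illustrates that idea; it is not the JTopas source, and the class name CharFlagTable, the flag values and the 256-entry table size are assumptions made for illustration only.

```java
// Hypothetical sketch, not JTopas code: classifies characters through a flag
// table, in the spirit of CHARFLAG_WHITESPACE and CHARFLAG_SEPARATOR.
// Flag values and table size are assumptions.
final class CharFlagTable {
    static final int FLAG_WHITESPACE = 0x01; // assumed value
    static final int FLAG_SEPARATOR = 0x02;  // assumed value
    private static final int TABLE_SIZE = 256;

    private final int[] flags = new int[TABLE_SIZE]; // fast path for chars 0..255
    private final String whitespaces;
    private final String separators;

    CharFlagTable(String whitespaces, String separators) {
        this.whitespaces = whitespaces;
        this.separators = separators;
        for (char c : whitespaces.toCharArray()) {
            if (c < TABLE_SIZE) {
                flags[c] |= FLAG_WHITESPACE;
            }
        }
        for (char c : separators.toCharArray()) {
            if (c < TABLE_SIZE) {
                flags[c] |= FLAG_SEPARATOR;
            }
        }
    }

    boolean isWhitespace(char c) {
        return c < TABLE_SIZE
            ? (flags[c] & FLAG_WHITESPACE) != 0
            : whitespaces.indexOf(c) >= 0; // slow path for other characters
    }

    boolean isSeparator(char c) {
        return c < TABLE_SIZE
            ? (flags[c] & FLAG_SEPARATOR) != 0
            : separators.indexOf(c) >= 0; // slow path for other characters
    }
}
```

Characters outside the table fall back to a linear search of the set strings, so the sketch stays correct for any char while keeping the common ASCII case to a single array lookup.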
public StandardTokenizerProperties()
Default constructor that initializes an instance with the default whitespace and separator sets. Tokenizer instances using this StandardTokenizerProperties object split text at spaces, tabs and line-ending sequences as well as at punctuation characters.
public StandardTokenizerProperties(int flags)
This constructor takes the control flags to be used. It is equivalent to:
TokenizerProperties props = new StandardTokenizerProperties();
props.setParseFlags(flags);
See the TokenizerProperties interface for the supported flags.
TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS are used for whitespace and separator handling unless AbstractTokenizerProperties.setWhitespaces(java.lang.String) and AbstractTokenizerProperties.setSeparators(java.lang.String) are called afterwards.
Parameters:
flags - tokenizer control flags
See Also:
AbstractTokenizerProperties.setParseFlags(int)
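As documented above, passing flags to the constructor behaves like the default constructor followed by setParseFlags(flags). The sketch below mirrors that relationship in a hypothetical stand-in class FlagProps; the flag constants F_EXAMPLE_A and F_EXAMPLE_B and the bit test in isFlagSet are assumptions, not the real TokenizerProperties flags or the inherited isFlagSet implementation.

```java
// Hypothetical stand-in, not the JTopas class: shows the documented
// equivalence between new X(flags) and new X() followed by setParseFlags(flags).
final class FlagProps {
    static final int F_EXAMPLE_A = 1;      // hypothetical flag bit
    static final int F_EXAMPLE_B = 1 << 1; // hypothetical flag bit

    private int parseFlags; // 0 means "no flags set"

    FlagProps() {
        // The real default constructor also installs the default
        // whitespace and separator sets; omitted in this sketch.
    }

    FlagProps(int flags) {
        this();               // same state as the default constructor...
        setParseFlags(flags); // ...plus the given control flags
    }

    void setParseFlags(int flags) {
        parseFlags = flags;
    }

    int getParseFlags() {
        return parseFlags;
    }

    // Assumed semantics: true only if every bit of 'flag' is set.
    boolean isFlagSet(int flag) {
        return (parseFlags & flag) == flag;
    }
}
```

Flags are combined with bitwise OR, for example new FlagProps(FlagProps.F_EXAMPLE_A | FlagProps.F_EXAMPLE_B).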
public StandardTokenizerProperties(int flags, java.lang.String whitespaces, java.lang.String separators)
This constructor takes the control flags and the whitespace and separator sets to be used. It is equivalent to:
TokenizerProperties props = new StandardTokenizerProperties(flags);
props.setWhitespaces(whitespaces);
props.setSeparators(separators);
Parameters:
flags - tokenizer control flags
whitespaces - the whitespace set
separators - the set of separating characters
See Also:
AbstractTokenizerProperties.setParseFlags(int), AbstractTokenizerProperties.setWhitespaces(java.lang.String), AbstractTokenizerProperties.setSeparators(java.lang.String)
public java.util.Iterator getStrings()
This method returns an Iterator of TokenizerProperty objects for the strings. See the method description in TokenizerProperties.
Specified by:
getStrings in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.lang.String getWhitespaces()
Obtains the whitespace character set. See the method description in TokenizerProperties.
Specified by:
getWhitespaces in interface TokenizerProperties
See Also:
AbstractTokenizerProperties.setWhitespaces(java.lang.String)
public java.lang.String getSeparators()
Obtains the separator set of the Tokenizer. See the method description in TokenizerProperties.
Specified by:
getSeparators in interface TokenizerProperties
See Also:
AbstractTokenizerProperties.setSeparators(java.lang.String)
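Whitespace and separator sets together determine where a Tokenizer splits text, as described for the default constructor. A rough standalone approximation of that behaviour is sketched below as a hypothetical helper SimpleSplitter that is not part of JTopas. It treats both sets as plain character lists: whitespace ends the current token and is dropped, while each separator character ends the current token and, in this sketch, becomes a one-character token of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JTopas code: splits text using a whitespace set and
// a separator set given as plain strings of characters.
final class SimpleSplitter {
    static List<String> split(String text, String whitespaces, String separators) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isWs = whitespaces.indexOf(c) >= 0;
            boolean isSep = separators.indexOf(c) >= 0;
            if (isWs || isSep) {
                if (current.length() > 0) { // flush the pending normal token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (isSep) { // characters in both sets count as separators here
                    tokens.add(String.valueOf(c));
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) { // flush the last token at end of input
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, splitting "a = b;" with the whitespace set " " and the separator set "=;" yields the tokens a, =, b and ;.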
public java.util.Iterator getLineComments()
This method returns an Iterator of TokenizerProperty objects for the line comments. See the method description in TokenizerProperties.
Specified by:
getLineComments in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.util.Iterator getBlockComments()
This method returns an Iterator of TokenizerProperty objects for the block comments. See the method description in TokenizerProperties.
Specified by:
getBlockComments in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.util.Iterator getSpecialSequences()
This method returns an Iterator of TokenizerProperty objects for the special sequences. See the method description in TokenizerProperties.
Specified by:
getSpecialSequences in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.util.Iterator getKeywords()
This method returns an Iterator of TokenizerProperty objects for the keywords. See the method description in TokenizerProperties.
Specified by:
getKeywords in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.util.Iterator getPatterns()
This method returns an Iterator of TokenizerProperty objects. Each TokenizerProperty object contains a pattern and its companion, if such an associated object exists.
Specified by:
getPatterns in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public java.util.Iterator getProperties()
This method returns an Iterator of TokenizerProperty objects for all properties. See the method description in TokenizerProperties.
Specified by:
getProperties in interface TokenizerProperties
Returns:
an Iterator of TokenizerProperty objects
public void setTokenizerProperties(TokenizerProperties props) throws java.lang.UnsupportedOperationException, java.lang.NullPointerException
Sets the backing TokenizerProperties instance this DataMapper is working with. Usually, the DataMapper interface is implemented by TokenizerProperties implementations, too. Otherwise, the Tokenizer using the TokenizerProperties will construct a default DataMapper and propagate the TokenizerProperties instance by calling this method.
The method throws an UnsupportedOperationException if this DataMapper is an extension of a TokenizerProperties implementation.
Specified by:
setTokenizerProperties in interface DataMapper
Parameters:
props - the TokenizerProperties
Throws:
java.lang.UnsupportedOperationException - if this DataMapper is implemented by a TokenizerProperties implementation
java.lang.NullPointerException - if no TokenizerProperties are given
public TokenizerProperties getTokenizerProperties()
The method retrieves the backing TokenizerProperties instance this DataMapper is working on. For implementations of the TokenizerProperties interface that also implement the DataMapper interface, this method returns the instance it is called on. Otherwise, it returns the TokenizerProperties instance passed through the last call to setTokenizerProperties(de.susebox.jtopas.TokenizerProperties), or null if no such call has taken place so far.
Specified by:
getTokenizerProperties in interface DataMapper
Returns:
the backing TokenizerProperties or null
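The DataMapper contract described above has two cases: a TokenizerProperties implementation that is its own DataMapper, and a separate default DataMapper constructed by the Tokenizer. The following sketch uses hypothetical, simplified interfaces SketchProps and SketchMapper (not the JTopas types) to show how both cases could satisfy the documented behaviour of setTokenizerProperties and getTokenizerProperties.

```java
// Hypothetical, simplified stand-ins for TokenizerProperties and DataMapper.
interface SketchProps { }

interface SketchMapper {
    void setTokenizerProperties(SketchProps props);
    SketchProps getTokenizerProperties();
}

// A properties class that is also its own mapper.
final class SelfMappingProps implements SketchProps, SketchMapper {
    @Override
    public void setTokenizerProperties(SketchProps props) {
        // The mapper is part of the properties object itself, so the
        // backing instance cannot be replaced.
        throw new UnsupportedOperationException("DataMapper is part of the properties");
    }

    @Override
    public SketchProps getTokenizerProperties() {
        return this; // always the instance the method is called on
    }
}

// A separate mapper, like the default one a Tokenizer might construct.
final class DefaultMapper implements SketchMapper {
    private SketchProps props; // null until setTokenizerProperties is called

    @Override
    public void setTokenizerProperties(SketchProps props) {
        if (props == null) {
            throw new NullPointerException("no TokenizerProperties given");
        }
        this.props = props;
    }

    @Override
    public SketchProps getTokenizerProperties() {
        return props;
    }
}
```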
public boolean isWhitespace(char testChar)
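This method checks whether the given character is a whitespace.
Specified by:
isWhitespace in interface WhitespaceHandler
Parameters:
testChar - check this character
Returns:
true if the given character is a whitespace, false otherwise
See Also:
TokenizerProperties.setWhitespaces(java.lang.String)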
public int countLeadingWhitespaces(DataProvider dataProvider) throws java.lang.NullPointerException
This method detects the number of whitespace characters at the start of the data range given through the DataProvider parameter.
Specified by:
countLeadingWhitespaces in interface WhitespaceHandler
Parameters:
dataProvider - the source to get the data range from
Throws:
TokenizerException - failure while reading data from the input stream
java.lang.NullPointerException - if no DataProvider is given
See Also:
DataProvider
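Counting leading whitespace amounts to scanning from the start of the data until the first character outside the whitespace set. In this minimal sketch a CharSequence stands in for DataProvider and the whitespace set is a plain string of characters; both substitutions, and the helper name LeadingWhitespace, are assumptions for illustration.

```java
// Hypothetical sketch, not JTopas code: a CharSequence replaces DataProvider.
final class LeadingWhitespace {
    static int countLeadingWhitespaces(CharSequence data, String whitespaces) {
        if (data == null) {
            throw new NullPointerException("no data given");
        }
        int count = 0;
        // Advance while the current character belongs to the whitespace set.
        while (count < data.length() && whitespaces.indexOf(data.charAt(count)) >= 0) {
            count++;
        }
        return count;
    }
}
```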
public boolean newlineIsWhitespace()
If a Tokenizer performs line counting, it is often necessary to know whether newline characters are considered whitespace. See WhitespaceHandler for details.
Specified by:
newlineIsWhitespace in interface WhitespaceHandler
Returns:
true if newline characters are in the current whitespace set, false otherwise
public boolean isSeparator(char testChar)
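This method checks whether the given character is a separator.
Specified by:
isSeparator in interface SeparatorHandler
Parameters:
testChar - check this character
Returns:
true if the given character is a separator, false otherwise
See Also:
TokenizerProperties.setSeparators(java.lang.String)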
public boolean hasSequenceCommentOrString()
This method can be used by a Tokenizer implementation to detect quickly whether special sequence checking must be performed at all. If the method returns false, time-consuming preparations can be skipped.
Specified by:
hasSequenceCommentOrString in interface SequenceHandler
Returns:
true if there actually are special sequences, comments or strings that can be tested for a match, false otherwise
public TokenizerProperty startsWithSequenceCommentOrString(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException
This method checks if a given range of data starts with a special sequence, a comment or a string. It returns null if no special sequence, comment or string matches the leading part of the data range given through the DataProvider.
Specified by:
startsWithSequenceCommentOrString in interface SequenceHandler
Parameters:
dataProvider - the source to get the data range from
Returns:
a TokenizerProperty if a special sequence, comment or string could be detected, null otherwise
Throws:
TokenizerException - failure while reading more data
java.lang.NullPointerException - if no DataProvider is given
public int getSequenceMaxLength()
This method returns the length of the longest special sequence, comment or string prefix that is known to this SequenceHandler. When startsWithSequenceCommentOrString(de.susebox.jtopas.spi.DataProvider) is called, the passed DataProvider parameter supplies at least this number of characters (see DataProvider.getLength()). If fewer characters are provided, EOF has been reached.
Specified by:
getSequenceMaxLength in interface SequenceHandler
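The maximum sequence length bounds how far ahead a SequenceHandler must look: no special sequence, comment start or string start is longer than that value. The sketch below, a hypothetical SequenceMatcher class using plain strings instead of TokenizerProperty objects, inspects at most that many leading characters and picks the longest known sequence found there. It is an illustration under these assumptions, not the JTopas implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not JTopas internals: longest-match detection of
// known sequences at the start of the data.
final class SequenceMatcher {
    private final List<String> sequences;
    private final int maxLength;

    SequenceMatcher(String... sequences) {
        this.sequences = Arrays.asList(sequences);
        int max = 0;
        for (String s : sequences) {
            max = Math.max(max, s.length());
        }
        this.maxLength = max;
    }

    int getSequenceMaxLength() {
        return maxLength;
    }

    // Returns the longest known sequence the data starts with, or null.
    String startsWithSequence(CharSequence data) {
        if (data == null) {
            throw new NullPointerException("no data given");
        }
        // Look at no more than maxLength characters; fewer means EOF.
        String head = data.subSequence(0, Math.min(data.length(), maxLength)).toString();
        String best = null;
        for (String s : sequences) {
            if (head.startsWith(s) && (best == null || s.length() > best.length())) {
                best = s;
            }
        }
        return best;
    }
}
```

Preferring the longest match lets a comment start such as /* take precedence over a shorter special sequence such as / in this sketch.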
public boolean hasKeywords()
This method can be used by a Tokenizer implementation to detect quickly whether keyword matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.
Specified by:
hasKeywords in interface KeywordHandler
Returns:
true if there actually are keywords that can be tested for a match, false otherwise
public TokenizerProperty isKeyword(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException
This method checks if the character range given through the DataProvider comprises a keyword.
Specified by:
isKeyword in interface KeywordHandler
Parameters:
dataProvider - the source to get the data from that is checked
Returns:
a TokenizerProperty if a keyword could be found, null otherwise
Throws:
TokenizerException - failure while reading more data
java.lang.NullPointerException - if no DataProvider is given
public boolean hasPattern()
This method can be used by a Tokenizer implementation to detect quickly whether pattern matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.
Specified by:
hasPattern in interface PatternHandler
Returns:
true if there actually are patterns that can be tested for a match, false otherwise
public PatternHandler.Result matches(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException
This method checks if the start of the character range given through the DataProvider matches a pattern.
Specified by:
matches in interface PatternHandler
Parameters:
dataProvider - the source to get the data from
Returns:
a PatternHandler.Result object, or null if no match was found
Throws:
TokenizerException - generic exception
java.lang.NullPointerException - if no DataProvider is given
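Checking whether the start of the data matches a pattern can be approximated with java.util.regex, which has been part of the JDK since version 1.4: Matcher.lookingAt() anchors a match at the beginning of the input without requiring the whole input to match. The sketch below is a hypothetical StartPatternMatcher class, not the JTopas implementation. It returns the matched text instead of a PatternHandler.Result and prefers the longest match, both of which are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JTopas code: tries each pattern at the start of
// the data and keeps the longest match.
final class StartPatternMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    void addPattern(String regex) {
        patterns.add(Pattern.compile(regex));
    }

    // Cheap check that lets a caller skip pattern matching entirely.
    boolean hasPattern() {
        return !patterns.isEmpty();
    }

    // Returns the longest match at the start of data, or null if none matches.
    String matches(CharSequence data) {
        if (data == null) {
            throw new NullPointerException("no data given");
        }
        String best = null;
        for (Pattern p : patterns) {
            Matcher m = p.matcher(data);
            // lookingAt() matches from index 0, so end() is the match length.
            if (m.lookingAt() && (best == null || m.end() > best.length())) {
                best = m.group();
            }
        }
        return best;
    }
}
```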