public class StandardTokenizerProperties extends AbstractTokenizerProperties implements TokenizerProperties, DataMapper
The class StandardTokenizerProperties provides a simple implementation
of the TokenizerProperties interface for use in most situations.
Note that this class takes advantage of JTopas features that use Java 1.4 or higher. It can still be used in older environments but not compiled with JDK versions below 1.4!
See Also: TokenizerProperties, Tokenizer

Nested classes inherited from interface PatternHandler: PatternHandler.Result

| Modifier and Type | Field and Description |
|---|---|
| static int | CHARFLAG_SEPARATOR: character flag for separators |
| static int | CHARFLAG_WHITESPACE: character flag for whitespaces |
| static short | MAX_NONFREE_MATCHLEN: maximum length of a non-free pattern match |
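The two character flags suggest a per-character lookup in which each character carries a bit for "whitespace" and a bit for "separator". The following is a minimal sketch of that idea, not the JTopas implementation: the class name CharFlagSketch, the bit values 1 and 2, and the map-based storage are all assumptions made for illustration, and character ranges inside the sets are not handled.

```java
import java.util.HashMap;
import java.util.Map;

public class CharFlagSketch {
    // Hypothetical bit values; the real values of the JTopas constants are not stated here.
    public static final int CHARFLAG_WHITESPACE = 1;
    public static final int CHARFLAG_SEPARATOR = 2;

    // Maps a character to the OR-combination of its flags.
    private final Map<Character, Integer> flags = new HashMap<>();

    public CharFlagSketch(String whitespaces, String separators) {
        mark(whitespaces, CHARFLAG_WHITESPACE);
        mark(separators, CHARFLAG_SEPARATOR);
    }

    // Sets the given flag for every character of the set (taken literally, no ranges).
    private void mark(String set, int flag) {
        for (char c : set.toCharArray()) {
            flags.merge(c, flag, (a, b) -> a | b);
        }
    }

    public boolean isWhitespace(char c) {
        return (flags.getOrDefault(c, 0) & CHARFLAG_WHITESPACE) != 0;
    }

    public boolean isSeparator(char c) {
        return (flags.getOrDefault(c, 0) & CHARFLAG_SEPARATOR) != 0;
    }

    // Counts how many whitespace characters the data starts with.
    public int countLeadingWhitespaces(CharSequence data) {
        int count = 0;
        while (count < data.length() && isWhitespace(data.charAt(count))) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CharFlagSketch sketch = new CharFlagSketch(" \t\r\n", ".,;:!?");
        System.out.println(sketch.countLeadingWhitespaces("  \tfoo")); // two spaces and a tab
        System.out.println(sketch.isSeparator(';'));
        System.out.println(sketch.isWhitespace('x'));
    }
}
```

A single lookup per character answers both questions, which is presumably why flags rather than two separate sets would be used.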
Fields inherited from interface TokenizerProperties: DEFAULT_BLOCK_COMMENT_END, DEFAULT_BLOCK_COMMENT_START, DEFAULT_CHAR_END, DEFAULT_CHAR_ESCAPE, DEFAULT_CHAR_START, DEFAULT_LINE_COMMENT, DEFAULT_SEPARATORS, DEFAULT_STRING_END, DEFAULT_STRING_ESCAPE, DEFAULT_STRING_START, DEFAULT_WHITESPACES

| Constructor and Description |
|---|
| StandardTokenizerProperties(): Default constructor that initializes an instance with the default whitespace and separator sets. |
| StandardTokenizerProperties(int flags): This constructor takes the control flags to be used. |
| StandardTokenizerProperties(int flags, java.lang.String whitespaces, java.lang.String separators): This constructor takes the control flags and the whitespace and separator sets to be used. |

| Modifier and Type | Method and Description |
|---|---|
| int | countLeadingWhitespaces(DataProvider dataProvider): This method detects the number of whitespace characters the data range given through the DataProvider parameter starts with. |
| java.util.Iterator | getBlockComments(): This method returns an Iterator of TokenizerProperty objects. |
| java.util.Iterator | getKeywords(): This method returns an Iterator of TokenizerProperty objects. |
| java.util.Iterator | getLineComments(): This method returns an Iterator of TokenizerProperty objects. |
| java.util.Iterator | getPatterns(): This method returns an Iterator of TokenizerProperty objects. |
| java.util.Iterator | getProperties(): This method returns an Iterator of TokenizerProperty objects. |
| java.lang.String | getSeparators(): Obtaining the separator set of the Tokenizer. |
| int | getSequenceMaxLength(): This method returns the length of the longest special sequence, comment or string prefix that is known to this SequenceHandler. |
| java.util.Iterator | getSpecialSequences(): This method returns an Iterator of TokenizerProperty objects. |
| java.util.Iterator | getStrings(): This method returns an Iterator of TokenizerProperty objects. |
| TokenizerProperties | getTokenizerProperties(): The method retrieves the backing TokenizerProperties instance this DataMapper is working on. |
| java.lang.String | getWhitespaces(): Obtaining the whitespace character set. |
| boolean | hasKeywords(): This method can be used by a Tokenizer implementation for a fast detection if keyword matching must be performed at all. |
| boolean | hasPattern(): This method can be used by a Tokenizer implementation for a fast detection if pattern matching must be performed at all. |
| boolean | hasSequenceCommentOrString(): This method can be used by a Tokenizer implementation for a fast detection if special sequence checking must be performed at all. |
| TokenizerProperty | isKeyword(DataProvider dataProvider): This method checks if the character range given through the DataProvider comprises a keyword. |
| boolean | isSeparator(char testChar): This method checks if the given character is a separator. |
| boolean | isWhitespace(char testChar): This method checks if the character is a whitespace. |
| PatternHandler.Result | matches(DataProvider dataProvider): This method checks if the start of a character range given through the DataProvider matches a pattern. |
| boolean | newlineIsWhitespace(): If a Tokenizer performs line counting, it is often necessary to know if newline characters are considered to be whitespace. |
| void | setTokenizerProperties(TokenizerProperties props): Setting the backing TokenizerProperties instance this DataMapper is working with. |
| TokenizerProperty | startsWithSequenceCommentOrString(DataProvider dataProvider): This method checks if a given range of data starts with a special sequence, a comment or a string. |

Methods inherited from class AbstractTokenizerProperties: addBlockComment, addBlockComment, addBlockComment, addBlockComment, addKeyword, addKeyword, addKeyword, addKeyword, addLineComment, addLineComment, addLineComment, addLineComment, addPattern, addPattern, addPattern, addPattern, addProperty, addSeparators, addSpecialSequence, addSpecialSequence, addSpecialSequence, addSpecialSequence, addString, addString, addString, addString, addTokenizerPropertyListener, addWhitespaces, blockCommentExists, getBlockComment, getBlockCommentCompanion, getKeyword, getKeywordCompanion, getLineComment, getLineCommentCompanion, getParseFlags, getPattern, getPatternCompanion, getSpecialSequence, getSpecialSequenceCompanion, getString, getStringCompanion, isFlagSet, isFlagSet, keywordExists, lineCommentExists, patternExists, propertyExists, removeBlockComment, removeKeyword, removeLineComment, removePattern, removeProperty, removeSeparators, removeSpecialSequence, removeString, removeTokenizerPropertyListener, removeWhitespaces, setParseFlags, setSeparators, setWhitespaces, specialSequenceExists, stringExists

Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface TokenizerProperties: the same methods as listed for AbstractTokenizerProperties above.

public static final short MAX_NONFREE_MATCHLEN

Maximum length of a non-free pattern match. Non-free patterns are patterns that do not have the TokenizerProperties#F_FREE_PATTERN flag set. A common example is a number pattern.

public static final int CHARFLAG_WHITESPACE

Character flag for whitespaces.

public static final int CHARFLAG_SEPARATOR

Character flag for separators.

public StandardTokenizerProperties()

Default constructor that initializes an instance with the default whitespace and separator sets. Tokenizer instances using this StandardTokenizerProperties object split text between spaces, tabs and line ending sequences as well as between punctuation characters.

public StandardTokenizerProperties(int flags)

This constructor takes the control flags to be used. It is a shortcut for:

    TokenizerProperties props = new StandardTokenizerProperties();
    props.setParseFlags(flags);

See the TokenizerProperties interface for the supported flags.

TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS are used for whitespace and separator handling unless AbstractTokenizerProperties.setWhitespaces(java.lang.String) and AbstractTokenizerProperties.setSeparators(java.lang.String) are called explicitly afterwards.

Parameters:
flags - tokenizer control flags

See Also: AbstractTokenizerProperties.setParseFlags(int)

public StandardTokenizerProperties(int flags, java.lang.String whitespaces, java.lang.String separators)

This constructor takes the control flags and the whitespace and separator sets to be used. It is a shortcut for:

    TokenizerProperties props = new StandardTokenizerProperties();
    props.setParseFlags(flags);
    props.setWhitespaces(whitespaces);
    props.setSeparators(separators);

Parameters:
flags - tokenizer control flags
whitespaces - the whitespace set
separators - the set of separating characters

See Also: AbstractTokenizerProperties.setParseFlags(int), AbstractTokenizerProperties.setWhitespaces(java.lang.String), AbstractTokenizerProperties.setSeparators(java.lang.String)

public java.util.Iterator getStrings()
This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getStrings in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.lang.String getWhitespaces()

Obtaining the whitespace character set. See the method description in TokenizerProperties.

Specified by: getWhitespaces in interface TokenizerProperties

See Also: AbstractTokenizerProperties.setWhitespaces(java.lang.String)

public java.lang.String getSeparators()

Obtaining the separator set of the Tokenizer. See the method description in TokenizerProperties.

Specified by: getSeparators in interface TokenizerProperties

See Also: AbstractTokenizerProperties.setSeparators(java.lang.String)

public java.util.Iterator getLineComments()

This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getLineComments in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.util.Iterator getBlockComments()

This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getBlockComments in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.util.Iterator getSpecialSequences()

This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getSpecialSequences in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.util.Iterator getKeywords()

This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getKeywords in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.util.Iterator getPatterns()

This method returns an Iterator of TokenizerProperty objects. Each TokenizerProperty object contains a pattern and its companion if such an associated object exists.

Specified by: getPatterns in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public java.util.Iterator getProperties()

This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.

Specified by: getProperties in interface TokenizerProperties

Returns: an Iterator of TokenizerProperty objects

public void setTokenizerProperties(TokenizerProperties props) throws java.lang.UnsupportedOperationException, java.lang.NullPointerException
Setting the backing TokenizerProperties instance this DataMapper is working with. Usually, the DataMapper interface is implemented by TokenizerProperties implementations, too. Otherwise the Tokenizer using the TokenizerProperties will construct a default DataMapper and propagate the TokenizerProperties instance by calling this method.

The method throws an UnsupportedOperationException if this DataMapper is an extension to a TokenizerProperties implementation.

Specified by: setTokenizerProperties in interface DataMapper

Parameters:
props - the TokenizerProperties

Throws:
java.lang.UnsupportedOperationException - if this is a DataMapper implemented by a TokenizerProperties implementation
java.lang.NullPointerException - if no TokenizerProperties are given

public TokenizerProperties getTokenizerProperties()

The method retrieves the backing TokenizerProperties instance this DataMapper is working on. For implementations of the TokenizerProperties interface that also implement the DataMapper interface, this method returns the instance it is called on.

Otherwise the method returns the TokenizerProperties instance passed through the last call to setTokenizerProperties(de.susebox.jtopas.TokenizerProperties), or null if no such call has taken place so far.

Specified by: getTokenizerProperties in interface DataMapper

Returns: the backing TokenizerProperties or null

public boolean isWhitespace(char testChar)
This method checks if the character is a whitespace.

Specified by: isWhitespace in interface WhitespaceHandler

Parameters:
testChar - check this character

Returns: true if the given character is a whitespace, false otherwise

See Also: TokenizerProperties.setWhitespaces(java.lang.String)

public int countLeadingWhitespaces(DataProvider dataProvider) throws java.lang.NullPointerException

This method detects the number of whitespace characters the data range given through the DataProvider parameter starts with.

Specified by: countLeadingWhitespaces in interface WhitespaceHandler

Parameters:
dataProvider - the source to get the data range from

Throws:
TokenizerException - failure while reading data from the input stream
java.lang.NullPointerException - if no DataProvider is given

See Also: DataProvider

public boolean newlineIsWhitespace()

If a Tokenizer performs line counting, it is often necessary to know if newline characters are considered to be whitespace. See WhitespaceHandler for details.

Specified by: newlineIsWhitespace in interface WhitespaceHandler

Returns: true if newline characters are in the current whitespace set, false otherwise

public boolean isSeparator(char testChar)

This method checks if the given character is a separator.

Specified by: isSeparator in interface SeparatorHandler

Parameters:
testChar - check this character

Returns: true if the given character is a separator, false otherwise

See Also: TokenizerProperties.setSeparators(java.lang.String)

public boolean hasSequenceCommentOrString()

This method can be used by a Tokenizer implementation for a fast detection if special sequence checking must be performed at all. If the method returns false, time-consuming preparations can be skipped.

Specified by: hasSequenceCommentOrString in interface SequenceHandler

Returns: true if there actually are special sequences, comments or strings that can be tested for a match, false otherwise.

public TokenizerProperty startsWithSequenceCommentOrString(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException
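This method checks if a given range of data starts with a special sequence, a comment or a string. It returns null if no special sequence, comment or string matches the leading part of the data range given through the DataProvider.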
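
The prefix check described here, together with getSequenceMaxLength() from the method summary, can be pictured with the following self-contained sketch. It is not the JTopas implementation: the class SequencePrefixSketch is hypothetical, plain strings stand in for TokenizerProperty objects, a CharSequence stands in for the DataProvider, and preferring the longest matching prefix is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class SequencePrefixSketch {
    // Stand-ins for special sequences, comment starts and string starts.
    private final List<String> sequences;
    private final int maxLength;

    public SequencePrefixSketch(List<String> sequences) {
        this.sequences = sequences;
        int max = 0;
        for (String s : sequences) {
            max = Math.max(max, s.length());
        }
        this.maxLength = max;
    }

    // Length of the longest registered prefix; a caller would supply
    // at least this many characters unless the end of input is reached.
    public int getSequenceMaxLength() {
        return maxLength;
    }

    // Returns the longest registered sequence the data starts with, or null if none matches.
    public String startsWithSequence(CharSequence data) {
        String best = null;
        for (String s : sequences) {
            boolean fits = data.length() >= s.length();
            if (fits && data.subSequence(0, s.length()).toString().equals(s)
                    && (best == null || s.length() > best.length())) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SequencePrefixSketch handler =
                new SequencePrefixSketch(Arrays.asList("/", "//", "/*", "\""));
        System.out.println(handler.getSequenceMaxLength());
        System.out.println(handler.startsWithSequence("// a line comment"));
        System.out.println(handler.startsWithSequence("x = 1"));
    }
}
```

The linear scan keeps the sketch short; a real handler with many sequences would more likely use a sorted or tree-based lookup.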

Specified by: startsWithSequenceCommentOrString in interface SequenceHandler

Parameters:
dataProvider - the source to get the data range from

Returns: a TokenizerProperty if a special sequence, comment or string could be detected, null otherwise

Throws:
TokenizerException - failure while reading more data
java.lang.NullPointerException - if no DataProvider is given

public int getSequenceMaxLength()

This method returns the length of the longest special sequence, comment or string prefix that is known to this SequenceHandler. When calling startsWithSequenceCommentOrString(de.susebox.jtopas.spi.DataProvider), the passed DataProvider parameter will supply at least this number of characters (see DataProvider.getLength()). If fewer characters are provided, EOF is reached.

Specified by: getSequenceMaxLength in interface SequenceHandler

public boolean hasKeywords()

This method can be used by a Tokenizer implementation for a fast detection if keyword matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.

Specified by: hasKeywords in interface KeywordHandler

Returns: true if there actually are keywords that can be tested for a match, false otherwise.

public TokenizerProperty isKeyword(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException

This method checks if the character range given through the DataProvider comprises a keyword.

Specified by: isKeyword in interface KeywordHandler

Parameters:
dataProvider - the source of the data to be checked

Returns: a TokenizerProperty if a keyword could be found, null otherwise

Throws:
TokenizerException - failure while reading more data
java.lang.NullPointerException - if no DataProvider is given

public boolean hasPattern()

This method can be used by a Tokenizer implementation for a fast detection if pattern matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.

Specified by: hasPattern in interface PatternHandler

Returns: true if there actually are patterns that can be tested for a match, false otherwise.

public PatternHandler.Result matches(DataProvider dataProvider) throws TokenizerException, java.lang.NullPointerException

This method checks if the start of a character range given through the DataProvider matches a pattern.

Specified by: matches in interface PatternHandler

Parameters:
dataProvider - the source to get the data from

Returns: a PatternHandler.Result object or null if no match was found

Throws:
TokenizerException - generic exception
java.lang.NullPointerException - if no DataProvider is given
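
Matching a pattern against the start of a character range, as matches does, can be illustrated with java.util.regex, which is available from Java 1.4 on. This is only a sketch under assumptions: the class PatternStartSketch and its method are hypothetical, a CharSequence stands in for the DataProvider, the match length (or -1) stands in for PatternHandler.Result (or null), and whether JTopas uses Matcher.lookingAt() internally is not stated in this documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternStartSketch {
    // Returns the length of the match anchored at the start of the data,
    // or -1 if the pattern does not match there.
    public static int matchLengthAtStart(Pattern pattern, CharSequence data) {
        Matcher matcher = pattern.matcher(data);
        // lookingAt() requires the match to begin at the start of the input
        // but, unlike matches(), does not require it to cover the whole input.
        return matcher.lookingAt() ? matcher.end() : -1;
    }

    public static void main(String[] args) {
        // A simple number pattern, the typical example of a non-free pattern.
        Pattern number = Pattern.compile("[+-]?\\d+(\\.\\d+)?");
        System.out.println(matchLengthAtStart(number, "3.14 * r")); // 4
        System.out.println(matchLengthAtStart(number, "r * 3.14")); // -1
    }
}
```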