StandardTokenizerProperties

java.lang.Object
- de.susebox.jtopas.AbstractTokenizerProperties
- - de.susebox.jtopas.StandardTokenizerProperties

All Implemented Interfaces:

DataMapper, KeywordHandler, PatternHandler, SeparatorHandler, SequenceHandler, WhitespaceHandler, TokenizerProperties
```
public class StandardTokenizerProperties
extends AbstractTokenizerProperties
implements TokenizerProperties, DataMapper
```
The class StandardTokenizerProperties provides a simple implementation of the TokenizerProperties interface for use in most situations.
Note that this class relies on JTopas features that require Java 1.4 or higher. It can still be used in older environments, but it cannot be compiled with JDK versions below 1.4.

Author:

Heiko Blau

See Also:

TokenizerProperties, Tokenizer

Nested Class Summary
- Nested classes/interfaces inherited from interface de.susebox.jtopas.spi.PatternHandler
  PatternHandler.Result

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`CHARFLAG_SEPARATOR` character flag for separators
`static int`	`CHARFLAG_WHITESPACE` character flag for whitespaces
`static short`	`MAX_NONFREE_MATCHLEN` Maximum length of a non-free pattern match.

Fields inherited from interface de.susebox.jtopas.TokenizerProperties
DEFAULT_BLOCK_COMMENT_END, DEFAULT_BLOCK_COMMENT_START, DEFAULT_CHAR_END, DEFAULT_CHAR_ESCAPE, DEFAULT_CHAR_START, DEFAULT_LINE_COMMENT, DEFAULT_SEPARATORS, DEFAULT_STRING_END, DEFAULT_STRING_ESCAPE, DEFAULT_STRING_START, DEFAULT_WHITESPACES

Constructor Summary

Constructors
Constructor and Description
`StandardTokenizerProperties()` Default constructor that initializes an instance with the default whitespace and separator sets.
`StandardTokenizerProperties(int flags)` This constructor takes the control flags to be used.
`StandardTokenizerProperties(int flags, java.lang.String whitespaces, java.lang.String separators)` This constructor takes the whitespace and separator sets to be used.

Method Summary

Modifier and Type	Method and Description
`int`	`countLeadingWhitespaces(DataProvider dataProvider)` This method detects the number of whitespace characters the data range given through the `DataProvider` parameter starts with.
`java.util.Iterator`	`getBlockComments()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.util.Iterator`	`getKeywords()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.util.Iterator`	`getLineComments()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.util.Iterator`	`getPatterns()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.util.Iterator`	`getProperties()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.lang.String`	`getSeparators()` Obtaining the separator set of the `Tokenizer`.
`int`	`getSequenceMaxLength()` This method returns the length of the longest special sequence, comment or string prefix that is known to this `SequenceHandler`.
`java.util.Iterator`	`getSpecialSequences()` This method returns an `Iterator` of `TokenizerProperty` objects.
`java.util.Iterator`	`getStrings()` This method returns an `Iterator` of `TokenizerProperty` objects.
`TokenizerProperties`	`getTokenizerProperties()` The method retrieves the backing `TokenizerProperties` instance this `DataMapper` is working on.
`java.lang.String`	`getWhitespaces()` Obtaining the whitespace character set.
`boolean`	`hasKeywords()` This method can be used by a `Tokenizer` implementation for a fast detection if keyword matching must be performed at all.
`boolean`	`hasPattern()` This method can be used by a `Tokenizer` implementation for a fast detection if pattern matching must be performed at all.
`boolean`	`hasSequenceCommentOrString()` This method can be used by a `Tokenizer` implementation for a fast detection if special sequence checking must be performed at all.
`TokenizerProperty`	`isKeyword(DataProvider dataProvider)` This method checks if the character range given through the `DataProvider` comprises a keyword.
`boolean`	`isSeparator(char testChar)` This method checks the given character if it is a separator.
`boolean`	`isWhitespace(char testChar)` This method checks if the character is a whitespace.
`PatternHandler.Result`	`matches(DataProvider dataProvider)` This method checks if the start of a character range given through the `DataProvider` matches a pattern.
`boolean`	`newlineIsWhitespace()` If a `Tokenizer` performs line counting, it is often necessary to know whether newline characters are considered whitespace.
`void`	`setTokenizerProperties(TokenizerProperties props)` Setting the backing `TokenizerProperties` instance this `DataMapper` is working with.
`TokenizerProperty`	`startsWithSequenceCommentOrString(DataProvider dataProvider)` This method checks if a given range of data starts with a special sequence, a comment or a string.

Methods inherited from class de.susebox.jtopas.AbstractTokenizerProperties
addBlockComment, addBlockComment, addBlockComment, addBlockComment, addKeyword, addKeyword, addKeyword, addKeyword, addLineComment, addLineComment, addLineComment, addLineComment, addPattern, addPattern, addPattern, addPattern, addProperty, addSeparators, addSpecialSequence, addSpecialSequence, addSpecialSequence, addSpecialSequence, addString, addString, addString, addString, addTokenizerPropertyListener, addWhitespaces, blockCommentExists, getBlockComment, getBlockCommentCompanion, getKeyword, getKeywordCompanion, getLineComment, getLineCommentCompanion, getParseFlags, getPattern, getPatternCompanion, getSpecialSequence, getSpecialSequenceCompanion, getString, getStringCompanion, isFlagSet, isFlagSet, keywordExists, lineCommentExists, patternExists, propertyExists, removeBlockComment, removeKeyword, removeLineComment, removePattern, removeProperty, removeSeparators, removeSpecialSequence, removeString, removeTokenizerPropertyListener, removeWhitespaces, setParseFlags, setSeparators, setWhitespaces, specialSequenceExists, stringExists

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface de.susebox.jtopas.TokenizerProperties
addBlockComment, addBlockComment, addBlockComment, addBlockComment, addKeyword, addKeyword, addKeyword, addKeyword, addLineComment, addLineComment, addLineComment, addLineComment, addPattern, addPattern, addPattern, addPattern, addProperty, addSeparators, addSpecialSequence, addSpecialSequence, addSpecialSequence, addSpecialSequence, addString, addString, addString, addString, addTokenizerPropertyListener, addWhitespaces, blockCommentExists, getBlockComment, getBlockCommentCompanion, getKeyword, getKeywordCompanion, getLineComment, getLineCommentCompanion, getParseFlags, getPattern, getPatternCompanion, getSpecialSequence, getSpecialSequenceCompanion, getString, getStringCompanion, isFlagSet, isFlagSet, keywordExists, lineCommentExists, patternExists, propertyExists, removeBlockComment, removeKeyword, removeLineComment, removePattern, removeProperty, removeSeparators, removeSpecialSequence, removeString, removeTokenizerPropertyListener, removeWhitespaces, setParseFlags, setSeparators, setWhitespaces, specialSequenceExists, stringExists

- Field Detail
  - MAX_NONFREE_MATCHLEN
```
public static final short MAX_NONFREE_MATCHLEN
```
    Maximum length of a non-free pattern match. Non-free patterns are those that do not have the TokenizerProperties#F_FREE_PATTERN flag set. Number patterns are a common example.
    
    See Also:
    
    Constant Field Values
  - CHARFLAG_WHITESPACE
```
public static final int CHARFLAG_WHITESPACE
```
    character flag for whitespaces
    
    See Also:
    
    Constant Field Values
  - CHARFLAG_SEPARATOR
```
public static final int CHARFLAG_SEPARATOR
```
    character flag for separators
    
    See Also:
    
    Constant Field Values
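    The two character flags suggest a per-character lookup table in which a single array access answers both "is this a whitespace?" and "is this a separator?". The following is a minimal sketch of that idea, not the actual JTopas implementation. The class name `CharFlagTable`, the bit values 1 and 2, the table size of 256 and the treatment of the sets as plain character lists are all assumptions made for illustration.

```java
// Hypothetical stand-in, not part of JTopas: a per-character flag table.
final class CharFlagTable {
    // Assumed bit values; the real ones are listed under "Constant Field Values".
    static final int CHARFLAG_WHITESPACE = 1;
    static final int CHARFLAG_SEPARATOR = 2;

    // One flag word per character code below 256.
    private final int[] flags = new int[256];
    private final String whitespaces;
    private final String separators;

    CharFlagTable(String whitespaces, String separators) {
        this.whitespaces = whitespaces;
        this.separators = separators;
        mark(whitespaces, CHARFLAG_WHITESPACE);
        mark(separators, CHARFLAG_SEPARATOR);
    }

    // Sets the given flag bit for every character of the set that fits into the table.
    private void mark(String set, int flag) {
        for (int i = 0; i < set.length(); i++) {
            char c = set.charAt(i);
            if (c < flags.length) {
                flags[c] |= flag;
            }
        }
    }

    // Table lookup for low character codes, string search for everything else.
    boolean isWhitespace(char c) {
        return c < flags.length
            ? (flags[c] & CHARFLAG_WHITESPACE) != 0
            : whitespaces.indexOf(c) >= 0;
    }

    boolean isSeparator(char c) {
        return c < flags.length
            ? (flags[c] & CHARFLAG_SEPARATOR) != 0
            : separators.indexOf(c) >= 0;
    }
}
```

    With `new CharFlagTable(" \t\r\n", ",;()")`, the space character is reported as whitespace and `;` as a separator, while a letter such as `a` is neither.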
- Constructor Detail
  - StandardTokenizerProperties
```
public StandardTokenizerProperties()
```
    Default constructor that initializes an instance with the default whitespace and separator sets. Tokenizer instances using this StandardTokenizerProperties object split text at spaces, tabs and line ending sequences as well as at punctuation characters.
  - StandardTokenizerProperties
```
public StandardTokenizerProperties(int flags)
```
    This constructor takes the control flags to be used. It is a shortcut to:
```
   TokenizerProperties props = new StandardTokenizerProperties();

   props.setParseFlags(flags);
 
```
    See the TokenizerProperties interface for the supported flags.
    TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS are used for whitespace and separator handling unless AbstractTokenizerProperties.setWhitespaces(java.lang.String) and AbstractTokenizerProperties.setSeparators(java.lang.String) are called afterwards.
    Parameters:
    
    flags - tokenizer control flags
    
    See Also:
    
    AbstractTokenizerProperties.setParseFlags(int)
  - StandardTokenizerProperties
```
public StandardTokenizerProperties(int flags,
                                   java.lang.String whitespaces,
                                   java.lang.String separators)
```
    This constructor takes the control flags and the whitespace and separator sets to be used. It is a shortcut to:
```
   TokenizerProperties props = new StandardTokenizerProperties();

   props.setParseFlags(flags);
   props.setWhitespaces(whitespaces);
   props.setSeparators(separators);
 
```
    Parameters:
    
    flags - tokenizer control flags
    
    whitespaces - the whitespace set
    
    separators - the set of separating characters
    
    See Also:
    
    AbstractTokenizerProperties.setParseFlags(int), AbstractTokenizerProperties.setWhitespaces(java.lang.String), AbstractTokenizerProperties.setSeparators(java.lang.String)
- Method Detail
  - getStrings
```
public java.util.Iterator getStrings()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getStrings in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - getWhitespaces
```
public java.lang.String getWhitespaces()
```
    Obtaining the whitespace character set. See the method description in TokenizerProperties.
    
    Specified by:
    
    getWhitespaces in interface TokenizerProperties
    
    Returns:
    
    the currently active whitespace set
    
    See Also:
    
    AbstractTokenizerProperties.setWhitespaces(java.lang.String)
  - getSeparators
```
public java.lang.String getSeparators()
```
    Obtaining the separator set of the Tokenizer. See the method description in TokenizerProperties.
    
    Specified by:
    
    getSeparators in interface TokenizerProperties
    
    Returns:
    
    the currently used set of separating characters
    
    See Also:
    
    AbstractTokenizerProperties.setSeparators(java.lang.String)
  - getLineComments
```
public java.util.Iterator getLineComments()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getLineComments in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - getBlockComments
```
public java.util.Iterator getBlockComments()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getBlockComments in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - getSpecialSequences
```
public java.util.Iterator getSpecialSequences()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getSpecialSequences in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - getKeywords
```
public java.util.Iterator getKeywords()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getKeywords in interface TokenizerProperties
    
    Returns:
    
    iteration of TokenizerProperty objects
  - getPatterns
```
public java.util.Iterator getPatterns()
```
    This method returns an Iterator of TokenizerProperty objects. Each TokenizerProperty object contains a pattern and its companion if such an associated object exists.
    
    Specified by:
    
    getPatterns in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - getProperties
```
public java.util.Iterator getProperties()
```
    This method returns an Iterator of TokenizerProperty objects. See the method description in TokenizerProperties.
    
    Specified by:
    
    getProperties in interface TokenizerProperties
    
    Returns:
    
    enumeration of TokenizerProperty objects
  - setTokenizerProperties
```
public void setTokenizerProperties(TokenizerProperties props)
                            throws java.lang.UnsupportedOperationException,
                                   java.lang.NullPointerException
```
    Sets the backing TokenizerProperties instance this DataMapper is working with. Usually, the DataMapper interface is implemented by TokenizerProperties implementations, too. Otherwise, the Tokenizer using the TokenizerProperties will construct a default DataMapper and propagate the TokenizerProperties instance by calling this method.
    The method should throw an UnsupportedOperationException if this DataMapper is an extension to a TokenizerProperties implementation.
    
    Specified by:
    
    setTokenizerProperties in interface DataMapper
    
    Parameters:
    
    props - the TokenizerProperties
    
    Throws:
    
    java.lang.UnsupportedOperationException - if this is a DataMapper implemented by a TokenizerProperties implementation
    
    java.lang.NullPointerException - if no TokenizerProperties are given
  - getTokenizerProperties
```
public TokenizerProperties getTokenizerProperties()
```
    The method retrieves the backing TokenizerProperties instance this DataMapper is working on. For implementations of the TokenizerProperties interface that also implement the DataMapper interface, this method returns the instance it is called on.
    Otherwise the method returns the TokenizerProperties instance passed through the last call to setTokenizerProperties(de.susebox.jtopas.TokenizerProperties) or null if no such call has taken place so far.
    
    Specified by:
    
    getTokenizerProperties in interface DataMapper
    
    Returns:
    
    the backing TokenizerProperties or null
  - isWhitespace
```
public boolean isWhitespace(char testChar)
```
    This method checks if the character is a whitespace. Implement your own code for situations where this default implementation is not fast enough or otherwise unsuitable.
    
    Specified by:
    
    isWhitespace in interface WhitespaceHandler
    
    Parameters:
    
    testChar - check this character
    
    Returns:
    
    true if the given character is a whitespace, false otherwise
    
    See Also:
    
    TokenizerProperties.setWhitespaces(java.lang.String)
  - countLeadingWhitespaces
```
public int countLeadingWhitespaces(DataProvider dataProvider)
                            throws java.lang.NullPointerException
```
    This method detects the number of whitespace characters the data range given through the DataProvider parameter starts with.
    
    Specified by:
    
    countLeadingWhitespaces in interface WhitespaceHandler
    
    Parameters:
    
    dataProvider - the source to get the data range from
    
    Returns:
    
    number of whitespace characters starting from the given offset
    
    Throws:
    
    TokenizerException - failure while reading data from the input stream
    
    java.lang.NullPointerException - if no DataProvider is given
    
    See Also:
    
    DataProvider
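    The counting described above can be sketched without the JTopas types. This is an illustration under stated assumptions, not the library's code: the class `LeadingWhitespace` is hypothetical, the `DataProvider` is replaced by a plain `char[]` with a start offset and a length, and the whitespace set is treated as a simple list of characters.

```java
// Hypothetical stand-in, not part of JTopas.
final class LeadingWhitespace {
    private LeadingWhitespace() {}

    // Counts the whitespace characters at the beginning of data[start .. start + length).
    static int count(char[] data, int start, int length, String whitespaces) {
        if (data == null || whitespaces == null) {
            throw new NullPointerException("data and whitespace set are required");
        }
        int end = start + length;
        int pos = start;
        // Advance while still inside the range and still looking at a whitespace.
        while (pos < end && whitespaces.indexOf(data[pos]) >= 0) {
            pos++;
        }
        return pos - start;
    }
}
```

    The result is relative to the start of the range, so a range that begins with a non-whitespace character yields 0 regardless of what follows.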
  - newlineIsWhitespace
```
public boolean newlineIsWhitespace()
```
    If a Tokenizer performs line counting, it is often necessary to know whether newline characters are considered whitespace. See WhitespaceHandler for details.
    
    Specified by:
    
    newlineIsWhitespace in interface WhitespaceHandler
    
    Returns:
    
    true if newline characters are in the current whitespace set, false otherwise
  - isSeparator
```
public boolean isSeparator(char testChar)
```
    This method checks the given character if it is a separator.
    
    Specified by:
    
    isSeparator in interface SeparatorHandler
    
    Parameters:
    
    testChar - check this character
    
    Returns:
    
    true if the given character is a separator, false otherwise
    
    See Also:
    
    TokenizerProperties.setSeparators(java.lang.String)
  - hasSequenceCommentOrString
```
public boolean hasSequenceCommentOrString()
```
    This method can be used by a Tokenizer implementation for a fast detection if special sequence checking must be performed at all. If the method returns false, time-consuming preparations can be skipped.
    
    Specified by:
    
    hasSequenceCommentOrString in interface SequenceHandler
    
    Returns:
    
    true if there actually are special sequences, comments or strings that can be tested for a match, false otherwise.
  - startsWithSequenceCommentOrString
```
public TokenizerProperty startsWithSequenceCommentOrString(DataProvider dataProvider)
                                                    throws TokenizerException,
                                                           java.lang.NullPointerException
```
    This method checks if a given range of data starts with a special sequence, a comment or a string. These three types of token are tested together since both comment and string prefixes are ordinary special sequences. Only the actions performed after a string or comment has been detected are different.
    The method returns null if no special sequence, comment or string matches the leading part of the data range given through the DataProvider.
    In cases of strings or comments, the return value contains the description for the introducing character sequence, NOT the whole string or comment. The reading of the rest of the string or comment is done by the calling Tokenizer.
    
    Specified by:
    
    startsWithSequenceCommentOrString in interface SequenceHandler
    
    Parameters:
    
    dataProvider - the source to get the data range from
    
    Returns:
    
    a TokenizerProperty if a special sequence, comment or string could be detected, null otherwise
    
    Throws:
    
    TokenizerException - failure while reading more data
    
    java.lang.NullPointerException - if no DataProvider is given
  - getSequenceMaxLength
```
public int getSequenceMaxLength()
```
    This method returns the length of the longest special sequence, comment or string prefix that is known to this SequenceHandler. When calling startsWithSequenceCommentOrString(de.susebox.jtopas.spi.DataProvider), the passed DataProvider parameter will supply at least this number of characters (see DataProvider.getLength()). If fewer characters are provided, EOF has been reached.
    
    Specified by:
    
    getSequenceMaxLength in interface SequenceHandler
    
    Returns:
    
    the number of characters needed in the worst case to identify a special sequence
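    The interplay of startsWithSequenceCommentOrString and getSequenceMaxLength can be sketched as a prefix lookup over the registered sequences. This is only an illustration, not the JTopas implementation: the class `SequenceTable` is hypothetical, the data range is a plain `char[]` with offset and length instead of a `DataProvider`, and preferring the longest matching sequence is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of JTopas.
final class SequenceTable {
    private final List<String> sequences = new ArrayList<>();
    private int maxLength = 0;

    // Registers a special sequence, comment prefix or string prefix.
    void add(String sequence) {
        sequences.add(sequence);
        maxLength = Math.max(maxLength, sequence.length());
    }

    // Fast check: without any sequence, the prefix test can be skipped.
    boolean hasSequences() {
        return !sequences.isEmpty();
    }

    // Number of characters a caller should provide to cover the worst case.
    int getSequenceMaxLength() {
        return maxLength;
    }

    // Returns the longest registered sequence that data[start .. start + length)
    // starts with, or null if none matches. Assumes start + length <= data.length.
    String startsWith(char[] data, int start, int length) {
        String best = null;
        for (String seq : sequences) {
            if (seq.length() <= length
                    && (best == null || seq.length() > best.length())
                    && regionEquals(data, start, seq)) {
                best = seq;
            }
        }
        return best;
    }

    private static boolean regionEquals(char[] data, int start, String seq) {
        for (int i = 0; i < seq.length(); i++) {
            if (data[start + i] != seq.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

    As in the description above, the lookup only identifies the introducing character sequence such as `/*`. Reading the rest of the comment or string is left to the caller.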
  - hasKeywords
```
public boolean hasKeywords()
```
    This method can be used by a Tokenizer implementation for a fast detection if keyword matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.
    
    Specified by:
    
    hasKeywords in interface KeywordHandler
    
    Returns:
    
    true if there actually are keywords that can be tested for a match, false otherwise.
  - isKeyword
```
public TokenizerProperty isKeyword(DataProvider dataProvider)
                            throws TokenizerException,
                                   java.lang.NullPointerException
```
    This method checks if the character range given through the DataProvider comprises a keyword.
    
    Specified by:
    
    isKeyword in interface KeywordHandler
    
    Parameters:
    
    dataProvider - the source of the data to be checked
    
    Returns:
    
    a TokenizerProperty if a keyword could be found, null otherwise
    
    Throws:
    
    TokenizerException - failure while reading more data
    
    java.lang.NullPointerException - if no DataProvider is given
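    Since the whole character range has to comprise the keyword, the check amounts to an exact lookup rather than a prefix test. The following sketch illustrates this with a hash map. It is not the JTopas implementation: the class `KeywordTable` is hypothetical, the range is a plain `char[]` with offset and length, and case-sensitive comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not part of JTopas.
final class KeywordTable {
    // Maps each keyword to its companion object (may be null).
    private final Map<String, Object> companions = new HashMap<>();

    void addKeyword(String keyword, Object companion) {
        companions.put(keyword, companion);
    }

    // Fast check: without keywords, the lookup can be skipped entirely.
    boolean hasKeywords() {
        return !companions.isEmpty();
    }

    // Returns the keyword if data[start .. start + length) is exactly a
    // registered keyword, null otherwise.
    String lookup(char[] data, int start, int length) {
        String candidate = new String(data, start, length);
        return companions.containsKey(candidate) ? candidate : null;
    }

    Object companionOf(String keyword) {
        return companions.get(keyword);
    }
}
```

    A range that merely starts with a keyword, for example `if ` with a trailing space, is not reported as a keyword in this sketch.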
  - hasPattern
```
public boolean hasPattern()
```
    This method can be used by a Tokenizer implementation for a fast detection if pattern matching must be performed at all. If the method returns false, time-consuming preparations can be skipped.
    
    Specified by:
    
    hasPattern in interface PatternHandler
    
    Returns:
    
    true if there actually are patterns that can be tested for a match, false otherwise.
  - matches
```
public PatternHandler.Result matches(DataProvider dataProvider)
                              throws TokenizerException,
                                     java.lang.NullPointerException
```
    This method checks if the start of a character range given through the DataProvider matches a pattern.
    
    Specified by:
    
    matches in interface PatternHandler
    
    Parameters:
    
    dataProvider - the source to get the data from
    
    Returns:
    
    a PatternHandler.Result object or null if no match was found
    
    Throws:
    
    TokenizerException - generic exception
    
    java.lang.NullPointerException - if no DataProvider is given
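    Matching a pattern against the start of a character range can be sketched with `java.util.regex`, whose `Matcher.lookingAt()` anchors the match at the beginning of the input without requiring the whole input to match. This is an illustration only: the class `PrefixPatternMatcher` is hypothetical, the range is a plain `char[]` with offset and length, and the result is reduced to the match length instead of a `PatternHandler.Result`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of JTopas.
final class PrefixPatternMatcher {
    private final Pattern pattern;

    PrefixPatternMatcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns the length of the match at the start of data[start .. start + length),
    // or -1 if the range does not start with a match.
    int matchLength(char[] data, int start, int length) {
        String range = new String(data, start, length);
        Matcher matcher = pattern.matcher(range);
        return matcher.lookingAt() ? matcher.end() : -1;
    }
}
```

    For a number pattern such as `[0-9]+(\.[0-9]+)?`, a range starting with `3.14 * r` yields a match of length 4, while a range starting with a letter yields -1.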
