CLucene - a full-featured, c++ search engine
API Documentation
#include <Analyzers.h>
Public Member Functions | |
CharTokenizer (lucene::util::Reader *in) | |
virtual | ~CharTokenizer () |
bool | next (Token *token) |
Sets token to the next token in the stream, returns false at the EOS. | |
Protected Member Functions | |
virtual bool | isTokenChar (const TCHAR c) const =0 |
Returns true iff a character should be included in a token. | |
virtual TCHAR | normalize (const TCHAR c) const |
Called on each token character to normalize it before it is added to the token. |
lucene::analysis::CharTokenizer::CharTokenizer | ( | lucene::util::Reader * | in | ) |
virtual lucene::analysis::CharTokenizer::~CharTokenizer | ( | ) | [virtual] |
virtual bool lucene::analysis::CharTokenizer::isTokenChar | ( | const TCHAR | c | ) | const [protected, pure virtual] |
Returns true iff a character should be included in a token.
This tokenizer generates as tokens adjacent sequences of characters which satisfy this predicate. Characters for which this is false are used to define token boundaries and are not included in tokens.
Implemented in lucene::analysis::LetterTokenizer, and lucene::analysis::WhitespaceTokenizer.
virtual TCHAR lucene::analysis::CharTokenizer::normalize | ( | const TCHAR | c | ) | const [protected, virtual] |
Called on each token character to normalize it before it is added to the token.
The default implementation does nothing. Subclasses may use this to, e.g., lowercase tokens.
Reimplemented in lucene::analysis::LowerCaseTokenizer.
bool lucene::analysis::CharTokenizer::next | ( | Token * | token | ) | [virtual] |
Sets token to the next token in the stream, returns false at the EOS.
Implements lucene::analysis::TokenStream.