CLucene - a full-featured, C++ search engine
API Documentation
#include <StandardTokenizer.h>
Public Member Functions
StandardTokenizer (lucene::util::Reader *reader)
~StandardTokenizer ()
bool next (Token *token)
    Returns the next token in the stream, or false at end-of-stream.
bool ReadNumber (const TCHAR *previousNumber, const TCHAR prev, Token *t)
bool ReadAlphaNum (const TCHAR prev, Token *t)
bool ReadApostrophe (lucene::util::StringBuffer *str, Token *t)
bool ReadAt (lucene::util::StringBuffer *str, Token *t)
bool ReadCompany (lucene::util::StringBuffer *str, Token *t)
bool ReadCJK (const TCHAR prev, Token *t)
Data Fields
lucene::util::FastCharStream *rd
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
lucene::analysis::standard::StandardTokenizer::StandardTokenizer (lucene::util::Reader *reader)
lucene::analysis::standard::StandardTokenizer::~StandardTokenizer ()
bool lucene::analysis::standard::StandardTokenizer::next (Token *token) [virtual]
Returns the next token in the stream, or false at end-of-stream.
The returned token's type is set to an element of StandardTokenizerConstants::tokenImage.
Implements lucene::analysis::TokenStream.
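The calling pattern implied by next() can be illustrated with a small stand-alone sketch. It does not use CLucene itself: Token, SketchTokenizer, kTokenImage, and collectTokens are hypothetical stand-ins written for this example, and the two type names in the table are assumed for illustration. The sketch only mirrors the documented contract, namely that next() fills the caller's token and returns true, returns false at end-of-stream, and sets the token's type to an element of a table of type names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a token; NOT the CLucene Token class.
struct Token {
    std::string text;
    const char* type = "";  // points at an element of kTokenImage
};

// Stand-in for a table of type names such as
// StandardTokenizerConstants::tokenImage (entries assumed for illustration).
static const char* const kTokenImage[] = {"<ALPHANUM>", "<NUM>"};

// Hypothetical tokenizer that splits on non-alphanumeric characters.
// It is far simpler than a grammar-based tokenizer and only demonstrates
// the bool next(Token*) contract.
class SketchTokenizer {
public:
    explicit SketchTokenizer(std::string input) : input_(std::move(input)) {}

    // Fills *token and returns true, or returns false at end-of-stream.
    bool next(Token* token) {
        assert(token != nullptr);
        // Skip separator characters before the next token.
        while (pos_ < input_.size() && !isWordChar(input_[pos_])) ++pos_;
        if (pos_ >= input_.size()) return false;

        const std::size_t start = pos_;
        bool allDigits = true;
        while (pos_ < input_.size() && isWordChar(input_[pos_])) {
            if (!std::isdigit(static_cast<unsigned char>(input_[pos_]))) {
                allDigits = false;
            }
            ++pos_;
        }
        token->text = input_.substr(start, pos_ - start);
        token->type = kTokenImage[allDigits ? 1 : 0];
        return true;
    }

private:
    static bool isWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }

    std::string input_;
    std::size_t pos_ = 0;
};

// Drains the stream: call next() repeatedly until it returns false.
std::vector<Token> collectTokens(SketchTokenizer& tokenizer) {
    std::vector<Token> out;
    Token t;
    while (tokenizer.next(&t)) out.push_back(t);
    return out;
}
```

With the real class, the same loop shape would presumably apply: construct the tokenizer over a reader, then call next() with a caller-owned Token until it returns false. Reusing one Token object across calls, as collectTokens does, avoids allocating a new token per call.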
bool lucene::analysis::standard::StandardTokenizer::ReadNumber (const TCHAR *previousNumber, const TCHAR prev, Token *t)
bool lucene::analysis::standard::StandardTokenizer::ReadAlphaNum (const TCHAR prev, Token *t)
bool lucene::analysis::standard::StandardTokenizer::ReadApostrophe (lucene::util::StringBuffer *str, Token *t)
bool lucene::analysis::standard::StandardTokenizer::ReadAt (lucene::util::StringBuffer *str, Token *t)
bool lucene::analysis::standard::StandardTokenizer::ReadCompany (lucene::util::StringBuffer *str, Token *t)
bool lucene::analysis::standard::StandardTokenizer::ReadCJK (const TCHAR prev, Token *t)
lucene::util::FastCharStream *lucene::analysis::standard::StandardTokenizer::rd