CLucene - a full-featured C++ search engine
API Documentation


lucene::analysis::standard::StandardTokenizer Class Reference

A grammar-based tokenizer constructed with JavaCC.

#include <StandardTokenizer.h>

Inheritance diagram for lucene::analysis::standard::StandardTokenizer:

lucene::analysis::TokenStream <- lucene::analysis::Tokenizer <- lucene::analysis::standard::StandardTokenizer

Public Member Functions

 StandardTokenizer (lucene::util::Reader *reader)
 ~StandardTokenizer ()
bool next (Token *token)
 Returns the next token in the stream, or false at end-of-stream.
bool ReadNumber (const TCHAR *previousNumber, const TCHAR prev, Token *t)
bool ReadAlphaNum (const TCHAR prev, Token *t)
bool ReadApostrophe (lucene::util::StringBuffer *str, Token *t)
bool ReadAt (lucene::util::StringBuffer *str, Token *t)
bool ReadCompany (lucene::util::StringBuffer *str, Token *t)
bool ReadCJK (const TCHAR prev, Token *t)

Data Fields

lucene::util::FastCharStream * rd

Detailed Description

A grammar-based tokenizer constructed with JavaCC.

This should be a good tokenizer for most European-language documents.

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.


Constructor & Destructor Documentation

lucene::analysis::standard::StandardTokenizer::StandardTokenizer ( lucene::util::Reader * reader )

lucene::analysis::standard::StandardTokenizer::~StandardTokenizer ( )


Member Function Documentation

bool lucene::analysis::standard::StandardTokenizer::next ( Token * token ) [virtual]

Returns the next token in the stream, or false at end-of-stream.

The returned token's type is set to an element of StandardTokenizerConstants::tokenImage.

Implements lucene::analysis::TokenStream.
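
The calling pattern implied by this signature (the caller owns the token, the stream fills it in and reports end-of-stream through the return value) can be sketched as follows. This is only an illustration under stated assumptions: `SketchToken`, `SketchWhitespaceTokenizer`, and `collectTerms` are hypothetical stand-ins, not CLucene classes, and the whitespace splitting shown here is not the grammar that StandardTokenizer applies. Only the `bool next(Token *)` contract is taken from the documentation above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token; NOT the CLucene Token class.
struct SketchToken {
    std::string text;  // the term text
    std::string type;  // a type label (placeholder value, see below)
};

// Hypothetical stand-in stream. It mirrors only the documented contract:
// fill the caller-owned token and return true, or return false at end-of-stream.
class SketchWhitespaceTokenizer {
public:
    explicit SketchWhitespaceTokenizer(const std::string& input)
        : input_(input), pos_(0) {}

    bool next(SketchToken* token) {
        assert(token != nullptr);  // the caller must supply storage
        // Skip leading whitespace.
        while (pos_ < input_.size() && isSpace(input_[pos_])) ++pos_;
        if (pos_ >= input_.size()) return false;  // end-of-stream
        // Consume one run of non-whitespace characters.
        const std::size_t start = pos_;
        while (pos_ < input_.size() && !isSpace(input_[pos_])) ++pos_;
        token->text = input_.substr(start, pos_ - start);
        token->type = "word";  // placeholder label, not a tokenImage entry
        return true;
    }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string input_;
    std::size_t pos_;
};

// Drain a stream by calling next() until it returns false,
// reusing a single token object across iterations.
std::vector<std::string> collectTerms(const std::string& input) {
    SketchWhitespaceTokenizer tokenizer(input);
    SketchToken token;
    std::vector<std::string> terms;
    while (tokenizer.next(&token)) {
        terms.push_back(token.text);
    }
    return terms;
}
```

Reusing one token object across calls avoids an allocation per token, which is presumably why the interface takes an output pointer rather than returning a new object.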

bool lucene::analysis::standard::StandardTokenizer::ReadNumber ( const TCHAR * previousNumber,
const TCHAR prev,
Token * t
)

bool lucene::analysis::standard::StandardTokenizer::ReadAlphaNum ( const TCHAR prev,
Token * t
)

bool lucene::analysis::standard::StandardTokenizer::ReadApostrophe ( lucene::util::StringBuffer * str,
Token * t
)

bool lucene::analysis::standard::StandardTokenizer::ReadAt ( lucene::util::StringBuffer * str,
Token * t
)

bool lucene::analysis::standard::StandardTokenizer::ReadCompany ( lucene::util::StringBuffer * str,
Token * t
)

bool lucene::analysis::standard::StandardTokenizer::ReadCJK ( const TCHAR prev,
Token * t
)


Field Documentation

lucene::util::FastCharStream* lucene::analysis::standard::StandardTokenizer::rd


The documentation for this class was generated from the following file:

clucene.sourceforge.net