CLucene - a full-featured, c++ search engine
API Documentation
#include <Analyzers.h>
Public Member Functions
  LetterTokenizer (lucene::util::Reader *in)
  virtual ~LetterTokenizer ()
Protected Member Functions
  bool isTokenChar (const TCHAR c) const
    Collects only characters which satisfy _istalpha.
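A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the _istalpha() predicate (the counterpart of java.lang.Character.isLetter() in the Java Lucene original).
Note: this does a decent job for most European languages, but a poor job for some Asian languages, where words are not separated by spaces and an entire unbroken run of letters therefore becomes a single token.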
lucene::analysis::LetterTokenizer::LetterTokenizer (lucene::util::Reader *in)
virtual lucene::analysis::LetterTokenizer::~LetterTokenizer () [virtual]
bool lucene::analysis::LetterTokenizer::isTokenChar (const TCHAR c) const [protected, virtual]