CLucene - a full-featured, c++ search engine
API Documentation
#include <Analyzers.h>

Public Member Functions
    LetterTokenizer (lucene::util::Reader *in)
    virtual ~LetterTokenizer ()
Protected Member Functions
    bool isTokenChar (const TCHAR c) const
        Collects only characters which satisfy _istalpha.
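That is, it defines tokens as maximal strings of adjacent letters. In this C++ port a letter is any character that satisfies _istalpha, the counterpart of the java.lang.Character.isLetter() predicate used by the original Java Lucene.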
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
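The tokenization rule described above can be illustrated with a minimal, standalone sketch. It does not use the CLucene classes: isTokenCharSketch and letterTokens are hypothetical names introduced here, and std::iswalpha on wchar_t is assumed as a stand-in for _istalpha on TCHAR.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical stand-in for LetterTokenizer::isTokenChar. The real member
// tests a TCHAR with _istalpha; std::iswalpha is assumed to be equivalent
// for the purpose of this sketch.
inline bool isTokenCharSketch(wchar_t c) {
    return std::iswalpha(static_cast<wint_t>(c)) != 0;
}

// Splits text into maximal runs of adjacent letters. Any character that is
// not a letter ends the current token and is itself discarded.
inline std::vector<std::wstring> letterTokens(const std::wstring& text) {
    std::vector<std::wstring> tokens;
    std::wstring current;
    for (wchar_t c : text) {
        if (isTokenCharSketch(c)) {
            current.push_back(c);          // extend the current run of letters
        } else if (!current.empty()) {
            tokens.push_back(current);     // a non-letter closes the run
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);         // flush a run that reaches the end
    }
    return tokens;
}
```

With the input L"it's 2 late", this sketch yields the three tokens it, s, and late: the apostrophe, the digit, and the spaces all act as separators. Text written without spaces between words contains no such separators, so a whole run of letters becomes a single token, which is the limitation the note above describes.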
lucene::analysis::LetterTokenizer::LetterTokenizer (lucene::util::Reader *in)
virtual lucene::analysis::LetterTokenizer::~LetterTokenizer () [virtual]
bool lucene::analysis::LetterTokenizer::isTokenChar (const TCHAR c) const [protected, virtual]