CLucene - a full-featured, c++ search engine
API Documentation
#include <Analyzers.h>
Public Member Functions
LowerCaseTokenizer (lucene::util::Reader *in)
Construct a new LowerCaseTokenizer.
virtual ~LowerCaseTokenizer ()
Protected Member Functions
TCHAR normalize (const TCHAR chr) const
Converts each collected character to lower case using _totlower.
LowerCaseTokenizer divides text at non-letters and converts the resulting tokens to lower case. It is functionally equivalent to combining LetterTokenizer with LowerCaseFilter, but performing both tasks in a single pass is faster, hence this (redundant) implementation.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
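The equivalence described above can be illustrated with a minimal, self-contained sketch. It does not use the CLucene API: the functions `lowerCaseTokens`, `letterTokens`, and `lowerCaseFilter` are hypothetical stand-ins, and the sketch assumes plain `char` input with the ASCII-only `std::isalpha` and `std::tolower` in place of `TCHAR` and `_totlower`.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical single-pass helper (not CLucene code): splits at non-letters
// and lower-cases each letter while it is being collected.
std::vector<std::string> lowerCaseTokens(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);  // a non-letter ends the current token
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical stage 1 of the two-stage variant: split at non-letters only.
std::vector<std::string> letterTokens(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current.push_back(static_cast<char>(ch));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical stage 2: a second pass that lower-cases every token.
std::vector<std::string> lowerCaseFilter(std::vector<std::string> tokens) {
    for (std::string& token : tokens) {
        for (char& ch : token) {
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        }
    }
    return tokens;
}
```

Under these assumptions, `lowerCaseTokens(text)` and `lowerCaseFilter(letterTokens(text))` return the same tokens for any input; the single-pass version simply avoids visiting each character a second time.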
lucene::analysis::LowerCaseTokenizer::LowerCaseTokenizer (lucene::util::Reader *in)
Construct a new LowerCaseTokenizer.
virtual lucene::analysis::LowerCaseTokenizer::~LowerCaseTokenizer () [virtual]
TCHAR lucene::analysis::LowerCaseTokenizer::normalize (const TCHAR chr) const [protected, virtual]
Converts each collected character to lower case using _totlower.
Reimplemented from lucene::analysis::CharTokenizer.
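How a reimplemented normalize hook changes a tokenizer's output can be sketched as follows. This is an illustration only, not CLucene's implementation: the `Sketch*` classes, the `isTokenChar` hook, and the `tokens()` method are hypothetical, the class hierarchy is assumed, and `char` with `std::tolower` stands in for `TCHAR` with `_totlower`.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CharTokenizer-style base class: the tokenizing
// loop lives here and defers per-character decisions to two virtual hooks.
class SketchCharTokenizer {
public:
    explicit SketchCharTokenizer(std::string text) : text_(std::move(text)) {}
    virtual ~SketchCharTokenizer() = default;

    std::vector<std::string> tokens() const {
        std::vector<std::string> out;
        std::string current;
        for (char ch : text_) {
            if (isTokenChar(ch)) {
                current.push_back(normalize(ch));  // hook applied per character
            } else if (!current.empty()) {
                out.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) out.push_back(current);
        return out;
    }

protected:
    virtual bool isTokenChar(char ch) const = 0;
    virtual char normalize(char ch) const { return ch; }  // default: unchanged

private:
    std::string text_;
};

// Hypothetical letter tokenizer: only letters belong to a token.
class SketchLetterTokenizer : public SketchCharTokenizer {
public:
    explicit SketchLetterTokenizer(std::string text)
        : SketchCharTokenizer(std::move(text)) {}

protected:
    bool isTokenChar(char ch) const override {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0;
    }
};

// Hypothetical lower-casing tokenizer: reimplements only the normalize hook.
class SketchLowerCaseTokenizer : public SketchLetterTokenizer {
public:
    explicit SketchLowerCaseTokenizer(std::string text)
        : SketchLetterTokenizer(std::move(text)) {}

protected:
    char normalize(char ch) const override {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
};
```

In this sketch the derived class changes a single hook and inherits the tokenizing loop unchanged, so the lower-casing happens while each character is collected rather than in a later filter stage.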