CLucene - a full-featured C++ search engine
API Documentation
#include <Analyzers.h>

Public Member Functions
| Type | Member | Description |
|---|---|---|
| | LowerCaseTokenizer (lucene::util::Reader *in) | Construct a new LowerCaseTokenizer. |
| virtual | ~LowerCaseTokenizer () | |

Protected Member Functions
| Type | Member | Description |
|---|---|---|
| TCHAR | normalize (const TCHAR chr) const | Converts each token character to lower case using _totlower. |

Detailed Description
LowerCaseTokenizer divides text at non-letters and converts the resulting tokens to lower case. While it is functionally equivalent to a LetterTokenizer followed by a LowerCaseFilter, doing both tasks in a single pass has a performance advantage, hence this (redundant) implementation.
Note: this works well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
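The single-pass idea can be illustrated outside CLucene. The following is a minimal, self-contained sketch, not CLucene's implementation: it assumes TCHAR is wchar_t (a Unicode build) and uses std::iswalpha and std::towlower in place of the _istalpha/_totlower macros. The helper names splitAtNonLetters, lowerCaseAll and lowerCaseTokenize are hypothetical. It compares a two-pass pipeline (split, then lower-case, as a LetterTokenizer followed by a LowerCaseFilter would) with a single pass that lower-cases each letter as it is collected.

```cpp
#include <cassert>
#include <cwctype>
#include <string>
#include <vector>

// Assumption: TCHAR is wchar_t, so std::iswalpha / std::towlower stand in
// for CLucene's _istalpha / _totlower macros.
using Tokens = std::vector<std::wstring>;

// Pass 1 of the two-pass pipeline: split at non-letters
// (the job a LetterTokenizer does).
Tokens splitAtNonLetters(const std::wstring& text) {
    Tokens tokens;
    std::wstring current;
    for (wchar_t c : text) {
        if (std::iswalpha(static_cast<std::wint_t>(c))) {
            current.push_back(c);
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Pass 2: walk every token again and lower-case it
// (the job a LowerCaseFilter does).
Tokens lowerCaseAll(Tokens tokens) {
    for (std::wstring& token : tokens)
        for (wchar_t& c : token)
            c = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
    return tokens;
}

// Single pass: each letter is lower-cased as it is appended,
// so the token text is never traversed a second time.
Tokens lowerCaseTokenize(const std::wstring& text) {
    Tokens tokens;
    std::wstring current;
    for (wchar_t c : text) {
        if (std::iswalpha(static_cast<std::wint_t>(c))) {
            current.push_back(
                static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c))));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

For example, lowerCaseTokenize(L"Hello, World!") yields the tokens "hello" and "world", the same result as lowerCaseAll(splitAtNonLetters(L"Hello, World!")). Which characters count as letters depends on the current C locale, so results for non-ASCII text may differ between environments.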
Constructor & Destructor Documentation
lucene::analysis::LowerCaseTokenizer::LowerCaseTokenizer(lucene::util::Reader *in)
Construct a new LowerCaseTokenizer.
virtual lucene::analysis::LowerCaseTokenizer::~LowerCaseTokenizer() [virtual]
Member Function Documentation
TCHAR lucene::analysis::LowerCaseTokenizer::normalize(const TCHAR chr) const [protected, virtual]
Converts each token character to lower case using _totlower.
Reimplemented from lucene::analysis::CharTokenizer.
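Assuming CLucene mirrors Java Lucene's CharTokenizer design, normalize is a per-character hook: the base tokenizer decides which characters belong to a token (isTokenChar) and passes each accepted character through normalize before appending it. The sketch below models only that hook pattern with hypothetical Mini* classes. It also assumes the base class's default normalize returns its argument unchanged, and it reads from a std::wstring rather than a lucene::util::Reader.

```cpp
#include <cassert>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a CharTokenizer-style base class.
// Real CLucene tokenizers read from a lucene::util::Reader and fill Token
// objects; here only the isTokenChar/normalize hooks are modeled.
class MiniCharTokenizer {
public:
    virtual ~MiniCharTokenizer() = default;

    std::vector<std::wstring> tokenize(const std::wstring& text) const {
        std::vector<std::wstring> tokens;
        std::wstring current;
        for (wchar_t c : text) {
            if (isTokenChar(c)) {
                current.push_back(normalize(c));  // hook applied per character
            } else if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) tokens.push_back(current);
        return tokens;
    }

protected:
    virtual bool isTokenChar(wchar_t c) const = 0;
    // Assumed default: keep the character unchanged.
    virtual wchar_t normalize(wchar_t c) const { return c; }
};

// Letter-based splitting, analogous to LetterTokenizer.
class MiniLetterTokenizer : public MiniCharTokenizer {
protected:
    bool isTokenChar(wchar_t c) const override {
        return std::iswalpha(static_cast<std::wint_t>(c)) != 0;
    }
};

// Same splitting rule; only normalize is overridden to lower-case.
class MiniLowerCaseTokenizer : public MiniLetterTokenizer {
protected:
    wchar_t normalize(wchar_t c) const override {
        return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
    }
};
```

Because only normalize differs, MiniLowerCaseTokenizer reuses MiniLetterTokenizer's splitting rule unchanged: tokenizing L"Hello World" gives "Hello" and "World" with the letter tokenizer and "hello" and "world" with the lower-case one.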