CLucene - a full-featured C++ search engine
API Documentation

lucene::analysis::LetterTokenizer Class Reference

A LetterTokenizer is a tokenizer that divides text at non-letters.

#include <Analyzers.h>

Inheritance diagram for lucene::analysis::LetterTokenizer:

Base classes (nearest first): lucene::analysis::CharTokenizer, lucene::analysis::Tokenizer, lucene::analysis::TokenStream
Derived class: lucene::analysis::LowerCaseTokenizer

Public Member Functions

 LetterTokenizer (lucene::util::Reader *in)
virtual ~LetterTokenizer ()

Protected Member Functions

bool isTokenChar (const TCHAR c) const
 Collects only characters which satisfy _istalpha.

Detailed Description

A LetterTokenizer is a tokenizer that divides text at non-letters.

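That is to say, it defines tokens as maximal strings of adjacent letters. In this C++ port a letter is any character that satisfies _istalpha (see isTokenChar); the description is inherited from Java Lucene, where the predicate is java.lang.Character.isLetter().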

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
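
The "maximal strings of adjacent letters" rule can be illustrated with a minimal standalone sketch. It does not use CLucene: splitAtNonLetters is a hypothetical helper introduced here, and std::iswalpha is assumed as a stand-in for _istalpha.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of CLucene: splits text into maximal runs of
// letters. std::iswalpha is an assumed stand-in for _istalpha.
std::vector<std::wstring> splitAtNonLetters(const std::wstring& text) {
    std::vector<std::wstring> tokens;
    std::wstring current;
    for (wchar_t c : text) {
        if (std::iswalpha(static_cast<std::wint_t>(c))) {
            current.push_back(c);        // still inside a run of letters
        } else if (!current.empty()) {
            tokens.push_back(current);   // a non-letter ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);       // flush the final token, if any
    }
    return tokens;
}
```

Under this rule digits and punctuation act as separators in the same way as whitespace, so an input such as L"x2y" would yield two tokens rather than one.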

Constructor & Destructor Documentation

lucene::analysis::LetterTokenizer::LetterTokenizer ( lucene::util::Reader * in  ) 

virtual lucene::analysis::LetterTokenizer::~LetterTokenizer (  )  [virtual]

Member Function Documentation

bool lucene::analysis::LetterTokenizer::isTokenChar ( const TCHAR  c  )  const [protected, virtual]

Collects only characters which satisfy _istalpha.

Implements lucene::analysis::CharTokenizer.
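
The relationship between a character-driven base tokenizer and an isTokenChar override might look roughly like the sketch below. SketchCharTokenizer and SketchLetterTokenizer are hypothetical stand-ins written for illustration only; the real CLucene classes read from a lucene::util::Reader, work on TCHAR, and have a different interface. std::isalpha is assumed here in place of _istalpha.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a CharTokenizer-style base class (not CLucene's API).
class SketchCharTokenizer {
public:
    explicit SketchCharTokenizer(std::string text) : text_(std::move(text)) {}
    virtual ~SketchCharTokenizer() = default;

    // Writes the next token into `out`; returns false once the input is exhausted.
    bool next(std::string& out) {
        out.clear();
        // Skip characters the subclass rejects (separators).
        while (pos_ < text_.size() && !isTokenChar(text_[pos_])) ++pos_;
        // Collect the maximal run of characters the subclass accepts.
        while (pos_ < text_.size() && isTokenChar(text_[pos_])) out.push_back(text_[pos_++]);
        return !out.empty();
    }

protected:
    // The hook a concrete tokenizer implements to decide what belongs in a token.
    virtual bool isTokenChar(char c) const = 0;

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// Hypothetical letter tokenizer: accepts only alphabetic characters.
class SketchLetterTokenizer : public SketchCharTokenizer {
public:
    using SketchCharTokenizer::SketchCharTokenizer;

protected:
    bool isTokenChar(char c) const override {
        // Cast to unsigned char first: passing a negative char to isalpha is undefined.
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    }
};
```

With this split, the scanning loop lives in the base class and a subclass changes the tokenization rule by overriding a single predicate.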

The documentation for this class was generated from the following file: