CLucene - a full-featured, c++ search engine
API Documentation
#include <AnalysisHeader.h>
Public Member Functions
Token ()
~Token ()
Token (const TCHAR *text, const int32_t start, const int32_t end, const TCHAR *typ=defaultType)
Constructs a Token with the given text, start and end offsets, & type.
void | set (const TCHAR *text, const int32_t start, const int32_t end, const TCHAR *typ=defaultType)
size_t | bufferLength ()
void | growBuffer (size_t size)
void | setPositionIncrement (int32_t posIncr)
Set the position increment.
int32_t | getPositionIncrement () const
const TCHAR * | termText () const
size_t | termTextLength ()
void | resetTermTextLen ()
void | setText (const TCHAR *txt)
int32_t | startOffset () const
Returns this Token's starting offset, the position of the first character corresponding to this token in the source text.
void | setStartOffset (int32_t val)
int32_t | endOffset () const
Returns this Token's ending offset, one greater than the position of the last character corresponding to this token in the source text.
void | setEndOffset (int32_t val)
const TCHAR * | type () const
Returns this Token's lexical type. Defaults to "word".
void | setType (const TCHAR *val)
returns reference
TCHAR * | toString () const
Data Fields
TCHAR * | _termText
the text of the term
int32_t | _termTextLen
the length of termText. Internal use only
Static Public Attributes
static const TCHAR * | defaultType
A Token consists of a term's text, the start and end offsets of the term in the text of the field, and a type string.
The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display, etc.
The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
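The way offsets let an application re-associate a token with its source text can be sketched as follows. This is a minimal illustration, not CLucene code: `SketchToken` and `highlight` are hypothetical names, and `std::string` stands in for `TCHAR` strings. It assumes the end offset is one past the last character, as described for endOffset().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a token; this is NOT the CLucene Token class.
// It keeps only the pieces of information described above.
struct SketchToken {
    std::string text;  // term text, possibly altered by a filter
    int32_t start;     // offset of the first source character
    int32_t end;       // one greater than the offset of the last source character
    std::string type;  // lexical type, e.g. "word" or "eos"
};

// Returns a copy of `source` with the span covered by `tok` wrapped in
// square brackets, as a highlighter or KWIC display might do.
std::string highlight(const std::string& source, const SketchToken& tok) {
    assert(0 <= tok.start && tok.start <= tok.end);
    assert(static_cast<std::size_t>(tok.end) <= source.size());
    const std::size_t s = static_cast<std::size_t>(tok.start);
    const std::size_t e = static_cast<std::size_t>(tok.end);
    return source.substr(0, s) + "[" + source.substr(s, e - s) + "]" + source.substr(e);
}
```

For instance, a token with offsets 4 and 7 over the source "The fox ran" would mark the word "fox". Because the span is half-open, its length is simply `end - start`.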
lucene::analysis::Token::Token ()
lucene::analysis::Token::~Token ()
lucene::analysis::Token::Token (const TCHAR *text, const int32_t start, const int32_t end, const TCHAR *typ = defaultType)
Constructs a Token with the given text, start and end offsets, & type.
void lucene::analysis::Token::set (const TCHAR *text, const int32_t start, const int32_t end, const TCHAR *typ = defaultType)
size_t lucene::analysis::Token::bufferLength () [inline]
void lucene::analysis::Token::growBuffer (size_t size)
void lucene::analysis::Token::setPositionIncrement (int32_t posIncr)
Set the position increment.
This determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching.
The default value is 1.
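Some common uses for this are setting it to zero, which places this token at the same position as the previous one (for example, to index a synonym or several stems of one word), and setting it to a value greater than one, which leaves a gap (for example, where stop words were removed) so that exact phrase queries do not match across it.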
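How position increments accumulate into positions can be sketched as below. This is an illustration under stated assumptions, not CLucene's indexing code: `IncrToken` and `absolutePositions` are hypothetical names, and the counter is assumed to start at -1 so that a first token with the default increment of 1 lands on position 0. An increment of 0 stacks a token on the previous position, and an increment greater than 1 leaves a gap.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in carrying only the term text and its position increment.
struct IncrToken {
    std::string text;
    int32_t positionIncrement;  // the documented default is 1
};

// Computes the position of every token in the stream by summing increments.
// Assumption: counting starts at -1, so the first default token gets position 0.
std::vector<int32_t> absolutePositions(const std::vector<IncrToken>& stream) {
    std::vector<int32_t> positions;
    int32_t position = -1;
    for (const IncrToken& tok : stream) {
        assert(tok.positionIncrement >= 0);  // a token never moves backwards
        position += tok.positionIncrement;
        positions.push_back(position);
    }
    return positions;
}
```

With the increments 1, 0, 2, the second token shares position 0 with the first, and the third lands on position 2, leaving position 1 empty.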
int32_t lucene::analysis::Token::getPositionIncrement () const
const TCHAR* lucene::analysis::Token::termText () const
size_t lucene::analysis::Token::termTextLength ()
void lucene::analysis::Token::resetTermTextLen ()
void lucene::analysis::Token::setText (const TCHAR *txt)
int32_t lucene::analysis::Token::startOffset () const [inline]
Returns this Token's starting offset, the position of the first character corresponding to this token in the source text.
Note that the difference between endOffset() and startOffset() may not be equal to termTextLength(), as the term text may have been altered by a stemmer or some other filter.
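The mismatch between the offset span and the term text length can be illustrated with a toy filter. Everything here is hypothetical: `StemmedToken`, `stripIng`, and `sourceSpan` are made-up names, and the suffix stripping is only a stand-in for a real stemmer. The point is that the filter changes the term text while leaving the offsets pointing at the original source span.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a token; this is NOT the CLucene Token class.
struct StemmedToken {
    std::string termText;
    int32_t startOffset;
    int32_t endOffset;  // one greater than the offset of the last source character
};

// Toy "stemmer": removes a trailing "ing" from the term text.
// The offsets are deliberately left untouched.
StemmedToken stripIng(StemmedToken tok) {
    const std::string suffix = "ing";
    const std::size_t n = tok.termText.size();
    if (n > suffix.size() &&
        tok.termText.compare(n - suffix.size(), suffix.size(), suffix) == 0) {
        tok.termText.erase(n - suffix.size());
    }
    return tok;
}

// Length of the source span the token came from.
int32_t sourceSpan(const StemmedToken& tok) {
    assert(tok.startOffset <= tok.endOffset);
    return tok.endOffset - tok.startOffset;
}
```

After stripping, a token for "running" at offsets 2 to 9 still spans 7 source characters, but its term text is only 4 characters long, so highlighting should rely on the offsets rather than on the term text length.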
void lucene::analysis::Token::setStartOffset (int32_t val) [inline]
int32_t lucene::analysis::Token::endOffset () const [inline]
Returns this Token's ending offset, one greater than the position of the last character corresponding to this token in the source text.
void lucene::analysis::Token::setEndOffset (int32_t val) [inline]
const TCHAR* lucene::analysis::Token::type () const [inline]
Returns this Token's lexical type. Defaults to "word".
returns reference
void lucene::analysis::Token::setType (const TCHAR *val) [inline]
returns reference
TCHAR* lucene::analysis::Token::toString () const
TCHAR* lucene::analysis::Token::_termText
The text of the term.
int32_t lucene::analysis::Token::_termTextLen
The length of termText. Internal use only.
const TCHAR* lucene::analysis::Token::defaultType [static]