lucene::index::IndexWriter Class Reference

An IndexWriter creates and maintains an index.

#include <IndexWriter.h>

Public Member Functions

~IndexWriter ()

LUCENE_STATIC_CONSTANT (int32_t, DEFAULT_MAX_FIELD_LENGTH=10000)

The Java implementation of Lucene silently truncates any tokenized field if the number of tokens exceeds a certain threshold.

LUCENE_STATIC_CONSTANT (int32_t, FIELD_TRUNC_POLICY__WARN=-1)

int32_t getMaxFieldLength () const

void setMaxFieldLength (int32_t val)

LUCENE_STATIC_CONSTANT (int32_t, DEFAULT_MAX_BUFFERED_DOCS=10)

Default value is 10.

void setMaxBufferedDocs (int32_t val)

Determines the minimal number of documents required before the buffered in-memory documents are merged and a new Segment is created.

int32_t getMaxBufferedDocs ()

LUCENE_STATIC_CONSTANT (int64_t, WRITE_LOCK_TIMEOUT=1000)

Default value for the write lock timeout (1,000).

void setWriteLockTimeout (int64_t writeLockTimeout)

Sets the maximum time to wait for a write lock (in milliseconds).

int64_t getWriteLockTimeout ()

LUCENE_STATIC_CONSTANT (int64_t, COMMIT_LOCK_TIMEOUT=10000)

Default value for the commit lock timeout (10,000).

void setCommitLockTimeout (int64_t commitLockTimeout)

Sets the maximum time to wait for a commit lock (in milliseconds).

int64_t getCommitLockTimeout ()

LUCENE_STATIC_CONSTANT (int32_t, DEFAULT_MERGE_FACTOR=10)

Default value is 10.

int32_t getMergeFactor () const

void setMergeFactor (int32_t val)

LUCENE_STATIC_CONSTANT (int32_t, DEFAULT_TERM_INDEX_INTERVAL=128)

Expert: The fraction of terms in the "dictionary" which should be stored in RAM.

void setTermIndexInterval (int32_t interval)

Expert: Set the interval between indexed terms.

int32_t getTermIndexInterval ()

Expert: Return the interval between indexed terms.

int32_t getMinMergeDocs () const

Determines the minimal number of documents required before the buffered in-memory documents are merged and a new Segment is created.

void setMinMergeDocs (int32_t val)

LUCENE_STATIC_CONSTANT (int32_t, DEFAULT_MAX_MERGE_DOCS=0x7FFFFFFFL)

Determines the largest number of documents ever merged by addDocument().

int32_t getMaxMergeDocs () const

Determines the largest number of documents ever merged by addDocument().

void setMaxMergeDocs (int32_t val)

IndexWriter (const char *path, lucene::analysis::Analyzer *a, const bool create, const bool closeDir=true)

Constructs an IndexWriter for the index in path.

IndexWriter (lucene::store::Directory *d, lucene::analysis::Analyzer *a, const bool create, const bool closeDir=false)

Constructs an IndexWriter for the index in d.

void close ()

Flushes all changes to an index, closes all associated files, and closes the directory that the index is stored in.

int32_t docCount ()

Returns the number of documents currently in this index.

void addDocument (lucene::document::Document *doc, lucene::analysis::Analyzer *analyzer=NULL)

Adds a document to this index, using the provided analyzer instead of the value of getAnalyzer().

void optimize ()

Merges all segments together into a single segment, optimizing an index for search.

void addIndexes (lucene::store::Directory **dirs)

Merges all segments from an array of indices into this index.

void addIndexes (IndexReader **readers)

Merges the provided indexes into this index.

lucene::store::Directory * getDirectory ()

Returns the directory this index resides in.

bool getUseCompoundFile ()

Get the current setting of whether to use the compound file format.

void setUseCompoundFile (bool value)

Setting to turn on usage of a compound file.

void setSimilarity (lucene::search::Similarity *similarity)

Expert: Set the Similarity implementation used by this IndexWriter.

lucene::search::Similarity * getSimilarity ()

Expert: Return the Similarity implementation used by this IndexWriter.

lucene::analysis::Analyzer * getAnalyzer ()

Returns the analyzer used by this index.

Data Fields

SegmentInfos * segmentInfos

Static Public Attributes

static const char * WRITE_LOCK_NAME

static const char * COMMIT_LOCK_NAME

Friends

class LockWith2

class LockWithCFS

Detailed Description

An IndexWriter creates and maintains an index.

The third argument to the constructor determines whether a new index is created, or whether an existing index is opened for the addition of new documents.

In either case, documents are added with the addDocument method. When finished adding documents, close should be called.

If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed.

Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to an IOException. The IOException is also thrown if an IndexReader on the same directory is used to delete documents from the index.

See also: IndexModifier, which supports the important methods of IndexWriter plus deletion.

Constructor & Destructor Documentation

lucene::index::IndexWriter::~IndexWriter ( )

lucene::index::IndexWriter::IndexWriter	(	const char *	path,
		lucene::analysis::Analyzer *	a,
		const bool	create,
		const bool	closeDir = `true`
	)

Constructs an IndexWriter for the index in path.

Text will be analyzed with a. If create is true, then a new, empty index will be created in path, replacing the index already there, if any.

Parameters:

	path	the path to the index directory
	a	the analyzer to use
	create	`true` to create the index or overwrite the existing one; `false` to append to the existing index

Exceptions:

IOException if the directory cannot be read from or written to, or if it does not exist and create is false

lucene::index::IndexWriter::IndexWriter	(	lucene::store::Directory *	d,
		lucene::analysis::Analyzer *	a,
		const bool	create,
		const bool	closeDir = `false`
	)

Constructs an IndexWriter for the index in d.

Text will be analyzed with a. If create is true, then a new, empty index will be created in d, replacing the index already there, if any.

Member Function Documentation

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		DEFAULT_MAX_FIELD_LENGTH	= `10000`
	)

The Java implementation of Lucene silently truncates any tokenized field if the number of tokens exceeds a certain threshold.

Although that threshold is adjustable, it is easy for the client programmer to be unaware that such a threshold exists, and to become its unwitting victim. CLucene implements a less insidious truncation policy. Up to DEFAULT_MAX_FIELD_LENGTH tokens, CLucene behaves just as JLucene does. If the number of tokens exceeds that threshold without any indication of a truncation preference by the client programmer, CLucene raises an exception, prompting the client programmer to explicitly set a truncation policy by adjusting maxFieldLength.
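
The policy described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not CLucene's implementation: tokensToIndex is a hypothetical helper, and the sketch assumes that FIELD_TRUNC_POLICY__WARN is the maxFieldLength value meaning "no truncation preference was set". The two constants repeat the values documented on this page.

```cpp
#include <cstdint>
#include <stdexcept>

// Values copied from the constants documented on this page.
const int32_t DEFAULT_MAX_FIELD_LENGTH = 10000;
const int32_t FIELD_TRUNC_POLICY__WARN = -1;

// Hypothetical helper, not a CLucene function: returns how many tokens of a
// field would be indexed under the policy described above.
// Assumption: maxFieldLength == FIELD_TRUNC_POLICY__WARN means the client
// has not expressed a truncation preference.
int32_t tokensToIndex(int32_t tokenCount, int32_t maxFieldLength) {
    if (maxFieldLength == FIELD_TRUNC_POLICY__WARN) {
        // No explicit preference: refuse to truncate silently.
        if (tokenCount > DEFAULT_MAX_FIELD_LENGTH) {
            throw std::runtime_error(
                "field exceeds DEFAULT_MAX_FIELD_LENGTH; set maxFieldLength explicitly");
        }
        return tokenCount;
    }
    // Explicit limit chosen by the client: truncate to that limit.
    return tokenCount < maxFieldLength ? tokenCount : maxFieldLength;
}
```

In this model a 20,000-token field indexed with an explicit limit of 15,000 keeps 15,000 tokens, while the same field with no preference set raises an exception instead of being cut at 10,000.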

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		FIELD_TRUNC_POLICY__WARN	= `-1`
	)

int32_t lucene::index::IndexWriter::getMaxFieldLength ( ) const [inline]

void lucene::index::IndexWriter::setMaxFieldLength ( int32_t val ) [inline]

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		DEFAULT_MAX_BUFFERED_DOCS	= `10`
	)

Default value is 10.

Change using setMaxBufferedDocs(int32_t).

void lucene::index::IndexWriter::setMaxBufferedDocs ( int32_t val ) [inline]

Determines the minimal number of documents required before the buffered in-memory documents are merged and a new Segment is created.

Since Documents are merged in a RAMDirectory, a large value gives faster indexing. At the same time, mergeFactor limits the number of files open in an FSDirectory.

The default value is DEFAULT_MAX_BUFFERED_DOCS.
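
To make the interplay between the buffer size and mergeFactor concrete, here is a deliberately simplified model. It is an assumption-laden sketch, not CLucene's merge code: MergeModel and its members are hypothetical names, and the rule "merge whenever the newest mergeFactor segments have the same size" is a stand-in for the real merge logic, which this page does not spell out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified model (hypothetical, not CLucene code): documents are buffered
// until maxBufferedDocs is reached, then flushed as one segment. Whenever the
// newest mergeFactor segments have equal size, they are merged into one.
struct MergeModel {
    int32_t maxBufferedDocs;
    int32_t mergeFactor;
    int32_t buffered;              // documents currently held in memory
    std::vector<int64_t> segments; // document count of each flushed segment

    MergeModel(int32_t maxBuffered, int32_t factor)
        : maxBufferedDocs(maxBuffered), mergeFactor(factor), buffered(0) {
        if (maxBuffered < 1 || factor < 2) {
            throw std::invalid_argument("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
    }

    void addDocument() {
        if (++buffered < maxBufferedDocs) return;
        segments.push_back(buffered); // flush the in-memory buffer
        buffered = 0;
        maybeMerge();
    }

    void maybeMerge() {
        const std::size_t n = static_cast<std::size_t>(mergeFactor);
        while (segments.size() >= n) {
            const int64_t size = segments.back();
            for (std::size_t i = segments.size() - n; i < segments.size(); ++i) {
                if (segments[i] != size) return; // newest n segments differ
            }
            segments.resize(segments.size() - n);   // drop the n equal segments
            segments.push_back(size * mergeFactor); // replace with merged one
        }
    }
};
```

With both settings at 10, adding 1,234 documents to this model leaves six segments (1000, 100, 100, 10, 10, 10) and four documents still buffered. A larger buffer means fewer flushes, and in this model no size level ever holds more than mergeFactor - 1 segments, which keeps the segment count small.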

int32_t lucene::index::IndexWriter::getMaxBufferedDocs ( ) [inline]

See also: setMaxBufferedDocs

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int64_t	,
		WRITE_LOCK_TIMEOUT	= `1000`
	)

Default value for the write lock timeout (1,000).

void lucene::index::IndexWriter::setWriteLockTimeout ( int64_t writeLockTimeout ) [inline]

Sets the maximum time to wait for a write lock (in milliseconds).
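
As an illustration of what a "maximum time to wait" means for a lock, the following generic sketch retries an acquisition attempt until it succeeds or the timeout expires. It is not the CLucene lock code: obtainWithTimeout, tryLock and pollIntervalMs are hypothetical names introduced here, and the sketch reports failure by returning false, whereas the class description on this page says a conflicting IndexWriter leads to an IOException.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical helper, not the CLucene lock code: calls tryLock repeatedly
// until it returns true or timeoutMs milliseconds have elapsed.
// Returns true if the lock was obtained, false on timeout.
bool obtainWithTimeout(const std::function<bool()>& tryLock,
                       int64_t timeoutMs, int64_t pollIntervalMs) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point deadline =
        Clock::now() + std::chrono::milliseconds(timeoutMs);
    for (;;) {
        if (tryLock()) return true;                  // lock acquired
        if (Clock::now() >= deadline) return false;  // waited long enough
        std::this_thread::sleep_for(std::chrono::milliseconds(pollIntervalMs));
    }
}
```

A caller would pass the configured timeout, for example the WRITE_LOCK_TIMEOUT default of 1000, together with whatever attempt function and polling interval fit the lock in use.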

int64_t lucene::index::IndexWriter::getWriteLockTimeout ( ) [inline]

See also: setWriteLockTimeout

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int64_t	,
		COMMIT_LOCK_TIMEOUT	= `10000`
	)

Default value for the commit lock timeout (10,000).

void lucene::index::IndexWriter::setCommitLockTimeout ( int64_t commitLockTimeout ) [inline]

Sets the maximum time to wait for a commit lock (in milliseconds).

int64_t lucene::index::IndexWriter::getCommitLockTimeout ( ) [inline]

See also: setCommitLockTimeout

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		DEFAULT_MERGE_FACTOR	= `10`
	)

Default value is 10.

Change using setMergeFactor(int32_t).

int32_t lucene::index::IndexWriter::getMergeFactor ( ) const [inline]

void lucene::index::IndexWriter::setMergeFactor ( int32_t val ) [inline]

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		DEFAULT_TERM_INDEX_INTERVAL	= `128`
	)

Expert: The fraction of terms in the "dictionary" which should be stored in RAM.

Smaller values use more memory, but make searching slightly faster, while larger values use less memory and make searching slightly slower. Searching is typically not dominated by dictionary lookup, so tweaking this is rarely useful.

void lucene::index::IndexWriter::setTermIndexInterval ( int32_t interval ) [inline]

Expert: Set the interval between indexed terms.

Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
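
The two quantities in the last paragraph are easy to tabulate. The sketch below simply evaluates numUniqueTerms/interval and interval/2 with integer arithmetic; TermIndexCost and estimateTermIndexCost are hypothetical names used for illustration and are not part of CLucene.

```cpp
#include <cstdint>

// Rough cost model taken from the description above. The struct and
// function names are hypothetical and not part of CLucene.
struct TermIndexCost {
    int64_t termsInMemory;   // numUniqueTerms / interval, held by an IndexReader
    int64_t avgTermsScanned; // interval / 2, scanned per random term access
};

// Assumes interval > 0. Both results use integer division, so they round down.
TermIndexCost estimateTermIndexCost(int64_t numUniqueTerms, int32_t interval) {
    TermIndexCost cost;
    cost.termsInMemory = numUniqueTerms / interval;
    cost.avgTermsScanned = interval / 2;
    return cost;
}
```

For an assumed dictionary of 1,000,000 unique terms, the default interval of 128 gives 7,812 terms in memory and an average scan of 64 terms, while an interval of 32 gives 31,250 terms in memory and an average scan of 16.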

See also: DEFAULT_TERM_INDEX_INTERVAL

int32_t lucene::index::IndexWriter::getTermIndexInterval ( ) [inline]

Expert: Return the interval between indexed terms.

See also: setTermIndexInterval(int32_t)

int32_t lucene::index::IndexWriter::getMinMergeDocs ( ) const [inline]

Determines the minimal number of documents required before the buffered in-memory documents are merged and a new Segment is created.

Since Documents are merged in a RAMDirectory, a large value gives faster indexing. At the same time, mergeFactor limits the number of files open in an FSDirectory.

The default value is 10.

void lucene::index::IndexWriter::setMinMergeDocs ( int32_t val ) [inline]

lucene::index::IndexWriter::LUCENE_STATIC_CONSTANT	(	int32_t	,
		DEFAULT_MAX_MERGE_DOCS	= `0x7FFFFFFFL`
	)

Determines the largest number of documents ever merged by addDocument().

Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is DEFAULT_MAX_MERGE_DOCS.

int32_t lucene::index::IndexWriter::getMaxMergeDocs ( ) const [inline]

Determines the largest number of documents ever merged by addDocument().

Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is DEFAULT_MAX_MERGE_DOCS.

void lucene::index::IndexWriter::setMaxMergeDocs ( int32_t val ) [inline]

void lucene::index::IndexWriter::close ( )

Flushes all changes to an index, closes all associated files, and closes the directory that the index is stored in.

int32_t lucene::index::IndexWriter::docCount ( )

Returns the number of documents currently in this index. This method is synchronized.

void lucene::index::IndexWriter::addDocument	(	lucene::document::Document *	doc,
		lucene::analysis::Analyzer *	analyzer = `NULL`
	)

Adds a document to this index, using the provided analyzer instead of the value of getAnalyzer().

If a field of the document contains more terms than the limit set with setMaxFieldLength(int32_t), the remaining terms are discarded.

void lucene::index::IndexWriter::optimize ( )

Merges all segments together into a single segment, optimizing an index for search.

void lucene::index::IndexWriter::addIndexes ( lucene::store::Directory ** dirs )

Merges all segments from an array of indices into this index.

This may be used to parallelize batch indexing. A large document collection can be broken into sub-collections. Each sub-collection can be indexed in parallel, on a different thread, process or machine. The complete index can then be created by merging sub-collection indices with this method.

After this completes, the index is optimized.

void lucene::index::IndexWriter::addIndexes ( IndexReader ** readers )

Merges the provided indexes into this index.

After this completes, the index is optimized.

The provided IndexReaders are not closed.

lucene::store::Directory* lucene::index::IndexWriter::getDirectory ( ) [inline]

Returns the directory this index resides in.

bool lucene::index::IndexWriter::getUseCompoundFile ( ) [inline]

Get the current setting of whether to use the compound file format.

Note that this just returns the value you set with setUseCompoundFile(bool) or the default. You cannot use this to query the status of an existing index.

See also: setUseCompoundFile(bool)

void lucene::index::IndexWriter::setUseCompoundFile ( bool value ) [inline]

Setting to turn on usage of a compound file.

When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use.

void lucene::index::IndexWriter::setSimilarity ( lucene::search::Similarity * similarity ) [inline]

Expert: Set the Similarity implementation used by this IndexWriter.

See also: Similarity::setDefault(Similarity)

lucene::search::Similarity* lucene::index::IndexWriter::getSimilarity ( ) [inline]

Expert: Return the Similarity implementation used by this IndexWriter.

This defaults to the current value of Similarity::getDefault().

lucene::analysis::Analyzer* lucene::index::IndexWriter::getAnalyzer ( ) [inline]

Returns the analyzer used by this index.

Friends And Related Function Documentation

friend class LockWith2 [friend]

friend class LockWithCFS [friend]

Field Documentation

SegmentInfos* lucene::index::IndexWriter::segmentInfos

const char* lucene::index::IndexWriter::WRITE_LOCK_NAME [static]

const char* lucene::index::IndexWriter::COMMIT_LOCK_NAME [static]

The documentation for this class was generated from the following file:

IndexWriter.h

