CLucene - a full-featured, C++ search engine
API Documentation


lucene::index::IndexModifier Class Reference

A class to modify an index, i.e. to delete and add documents.

#include <IndexModifier.h>


Public Member Functions

 IndexModifier (lucene::store::Directory *directory, lucene::analysis::Analyzer *analyzer, bool create)
 Open an index with write access.
 ~IndexModifier ()
 IndexModifier (const char *dirName, lucene::analysis::Analyzer *analyzer, bool create)
 Open an index with write access.
void flush ()
 Make sure all changes are written to disk.
void addDocument (lucene::document::Document *doc, lucene::analysis::Analyzer *docAnalyzer=NULL)
 Adds a document to this index, using the provided analyzer instead of the one specified in the constructor.
int32_t deleteDocuments (Term *term)
 Deletes all documents containing term.
void deleteDocument (int32_t docNum)
 Deletes the document numbered docNum.
int32_t docCount ()
 Returns the number of documents currently in this index.
void optimize ()
 Merges all segments together into a single segment, optimizing an index for search.
void setUseCompoundFile (bool useCompoundFile)
 Setting to turn on usage of a compound file.
bool getUseCompoundFile ()
void setMaxFieldLength (int32_t maxFieldLength)
 The maximum number of terms that will be indexed for a single field in a document.
int32_t getMaxFieldLength ()
void setMaxBufferedDocs (int32_t maxBufferedDocs)
 Determines the minimal number of documents buffered in memory before they are written out as a new segment.
int32_t getMaxBufferedDocs ()
void setMergeFactor (int32_t mergeFactor)
 Determines how often segment indices are merged by addDocument().
int32_t getMergeFactor ()
void close ()
 Close this index, writing all pending changes to disk.
TCHAR * toString () const
int64_t getCurrentVersion () const
 Gets the version number of the currently open index.
TermDocs * termDocs (Term *term=NULL)
 Returns an enumeration of all the documents which contain term.
TermEnum * terms (Term *term=NULL)
 Returns an enumeration of all terms after a given term.
bool document (const int32_t n, lucene::document::Document *doc)
 Returns the stored fields of the n-th Document in this index.
lucene::document::Document * document (const int32_t n)
lucene::store::Directory * getDirectory ()
 Returns the directory used by this index.

Protected Member Functions

void init (lucene::store::Directory *directory, lucene::analysis::Analyzer *analyzer, bool create)
 Initialize an IndexWriter.
void assureOpen () const
 Throw an IllegalStateException if the index is closed.
void createIndexWriter ()
 Close the IndexReader and open an IndexWriter.
void createIndexReader ()
 Close the IndexWriter and open an IndexReader.

Protected Attributes

IndexWriter * indexWriter
IndexReader * indexReader
lucene::store::Directory * directory
lucene::analysis::Analyzer * analyzer
bool open
bool useCompoundFile
int32_t maxBufferedDocs
int32_t maxFieldLength
int32_t mergeFactor


Detailed Description

A class to modify an index, i.e. to delete and add documents.

This class hides IndexReader and IndexWriter so that you do not need to care about implementation details such as the fact that documents are added via an IndexWriter and deleted via an IndexReader.
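The reader/writer hand-off that this class hides can be pictured with a small stand-alone model. This is a hypothetical sketch using only the standard library: ToyModifier and its members are illustrative names, not CLucene API, and the switch counter stands in for closing one of IndexReader/IndexWriter and opening the other (the job of createIndexReader() and createIndexWriter()).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of the IndexModifier pattern: keep either a
// "writer" (for adds) or a "reader" (for deletes) open, never both, and
// switch lazily when the other kind of operation is requested.
class ToyModifier {
public:
    enum class Mode { None, Writer, Reader };

    void addDocument(const std::string& id) {
        ensure(Mode::Writer);  // adds go through the writer
        docs_.push_back(id);
    }

    int deleteDocuments(const std::string& id) {
        ensure(Mode::Reader);  // deletions go through the reader
        int deleted = 0;
        for (auto it = docs_.begin(); it != docs_.end();) {
            if (*it == id) { it = docs_.erase(it); ++deleted; }
            else ++it;
        }
        return deleted;
    }

    Mode mode() const { return mode_; }
    // How often one side had to be closed and the other opened.
    int switches() const { return switches_; }

private:
    void ensure(Mode wanted) {
        if (mode_ == wanted) return;
        if (mode_ != Mode::None) ++switches_;  // costly I/O in a real index
        mode_ = wanted;
    }

    Mode mode_ = Mode::None;
    int switches_ = 0;
    std::vector<std::string> docs_;
};
```

The first operation just opens the side it needs; every later change of operation type counts as one close-and-reopen.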

Note that you cannot create more than one IndexModifier object on the same directory at the same time.

Example usage:

#include "CLucene.h"
#include <cstdio>
using namespace lucene::analysis;
using namespace lucene::analysis::standard;
using namespace lucene::document;
using namespace lucene::index;

int main() {
  //note: this code leaks memory :)
  Analyzer* analyzer = new StandardAnalyzer();
  // create an index in /tmp/index, overwriting an existing one:
  IndexModifier* indexModifier = new IndexModifier("/tmp/index", analyzer, true);
  Document* doc = new Document();
  doc->add(*new Field(_T("id"), _T("1"), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
  doc->add(*new Field(_T("body"), _T("a simple test"), Field::STORE_YES | Field::INDEX_TOKENIZED));
  indexModifier->addDocument(doc);
  int32_t deleted = indexModifier->deleteDocuments(new Term(_T("id"), _T("1")));
  printf("Deleted %d document(s)\n", deleted);
  indexModifier->flush();
  printf("%d docs in index\n", indexModifier->docCount());
  indexModifier->close();
  return 0;
}

Not all methods of IndexReader and IndexWriter are offered by this class. If you need access to additional methods, either use those classes directly or implement your own class that extends IndexModifier.

An instance of this class can be used from more than one thread, but you will not get the best performance that way. For better throughput, use IndexReader and IndexWriter directly (but you will then have to handle synchronization yourself).
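One common way to share a single object between threads is to take one mutex for every call, which is safe but makes all callers queue on the same lock. The sketch below is hypothetical and uses only the standard library; SharedToyIndex and withLock() are illustrative names, and this is not a description of how CLucene synchronizes IndexModifier internally.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical shared index: every public call takes the same mutex, so it
// is safe to use from several threads, but callers are serialized.
class SharedToyIndex {
public:
    void addDocument(const std::string& text) {
        std::lock_guard<std::mutex> guard(mutex_);
        docs_.push_back(text);
    }

    int docCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        return static_cast<int>(docs_.size());
    }

    // Runs a multi-step operation as one unit while holding the lock,
    // the same idea as "lock the modifier object while using it".
    void withLock(const std::function<void(std::vector<std::string>&)>& body) {
        std::lock_guard<std::mutex> guard(mutex_);
        body(docs_);
    }

private:
    std::mutex mutex_;
    std::vector<std::string> docs_;
};
```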

While you can freely mix calls to addDocument() and deleteDocuments() using this class, you should batch your calls for best performance. For example, if you want to update 20 documents, you should first delete all those documents, then add all the new documents.
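The benefit of batching can be estimated by counting how often the modifier has to switch between its writer (for adds) and its reader (for deletes). This is a hypothetical cost model in plain C++: 'a' stands for an add, 'd' for a delete, and countSwitches(), interleavedUpdate() and batchedUpdate() are illustrative helpers, not CLucene functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical cost model: each change of operation type forces the
// modifier to close one side and open the other.
inline int countSwitches(const std::string& ops) {
    int switches = 0;
    for (std::size_t i = 1; i < ops.size(); ++i)
        if (ops[i] != ops[i - 1]) ++switches;
    return switches;
}

// Updating n documents one at a time: delete the old version, add the new one.
inline std::string interleavedUpdate(int n) {
    std::string ops;
    for (int i = 0; i < n; ++i) ops += "da";
    return ops;
}

// Updating n documents in two batches: all deletes first, then all adds.
inline std::string batchedUpdate(int n) {
    return std::string(n, 'd') + std::string(n, 'a');
}
```

Under this model, updating 20 documents one by one costs 39 switches, while deleting all 20 and then adding all 20 costs a single switch.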


Constructor & Destructor Documentation

lucene::index::IndexModifier::IndexModifier ( lucene::store::Directory * directory,
lucene::analysis::Analyzer * analyzer,
bool  create 
)

Open an index with write access.

Parameters:
directory the index directory
analyzer the analyzer to use for adding new documents
create true to create the index or overwrite the existing one; false to append to the existing index

lucene::index::IndexModifier::~IndexModifier (  ) 

lucene::index::IndexModifier::IndexModifier ( const char *  dirName,
lucene::analysis::Analyzer * analyzer,
bool  create 
)

Open an index with write access.

Parameters:
dirName the index directory
analyzer the analyzer to use for adding new documents
create true to create the index or overwrite the existing one; false to append to the existing index


Member Function Documentation

void lucene::index::IndexModifier::init ( lucene::store::Directory * directory,
lucene::analysis::Analyzer * analyzer,
bool  create 
) [protected]

Initialize an IndexWriter.

Exceptions:
IOException 

void lucene::index::IndexModifier::assureOpen (  )  const [protected]

Throw an IllegalStateException if the index is closed.

Exceptions:
IllegalStateException 

void lucene::index::IndexModifier::createIndexWriter (  )  [protected]

Close the IndexReader and open an IndexWriter.

Exceptions:
IOException 

void lucene::index::IndexModifier::createIndexReader (  )  [protected]

Close the IndexWriter and open an IndexReader.

Exceptions:
IOException 

void lucene::index::IndexModifier::flush (  ) 

Make sure all changes are written to disk.

Exceptions:
IOException 

void lucene::index::IndexModifier::addDocument ( lucene::document::Document * doc,
lucene::analysis::Analyzer * docAnalyzer = NULL 
)

Adds a document to this index, using the provided analyzer instead of the one specified in the constructor. If docAnalyzer is NULL (the default), the analyzer given to the constructor is used.

If the document contains more than setMaxFieldLength(int32_t) terms for a given field, the remainder are discarded.

See also:
IndexWriter::addDocument(Document*, Analyzer*)
Exceptions:
IllegalStateException if the index is closed

int32_t lucene::index::IndexModifier::deleteDocuments ( Term * term  ) 

Deletes all documents containing term.

This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. Returns the number of documents deleted.

Returns:
the number of documents deleted
See also:
IndexReader::deleteDocuments(Term*)
Exceptions:
IllegalStateException if the index is closed
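The unique-ID pattern described above can be sketched with a toy document store. This is a hypothetical stand-alone model: ToyDoc and deleteByTerm() are illustrative names, and a document "contains" the term here simply when its field holds exactly the term text, as an untokenized ID field would.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy document: field name -> single untokenized value.
using ToyDoc = std::map<std::string, std::string>;

// Deletes every document whose `field` equals `text` and returns how many
// were removed, mirroring the int32_t result of deleteDocuments().
inline int deleteByTerm(std::vector<ToyDoc>& docs,
                        const std::string& field, const std::string& text) {
    int deleted = 0;
    for (auto it = docs.begin(); it != docs.end();) {
        auto f = it->find(field);
        if (f != it->end() && f->second == text) {
            it = docs.erase(it);
            ++deleted;
        } else {
            ++it;
        }
    }
    return deleted;
}
```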

void lucene::index::IndexModifier::deleteDocument ( int32_t  docNum  ) 

Deletes the document numbered docNum.

See also:
IndexReader::deleteDocument(int32_t)
Exceptions:
IllegalStateException if the index is closed

int32_t lucene::index::IndexModifier::docCount (  ) 

Returns the number of documents currently in this index.

See also:
IndexWriter::docCount()

IndexReader::numDocs()

Exceptions:
IllegalStateException if the index is closed
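In Lucene-style indexes, deleting a document typically does not renumber the others; it marks the document number as deleted, and IndexReader::numDocs() counts only the live documents. The sketch below is a hypothetical, simplified single-segment model of that bookkeeping; ToySegment is an illustrative name, not a CLucene class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical single segment: one "deleted" flag per document number.
// maxDoc() counts every slot; numDocs() counts only live documents.
class ToySegment {
public:
    explicit ToySegment(int maxDoc) : deleted_(maxDoc, false) {}

    void deleteDocument(int docNum) {
        assert(docNum >= 0 && docNum < maxDoc());
        if (!deleted_[docNum]) {  // deleting twice has no further effect
            deleted_[docNum] = true;
            ++deletedCount_;
        }
    }

    int maxDoc() const { return static_cast<int>(deleted_.size()); }
    int numDocs() const { return maxDoc() - deletedCount_; }
    bool isDeleted(int docNum) const { return deleted_[docNum]; }

private:
    std::vector<bool> deleted_;
    int deletedCount_ = 0;
};
```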

void lucene::index::IndexModifier::optimize (  ) 

Merges all segments together into a single segment, optimizing an index for search.

See also:
IndexWriter::optimize()
Exceptions:
IllegalStateException if the index is closed

void lucene::index::IndexModifier::setUseCompoundFile ( bool  useCompoundFile  ) 

Setting to turn on usage of a compound file.

When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use.

See also:
IndexWriter::setUseCompoundFile(bool)
Exceptions:
IllegalStateException if the index is closed

bool lucene::index::IndexModifier::getUseCompoundFile (  ) 

Exceptions:
IOException 
See also:
IndexModifier::setUseCompoundFile(bool)

void lucene::index::IndexModifier::setMaxFieldLength ( int32_t  maxFieldLength  ) 

The maximum number of terms that will be indexed for a single field in a document.

This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to the largest int32_t value (Integer.MAX_VALUE in Java Lucene), then the only limit is your memory, but you should anticipate indexing to fail with an out-of-memory error.

By default, no more than 10,000 terms will be indexed for a field.

See also:
IndexWriter::setMaxFieldLength(int32_t)
Exceptions:
IllegalStateException if the index is closed
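The truncation effect of maxFieldLength can be sketched in a few lines. This is a hypothetical model: splitting on whitespace stands in for a real Analyzer, and indexFieldTerms() is an illustrative helper, not CLucene API. Terms past the limit are simply never indexed, so searches for them will not match the document.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: tokenize a field on whitespace and keep only the
// first maxFieldLength terms; the rest of the field is silently dropped.
inline std::vector<std::string> indexFieldTerms(const std::string& text,
                                                int maxFieldLength) {
    std::vector<std::string> terms;
    std::istringstream in(text);
    std::string term;
    // Check the limit first so no extra token is consumed past it.
    while (static_cast<int>(terms.size()) < maxFieldLength && in >> term)
        terms.push_back(term);
    return terms;
}
```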

int32_t lucene::index::IndexModifier::getMaxFieldLength (  ) 

Exceptions:
IOException 
See also:
IndexModifier::setMaxFieldLength(int32_t)

void lucene::index::IndexModifier::setMaxBufferedDocs ( int32_t  maxBufferedDocs  ) 

Determines the minimal number of documents required before the buffered in-memory documents are merged and a new segment is created.

Since documents are buffered in memory, a larger value gives faster indexing but uses more RAM.

See also:
IndexWriter::setMaxBufferedDocs(int32_t)
Exceptions:
IllegalStateException if the index is closed
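In Lucene, maxBufferedDocs controls how many added documents are held in memory before they are written out as a new segment. The following is a hypothetical stand-alone model of that buffering: ToyBufferedWriter is an illustrative name, and the lower bound of 2 is an assumption borrowed from Java Lucene's IndexWriter.

```cpp
#include <cassert>
#include <vector>

// Hypothetical buffered writer: documents accumulate in memory and are
// written out as one segment each time maxBufferedDocs of them are buffered.
class ToyBufferedWriter {
public:
    explicit ToyBufferedWriter(int maxBufferedDocs)
        : maxBufferedDocs_(maxBufferedDocs) {
        assert(maxBufferedDocs >= 2);  // assumption: Java Lucene rejects values below 2
    }

    void addDocument() {
        if (++buffered_ == maxBufferedDocs_) flush();
    }

    // Writes whatever is still buffered as one (possibly smaller) segment.
    void flush() {
        if (buffered_ > 0) {
            segmentSizes_.push_back(buffered_);
            buffered_ = 0;
        }
    }

    const std::vector<int>& segmentSizes() const { return segmentSizes_; }
    int buffered() const { return buffered_; }

private:
    int maxBufferedDocs_;
    int buffered_ = 0;
    std::vector<int> segmentSizes_;
};
```

With a buffer of 10, adding 25 documents produces two 10-document segments and leaves 5 in memory until the next flush().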

int32_t lucene::index::IndexModifier::getMaxBufferedDocs (  ) 

Exceptions:
IOException 
See also:
IndexModifier::setMaxBufferedDocs(int32_t)

void lucene::index::IndexModifier::setMergeFactor ( int32_t  mergeFactor  ) 

Determines how often segment indices are merged by addDocument().

With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

This must never be less than 2. The default value is 10.

See also:
IndexWriter::setMergeFactor(int32_t)
Exceptions:
IllegalStateException if the index is closed
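The trade-off described for mergeFactor can be explored with a deliberately simplified model: every flush adds a segment, and whenever the last mergeFactor segments have the same size they are merged into one, which may cascade into a higher-level merge. This is a hypothetical sketch (ToyMergePolicy is an illustrative name); Lucene's real merge logic is more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified merge model: merge the newest mergeFactor
// segments whenever they all have the same size, repeating as needed.
class ToyMergePolicy {
public:
    explicit ToyMergePolicy(int mergeFactor) : mergeFactor_(mergeFactor) {
        assert(mergeFactor >= 2);  // "This must never be less than 2."
    }

    void addSegment(int docs) {
        segments_.push_back(docs);
        while (static_cast<int>(segments_.size()) >= mergeFactor_ && tailIsUniform()) {
            int merged = 0;
            for (int i = 0; i < mergeFactor_; ++i) {
                merged += segments_.back();
                segments_.pop_back();
            }
            segments_.push_back(merged);
            ++merges_;
        }
    }

    const std::vector<int>& segments() const { return segments_; }
    int merges() const { return merges_; }

private:
    // True if the newest mergeFactor segments all have the same size.
    bool tailIsUniform() const {
        int last = segments_.back();
        for (int i = 1; i <= mergeFactor_; ++i)
            if (segments_[segments_.size() - i] != last) return false;
        return true;
    }

    int mergeFactor_;
    std::vector<int> segments_;
    int merges_ = 0;
};
```

With a merge factor of 10, 105 single-document segments end up as one 100-document segment plus five single-document ones after 11 merges; a smaller factor merges more often and keeps fewer segments around, which is why it suits interactively maintained indexes.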

int32_t lucene::index::IndexModifier::getMergeFactor (  ) 

Exceptions:
IOException 
See also:
IndexModifier::setMergeFactor(int32_t)

void lucene::index::IndexModifier::close (  ) 

Close this index, writing all pending changes to disk.

Exceptions:
IllegalStateException if the index has already been closed

TCHAR* lucene::index::IndexModifier::toString (  )  const

int64_t lucene::index::IndexModifier::getCurrentVersion (  )  const

Gets the version number of the currently open index.

TermDocs* lucene::index::IndexModifier::termDocs ( Term * term = NULL  ) 

Returns an enumeration of all the documents which contain term.

Warning: This is not threadsafe. Make sure you lock the modifier object while using the TermDocs. If the IndexReader that the modifier manages is closed, the TermDocs object will fail.
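Conceptually, a TermDocs enumeration walks the postings of a term: the list of document numbers that contain it. The sketch below is a hypothetical toy inverted index in plain C++ (ToyInvertedIndex is an illustrative name). Instead of handing out a live enumerator that must be used under the caller's lock, it copies the postings while holding its own mutex, one way to sidestep the lifetime problem this warning describes.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical inverted index: term -> document numbers containing it.
// Postings stay ascending as long as documents are added in increasing order.
class ToyInvertedIndex {
public:
    void addDocument(int docNum, const std::vector<std::string>& terms) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const auto& t : terms) {
            auto& postings = postings_[t];
            // Record each document at most once per term.
            if (postings.empty() || postings.back() != docNum)
                postings.push_back(docNum);
        }
    }

    // Returns a copy made under the lock, so callers never iterate a list
    // that another thread is modifying.
    std::vector<int> termDocs(const std::string& term) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = postings_.find(term);
        return it == postings_.end() ? std::vector<int>{} : it->second;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::vector<int>> postings_;
};
```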

TermEnum* lucene::index::IndexModifier::terms ( Term * term = NULL  ) 

Returns an enumeration of all terms after a given term.

If no term is given, an enumeration of all the terms in the index is returned. The enumeration is ordered by Term::compareTo(). Each term is greater than all that precede it in the enumeration.

Warning: This is not threadsafe. Make sure you lock the modifier object while using the TermEnum. If the IndexReader that the modifier manages is closed, the TermEnum object will fail.
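The ordering and starting point of a term enumeration can be modelled with a sorted set. This is a hypothetical sketch: ToyTerm and termsFrom() are illustrative names, std::pair orders by field and then by text (for ASCII text this matches Lucene's field-then-text term order), and, following Java Lucene's IndexReader.terms(Term), the enumeration starts at the first term greater than or equal to the given one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical term: (field, text), compared field first, then text.
using ToyTerm = std::pair<std::string, std::string>;

// Returns every term from `start` onward, in sorted order.
inline std::vector<ToyTerm> termsFrom(const std::set<ToyTerm>& dict,
                                      const ToyTerm& start) {
    return std::vector<ToyTerm>(dict.lower_bound(start), dict.end());
}
```

For example, starting from ("body", "s") skips ("body", "a") and returns ("body", "simple"), ("body", "test") and then the terms of the next field, ("id", "1").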

bool lucene::index::IndexModifier::document ( const int32_t  n,
lucene::document::Document * doc 
)

Returns the stored fields of the n-th Document in this index.

Warning: This is not threadsafe. Make sure you lock the modifier object while using the Document. If the IndexReader that the modifier manages is closed, the Document will be invalid.

lucene::document::Document* lucene::index::IndexModifier::document ( const int32_t  n  ) 

lucene::store::Directory* lucene::index::IndexModifier::getDirectory (  ) 

Returns the directory used by this index.


Field Documentation

lucene::store::Directory* lucene::index::IndexModifier::directory [protected]

lucene::analysis::Analyzer* lucene::index::IndexModifier::analyzer [protected]



clucene.sourceforge.net