Lucene.Net for Search

By Amina Zeenath on August 14, 2014

Lucene.Net

Nowadays many customers ask for a search feature. Sometimes filters on lists are not enough, and the application has to support large-scale searching with complex queries. Doing this with SQL alone can mean writing complex queries that hurt performance and, in the worst case, overload the database server. Lucene addresses this by letting you index documents and then search the index instead.

Lucene.Net is a high-performance, full-featured text search engine library. It contains powerful APIs for creating full-text indexes and for building advanced, precise search into your programs. Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users. The Lucene search library is based on an inverted index, which allows fast full-text searches at the cost of extra processing each time a document is added to the index.
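To make the inverted index idea concrete, here is a minimal sketch in Python. It is a toy model and not Lucene.Net code: the function name `build_inverted_index` and the sample documents are made up for illustration, and a real index stores far more than a set of ids per term.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # The extra work happens here, once for every added document.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical sample documents, keyed by id.
docs = {
    1: "Lucene is a search library",
    2: "SQL queries can be slow",
    3: "Lucene builds an inverted index for search",
}
index = build_inverted_index(docs)

# A search is now a dictionary lookup instead of a scan over every document.
print(sorted(index["search"]))  # [1, 3]
print(sorted(index["sql"]))     # [2]
```

The trade-off is visible in the code: every document is split and recorded when it is added, so that looking up a term later does not have to touch the documents at all.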

There are four simple steps to create and search an index using Lucene:

• Create an index

• Build the query

• Perform the search

• Display the results

 

Indexing

To index content, you first need to acquire it and build a document from a set of predefined fields. The classes needed to create an index are Directory, Analyzer, IndexWriter, Document and Field. The Directory represents the location where the index is stored, typically identified by a directory path. The analyzer breaks text into terms and removes 'noise words' (stop words) such as and, the, of and but. You can pass in a language-specific analyzer if needed; the default is English. The IndexWriter is the class that writes the index. Passing true for its create parameter tells it to create a new index instead of updating the existing one. The writer adds each document to the index, which will later be searched. The index consists of a group of documents; each document contains fields, and each field contains terms, as you can see in the image below.

[Image: a Lucene index as a group of documents, each containing fields, which in turn contain terms]
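The document, field and term structure can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the field names `title` and `body`, the sample texts and the `postings` dictionary are all hypothetical, and none of this is the Lucene.Net API.

```python
from collections import defaultdict

# Hypothetical documents: each one is a set of named fields.
documents = [
    {"title": "Intro to Lucene", "body": "Lucene indexes documents"},
    {"title": "SQL tips", "body": "Indexes speed up SQL queries"},
]

# Postings are keyed by (field, term), so the same word appearing in
# different fields is indexed separately.
postings = defaultdict(list)
for doc_id, doc in enumerate(documents):
    for field, text in doc.items():
        for term in text.lower().split():
            if doc_id not in postings[(field, term)]:
                postings[(field, term)].append(doc_id)

print(postings[("title", "lucene")])   # [0]
print(postings[("body", "indexes")])   # [0, 1]
print(postings[("title", "indexes")])  # []
```

Keeping the field as part of the key is what makes it possible to search only titles, or only bodies, without scanning the other fields.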

 

After building the document, it has to be analyzed to filter out noise words. Words such as to, an and the appear frequently in the content but carry little meaning, so they are rarely useful as search terms. Ignoring them saves disk space and speeds up searching. After this processing, the document is added to the index. Lucene covers this part, so there is nothing extra for us to worry about. Once indexed, the document is ready to be searched.
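The analysis step described above can be sketched roughly as follows in Python. The stop-word list and the `analyze` function are assumptions made for this example; Lucene.Net's analyzers ship their own word lists and do more than this.

```python
import re

# A small, hypothetical stop-word list; real analyzers ship their own.
STOP_WORDS = {"a", "an", "and", "but", "of", "the", "to"}

def analyze(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(analyze("The quick answer to an index of the documents"))
# ['quick', 'answer', 'index', 'documents']
```

Only the four remaining terms would be written to the index, which is where the savings in disk space and search time come from.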

Searching

When a user enters a query, the query text is also analyzed and parsed into query objects. Lucene.Net's QueryParser class does this job. Once the query is built, the documents that match it have to be found. Lucene does this, and there are many extension points for adapting the matching to your needs. When the query has run, the results are returned to the user.
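The search side can be sketched in the same toy style. The Python below assumes a simplified model: the query goes through the same hypothetical `analyze` function as the documents, a document matches only if it contains every query term, and the score is just a count of term occurrences. Lucene's own query parsing and scoring are considerably richer, so treat this as an illustration of the flow and not of the library's behavior.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # hypothetical list

def analyze(text):
    """Apply the same analysis to documents and to queries."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]

def build_index(docs):
    """Map term -> {doc_id: how often the term occurs in that document}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] += 1
    return index

def search(index, query_text):
    """Return ids of documents containing every query term, best first."""
    terms = analyze(query_text)
    if not terms:
        return []
    # Only documents that contain all terms count as matches (assumption).
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    # Score is the total number of query-term occurrences (a toy ranking).
    scores = {d: sum(index[t][d] for t in terms) for d in matches}
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    1: "Lucene search and more search",
    2: "Search with SQL",
    3: "Lucene makes full text search fast",
}
index = build_index(docs)
print(search(index, "the Lucene search"))  # [1, 3]
print(search(index, "SQL"))                # [2]
```

Document 1 ranks ahead of document 3 because the word "search" occurs in it twice. Analyzing the query with the same rules as the documents matters: without it, "Lucene" in the query would not match the lowercased "lucene" in the index.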

Conclusion

Lucene.Net is a good solution for applications that need broad and powerful search capabilities. It is a small library and very easy to use. The Lucene.Net API lets you fully manage the search index and perform queries on it.

Reference:

1. http://lucenenet.apache.org/

2. http://www.codeproject.com/Articles/29755/Introducing-Lucene-Net

 
