Lucene – Indexing Operations

In this chapter, we'll discuss the four major operations of indexing. These operations are useful at various times and are used throughout a search application.

Indexing Operations

Following is a list of operations commonly used during the indexing process.

1. Add Document: This operation is used in the initial stage of the indexing process to create the indexes on newly available content.
2. Update Document: This operation is used to update indexes to reflect changes in updated content. It is similar to recreating the index.
3. Delete Document: This operation is used to update indexes to exclude documents which no longer need to be indexed or searched.
4. Field Options: Field options control the ways in which the contents of a field are made searchable.
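The add, update and delete operations listed above can be sketched with a tiny in-memory inverted index. This is only an illustration using the Java standard library: the class TinyIndex and its methods are hypothetical names, not part of Lucene, and the sketch treats an update as a delete followed by an add, which matches the chapter's remark that updating is similar to recreating the index entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in for an index; this is NOT Lucene's IndexWriter.
final class TinyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();
    // document id -> the text it was indexed with (needed to undo its postings)
    private final Map<String, String> stored = new HashMap<>();

    // Add: split on whitespace and record the document under each term.
    // Assumes the id is not already indexed; use updateDocument otherwise.
    void addDocument(String id, String text) {
        stored.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Delete: remove the document from every posting list it appears in.
    void deleteDocument(String id) {
        String text = stored.remove(id);
        if (text == null) {
            return; // unknown id, nothing to do
        }
        for (String term : text.toLowerCase().split("\\s+")) {
            Set<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    // Update: drop the old version, then index the new one.
    void updateDocument(String id, String newText) {
        deleteDocument(id);
        addDocument(id, newText);
    }

    // Return the ids of the documents that contain the given term.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }
}
```

After an update, the old terms of a document no longer lead to it, which is the behaviour a search application expects from the Update Document operation.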

Lucene – Searching Classes

Searching is another of the core functionalities provided by Lucene. Its flow is similar to that of the indexing process. A basic Lucene search can be made using the following classes, which can also be considered the foundation classes for all search-related operations.

Searching Classes

Following is a list of classes commonly used during the searching process.

1. IndexSearcher: This class acts as a core component which reads and searches the indexes created by the indexing process. It takes a Directory instance pointing to the location containing the indexes.
2. Term: This class is the lowest unit of searching. It is similar to Field in the indexing process.
3. Query: Query is an abstract class that contains various utility methods and is the parent of all types of queries that Lucene uses during the search process.
4. TermQuery: TermQuery is the most commonly used query object and is the foundation of many complex queries that Lucene can make use of.
5. TopDocs: TopDocs points to the top N search results which match the search criteria. It is a simple container of pointers to the documents which are the output of a search.
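To make the relationship between a term, a query and the top N results more concrete, here is a minimal standard-library sketch. TinySearcher and Hit are hypothetical names used only for illustration; the scoring (raw term frequency) is an assumption chosen for simplicity and is not Lucene's scoring formula.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a toy searcher over a list of texts, not Lucene's IndexSearcher.
final class TinySearcher {
    // One hit: the position of the document in the collection plus its score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<String> docs;

    TinySearcher(List<String> docs) {
        this.docs = docs;
    }

    // Score each document by how often the (lowercase) term occurs and
    // return at most n hits, best score first.
    List<Hit> search(String term, int n) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            int freq = 0;
            for (String word : docs.get(i).toLowerCase().split("\\s+")) {
                if (word.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                hits.add(new Hit(i, freq));
            }
        }
        // List.sort is stable, so equally scored hits keep their original order.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }
}
```

The returned list plays the role that TopDocs plays in the table above: it does not hold the documents themselves, only pointers (here, list positions) and scores.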

Lucene – Sorting

In this chapter, we will look at the orders in which Lucene returns search results, both by default and when changed as required.

Sorting by Relevance

This is the default sorting mode used by Lucene. Lucene returns results with the most relevant hit at the top.

    private void sortUsingRelevance(String searchQuery)
        throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();

        // create a term to search the file name
        Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
        // create the fuzzy query object
        Query query = new FuzzyQuery(term);
        // track scores (but not the maximum score) when sorting
        searcher.setDefaultFieldSortScoring(true, false);
        // do the search
        TopDocs hits = searcher.search(query, Sort.RELEVANCE);
        long endTime = System.currentTimeMillis();

        System.out.println(hits.totalHits +
            " documents found. Time :" + (endTime - startTime) + "ms");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.print("Score: " + scoreDoc.score + " ");
            System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }

Sorting by IndexOrder

In this sorting mode, Lucene returns results in index order: the first document indexed is shown first in the search results.

    private void sortUsingIndex(String searchQuery)
        throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();

        // create a term to search the file name
        Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
        // create the fuzzy query object
        Query query = new FuzzyQuery(term);
        // track scores (but not the maximum score) when sorting
        searcher.setDefaultFieldSortScoring(true, false);
        // do the search
        TopDocs hits = searcher.search(query, Sort.INDEXORDER);
        long endTime = System.currentTimeMillis();

        System.out.println(hits.totalHits +
            " documents found. Time :" + (endTime - startTime) + "ms");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.print("Score: " + scoreDoc.score + " ");
            System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }

Example Application

Let us create a test Lucene application to test the sorting process.

1. Create a project with the name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene – First Application chapter. You can also use the project created in the Lucene – First Application chapter as is for this chapter to understand the sorting process.
2. Create LuceneConstants.java and Searcher.java as explained in the Lucene – First Application chapter. Keep the rest of the files unchanged.
3. Create LuceneTester.java as mentioned below.
4. Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class provides various constants to be used across the sample application.

    package com.tutorialspoint.lucene;

    public class LuceneConstants {
        public static final String CONTENTS = "contents";
        public static final String FILE_NAME = "filename";
        public static final String FILE_PATH = "filepath";
        public static final int MAX_SEARCH = 10;
    }

Searcher.java

This class reads the indexes made on the raw data and searches the data using the Lucene library.
    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Searcher {

        IndexSearcher indexSearcher;
        QueryParser queryParser;
        Query query;

        public Searcher(String indexDirectoryPath) throws IOException {
            // open the directory that contains the indexes
            Directory indexDirectory =
                FSDirectory.open(new File(indexDirectoryPath));
            indexSearcher = new IndexSearcher(indexDirectory);
            queryParser = new QueryParser(Version.LUCENE_36,
                LuceneConstants.CONTENTS,
                new StandardAnalyzer(Version.LUCENE_36));
        }

        // parse a query string and search the contents field
        public TopDocs search(String searchQuery)
            throws IOException, ParseException {
            query = queryParser.parse(searchQuery);
            return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
        }

        // search with a programmatically built query
        public TopDocs search(Query query)
            throws IOException, ParseException {
            return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
        }

        // search with a programmatically built query and an explicit sort order
        public TopDocs search(Query query, Sort sort)
            throws IOException, ParseException {
            return indexSearcher.search(query,
                LuceneConstants.MAX_SEARCH, sort);
        }

        public void setDefaultFieldSortScoring(boolean doTrackScores,
            boolean doMaxScores) {
            indexSearcher.setDefaultFieldSortScoring(
                doTrackScores, doMaxScores);
        }

        public Document getDocument(ScoreDoc scoreDoc)
            throws CorruptIndexException, IOException {
            return indexSearcher.doc(scoreDoc.doc);
        }

        public void close() throws IOException {
            indexSearcher.close();
        }
    }

LuceneTester.java

This class is used to test the sorting capability of the Lucene
library.

    package com.tutorialspoint.lucene;

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;

    public class LuceneTester {

        String indexDir = "E:\\Lucene\\Index";
        String dataDir = "E:\\Lucene\\Data";
        Indexer indexer;
        Searcher searcher;

        public static void main(String[] args) {
            LuceneTester tester;
            try {
                tester = new LuceneTester();
                tester.sortUsingRelevance("cord3.txt");
                tester.sortUsingIndex("cord3.txt");
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }

        private void sortUsingRelevance(String searchQuery)
            throws IOException, ParseException {
            searcher = new Searcher(indexDir);
            long startTime = System.currentTimeMillis();

            // create a term to search the file name
            Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
            // create the fuzzy query object
            Query query = new FuzzyQuery(term);
            // track scores (but not the maximum score) when sorting
            searcher.setDefaultFieldSortScoring(true, false);
            // do the search
            TopDocs hits = searcher.search(query, Sort.RELEVANCE);
            long endTime = System.currentTimeMillis();

            System.out.println(hits.totalHits +
                " documents found. Time :" + (endTime - startTime) + "ms");
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = searcher.getDocument(scoreDoc);
                System.out.print("Score: " + scoreDoc.score + " ");
                System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
            }
            searcher.close();
        }

        private void sortUsingIndex(String searchQuery)
            throws IOException, ParseException {
            searcher = new Searcher(indexDir);
            long startTime = System.currentTimeMillis();

            // create a term to search the file name
            Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
            // create the fuzzy query object
            Query query = new FuzzyQuery(term);
            // track scores (but not the maximum score) when sorting
            searcher.setDefaultFieldSortScoring(true, false);
            // do the search
            TopDocs hits = searcher.search(query, Sort.INDEXORDER);
            long endTime = System.currentTimeMillis();

            System.out.println(hits.totalHits +
                " documents found. Time :" + (endTime - startTime) + "ms");
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = searcher.getDocument(scoreDoc);
                System.out.print("Score: " + scoreDoc.score + " ");
                System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
            }
            searcher.close();
        }
    }

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running the indexing program in the chapter Lucene – Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console −

    10 documents found. Time :31ms
    Score: 1.3179655 File: E:\Lucene\Data\record3.txt
    Score: 0.790779 File: E:\Lucene\Data\record1.txt
    Score: 0.790779 File: E:\Lucene\Data\record2.txt
    Score: 0.790779 File: E:\Lucene\Data\record4.txt
    Score: 0.790779 File: E:\Lucene\Data\record5.txt
    Score: 0.790779 File: E:\Lucene\Data\record6.txt
    Score: 0.790779 File: E:\Lucene\Data\record7.txt
    Score: 0.790779 File: E:\Lucene\Data\record8.txt
    Score:
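The difference between the two sort modes in this chapter can be illustrated without Lucene. The sketch below is a standard-library approximation: SortOrders and Hit are hypothetical names, and breaking score ties by document number is an assumption made here so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: orders a list of hits the way the two sort modes are described.
final class SortOrders {
    // A hit is a document number (its position in the index) and a score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Relevance order: highest score first; ties fall back to index order.
    static List<Hit> byRelevance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt((Hit h) -> h.doc));
        return sorted;
    }

    // Index order: the document that was indexed first comes first,
    // regardless of its score.
    static List<Hit> byIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.doc));
        return sorted;
    }
}
```

With the scores from the sample output above, relevance order puts the best-scoring file (record3.txt) first, while index order would list the files purely in the order they were added to the index.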

Lucene – Indexing Classes

The indexing process is one of the core functionalities provided by Lucene. The following diagram illustrates the indexing process and the use of classes. IndexWriter is the most important and the core component of the indexing process.

We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer and then creates, opens or edits indexes as required and stores or updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.

Indexing Classes

Following is a list of classes commonly used during the indexing process.

1. IndexWriter: This class acts as a core component which creates and updates indexes during the indexing process.
2. Directory: This class represents the storage location of the indexes.
3. Analyzer: This class is responsible for analyzing a document and extracting the tokens (words) from the text which is to be indexed. Without analysis, IndexWriter cannot create an index.
4. Document: This class represents a virtual document with Fields, where a Field is an object which can contain the physical document's contents, its metadata and so on. The Analyzer can understand only a Document.
5. Field: This is the lowest unit or the starting point of the indexing process. It represents a key-value pair in which the key identifies the value to be indexed. For example, a field used to represent the contents of a document will have the key "contents", and the value may contain part or all of the text or numeric content of the document. Lucene can index only text or numeric content.

Lucene – Indexing Process

The indexing process is one of the core functionalities provided by Lucene. The following diagram illustrates the indexing process and the use of classes. IndexWriter is the most important and core component of the indexing process.

We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer and then creates, opens or edits indexes as required and stores or updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.

Now we'll walk through the indexing process step by step using a basic example.

Create a document

Create a method to get a Lucene document from a text file.
Create various types of fields, which are key-value pairs with names as keys and the contents to be indexed as values.
Set each field to be analyzed or not. In our case, only contents is to be analyzed, as it can contain words such as a, am, are, an etc. which are not required in search operations.
Add the newly created fields to the document object and return it to the caller method.

    private Document getDocument(File file) throws IOException {
        Document document = new Document();

        // index file contents
        Field contentField = new Field(LuceneConstants.CONTENTS,
            new FileReader(file));
        // index file name
        Field fileNameField = new Field(LuceneConstants.FILE_NAME,
            file.getName(),
            Field.Store.YES, Field.Index.NOT_ANALYZED);
        // index file path
        Field filePathField = new Field(LuceneConstants.FILE_PATH,
            file.getCanonicalPath(),
            Field.Store.YES, Field.Index.NOT_ANALYZED);

        document.add(contentField);
        document.add(fileNameField);
        document.add(filePathField);

        return document;
    }

Create an IndexWriter

The IndexWriter class acts as a core component which creates and updates indexes during the indexing process. Follow these steps to create an IndexWriter −

Step 1 − Create an object of IndexWriter.
Step 2 − Create a Lucene directory which points to the location where the indexes are to be stored.
Step 3 − Initialize the IndexWriter object with the index directory, a standard analyzer carrying version information, and other required or optional parameters.

    private IndexWriter writer;

    public Indexer(String indexDirectoryPath) throws IOException {
        // this directory will contain the indexes
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));
        // create the indexer
        writer = new IndexWriter(indexDirectory,
            new StandardAnalyzer(Version.LUCENE_36), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    }

Start Indexing Process

The following program shows how to start an indexing process −

    private void indexFile(File file) throws IOException {
        System.out.println("Indexing " + file.getCanonicalPath());
        Document document = getDocument(file);
        writer.addDocument(document);
    }

Example Application

To test the indexing process, let us create a test Lucene application.

1. Create a project with the name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene – First Application chapter. You can also use the project created in the Lucene – First Application chapter as is for this chapter to understand the indexing process.
2. Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene – First Application chapter. Keep the rest of the files unchanged.
3. Create LuceneTester.java as mentioned below.
4. Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class provides various constants to be used across the sample application.

    package com.tutorialspoint.lucene;

    public class LuceneConstants {
        public static final String CONTENTS = "contents";
        public static final String FILE_NAME = "filename";
        public static final String FILE_PATH = "filepath";
        public static final int MAX_SEARCH = 10;
    }

TextFileFilter.java

This class is used as a .txt file filter.
    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.FileFilter;

    public class TextFileFilter implements FileFilter {

        @Override
        public boolean accept(File pathname) {
            return pathname.getName().toLowerCase().endsWith(".txt");
        }
    }

Indexer.java

This class indexes the raw data so that we can make it searchable using the Lucene library.

    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.FileFilter;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Indexer {

        private IndexWriter writer;

        public Indexer(String indexDirectoryPath) throws IOException {
            // this directory will contain the indexes
            Directory indexDirectory =
                FSDirectory.open(new File(indexDirectoryPath));
            // create the indexer
            writer = new IndexWriter(indexDirectory,
                new StandardAnalyzer(Version.LUCENE_36), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        }

        public void close() throws CorruptIndexException, IOException {
            writer.close();
        }

        private Document getDocument(File file) throws IOException {
            Document document = new Document();

            // index file contents
            Field contentField = new Field(LuceneConstants.CONTENTS,
                new FileReader(file));
            // index file name
            Field fileNameField = new Field(LuceneConstants.FILE_NAME,
                file.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED);
            // index file path
            Field filePathField = new Field(LuceneConstants.FILE_PATH,
                file.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED);

            document.add(contentField);
            document.add(fileNameField);
            document.add(filePathField);

            return document;
        }

        private void indexFile(File file) throws IOException {
            System.out.println("Indexing " + file.getCanonicalPath());
            Document document = getDocument(file);
            writer.addDocument(document);
        }

        public int createIndex(String dataDirPath, FileFilter filter)
            throws IOException {
            // get all files in the data directory
            File[] files = new File(dataDirPath).listFiles();
            // listFiles() returns null if the path is not a readable directory
            if (files == null) {
                throw new IOException("Cannot list files in " + dataDirPath);
            }

            for (File file : files) {
                if (!file.isDirectory()
                    && !file.isHidden()
                    && file.exists()
                    && file.canRead()
                    && filter.accept(file)) {
                    indexFile(file);
                }
            }
            return writer.numDocs();
        }
    }

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

    package com.tutorialspoint.lucene;

    import java.io.IOException;

    public class LuceneTester {

        String indexDir = "E:\\Lucene\\Index";
        String dataDir = "E:\\Lucene\\Data";
        Indexer indexer;

        public static void main(String[] args) {
            LuceneTester tester;
            try {
                tester = new LuceneTester();
                tester.createIndex();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private void createIndex() throws IOException {
            indexer = new Indexer(indexDir);
            int numIndexed;
            long startTime = System.currentTimeMillis();
            numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
            long endTime = System.currentTimeMillis();
            indexer.close();
            System.out.println(numIndexed + " File indexed, time taken: "
                + (endTime - startTime) + " ms");
        }
    }

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application.
If your application runs successfully, it will print the following message in the Eclipse IDE's console −

    Indexing E:\Lucene\Data\record1.txt
    Indexing E:\Lucene\Data\record10.txt
    Indexing E:\Lucene\Data\record2.txt
    Indexing E:\Lucene\Data\record3.txt
    Indexing E:\Lucene\Data\record4.txt
    Indexing
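The getDocument method in this chapter treats its fields differently: the contents field is analyzed, while the file name and file path are indexed with NOT_ANALYZED. The standard-library sketch below shows what that distinction means for the terms that end up in an index. FieldSketch, Field and termsFor are hypothetical names, and the tokenization rule (split on anything that is not a letter or digit, then lowercase) is a simplifying assumption rather than what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models analyzed versus not-analyzed fields, not Lucene's Field.
final class FieldSketch {
    static final class Field {
        final String name;
        final String value;
        final boolean analyzed;

        Field(String name, String value, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.analyzed = analyzed;
        }
    }

    // Returns, per field name, the terms that would be placed in the index.
    static Map<String, List<String>> termsFor(List<Field> document) {
        Map<String, List<String>> terms = new LinkedHashMap<>();
        for (Field f : document) {
            List<String> out = terms.computeIfAbsent(f.name, n -> new ArrayList<>());
            if (f.analyzed) {
                // analyzed: break the value into lowercase word tokens
                for (String t : f.value.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!t.isEmpty()) {
                        out.add(t);
                    }
                }
            } else {
                // not analyzed: the whole value becomes a single term, unchanged
                out.add(f.value);
            }
        }
        return terms;
    }
}
```

Because a not-analyzed value is kept as one term, a file name such as record3.txt can only be matched as a whole, which is why queries against the file name field work on the complete name.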

Lucene – Home

Lucene is an open-source, Java-based search library. It is popular and fast, and it is used in Java-based applications to add document search capability to any kind of application in a simple and efficient way. This tutorial will give you a solid understanding of Lucene concepts and help you understand the complexity of search requirements in enterprise-level applications and the need for the Lucene search engine.

Audience

This tutorial is designed for software professionals who want to learn Lucene search engine programming in simple and easy steps. After completing this tutorial, you will be at an intermediate level of expertise, from where you can take yourself to a higher level of expertise.

Prerequisites

Before proceeding with this tutorial, it is recommended that you have a basic understanding of the Java programming language, a text editor, and the execution of programs.

Lucene – Analysis

In one of our previous chapters, we saw that Lucene uses IndexWriter to analyze the Document(s) using the Analyzer and then creates, opens or edits indexes as required. In this chapter, we are going to discuss the various types of Analyzer objects and other relevant objects which are used during the analysis process. Understanding the analysis process and how analyzers work will give you insight into how Lucene indexes documents.

Following is the list of objects that we'll discuss in due course.

1. Token: Token represents a piece of text or a word in a document together with relevant details such as its metadata (position, start offset, end offset, token type and position increment).
2. TokenStream: TokenStream is the output of the analysis process and consists of a series of tokens. It is an abstract class.
3. Analyzer: This is the abstract base class for every type of Analyzer.
4. WhitespaceAnalyzer: This analyzer splits the text in a document based on whitespace.
5. SimpleAnalyzer: This analyzer splits the text in a document based on non-letter characters and puts the text in lowercase.
6. StopAnalyzer: This analyzer works just like the SimpleAnalyzer and also removes common words like 'a', 'an', 'the', etc.
7. StandardAnalyzer: This is the most sophisticated analyzer and is capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuation, if any.
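The behaviour described for the first three concrete analyzers can be approximated with plain string handling. The sketch below is not Lucene code: AnalyzerSketch and its methods are hypothetical names, and the three-word stop list is an assumption taken from the examples in the table (Lucene ships its own, longer list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: approximates the described analyzers with string operations.
final class AnalyzerSketch {
    // Hypothetical stop-word list for this sketch.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the"));

    // Like WhitespaceAnalyzer: split on whitespace only; case and punctuation stay.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: split on non-letter characters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple() tokens with stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }
}
```

Running the three methods on the same sentence makes the differences visible: the whitespace variant keeps "quick-brown" as one token, the simple variant splits and lowercases it, and the stop variant additionally drops "the".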

Lucene – Query Programming

As we saw in the previous chapter, Lucene – Search Operation, Lucene uses IndexSearcher to make searches, and it takes the Query object created by QueryParser as its input. In this chapter, we are going to discuss various types of Query objects and the different ways to create them programmatically. Creating different types of Query objects gives control over the kind of search to be made.

Consider the Advanced Search feature provided by many applications, where users are given multiple options to narrow the search results. With Query programming, we can achieve the same very easily.

Following is the list of Query types that we'll discuss in due course.

1. TermQuery: TermQuery is the most commonly used query object and is the foundation of many complex queries that Lucene can make use of.
2. TermRangeQuery: TermRangeQuery is used when a range of textual terms is to be searched.
3. PrefixQuery: PrefixQuery is used to match documents containing terms that start with a specified string.
4. BooleanQuery: BooleanQuery is used to search documents which are the result of combining multiple queries using AND, OR or NOT operators.
5. PhraseQuery: PhraseQuery is used to search documents which contain a particular sequence of terms.
6. WildcardQuery: WildcardQuery is used to search documents using wildcards such as '*' for any character sequence and '?' for a single character.
7. FuzzyQuery: FuzzyQuery is used to search documents using a fuzzy implementation, that is, an approximate search based on the edit distance algorithm.
8. MatchAllDocsQuery: MatchAllDocsQuery, as the name suggests, matches all the documents.
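The matching rules behind PrefixQuery, WildcardQuery and FuzzyQuery can be illustrated on single terms with the Java standard library. QuerySketch and its methods are hypothetical names; the edit distance is the classic Levenshtein distance, and the maxEdits threshold is a parameter of this sketch, not a statement about how Lucene decides whether a fuzzy match is close enough.

```java
import java.util.regex.Pattern;

// Illustrative only: term-level matching rules, not Lucene's query classes.
final class QuerySketch {
    // Prefix rule: the indexed term starts with the given string.
    static boolean prefixMatch(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // Wildcard rule: '*' matches any character sequence, '?' exactly one character.
    static boolean wildcardMatch(String term, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy rule: the term is within maxEdits edits of the query text.
    static boolean fuzzyMatch(String term, String query, int maxEdits) {
        return editDistance(term, query) <= maxEdits;
    }
}
```

For instance, the search text cord3.txt is two insertions away from record3.txt, which gives an intuition for why an approximate search can find a file whose name was typed incompletely.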

Lucene – Quick Guide

Lucene – Overview

Lucene is a simple yet powerful Java-based search library. It can be used in any application to add search capability to it. Lucene is an open-source project. It is scalable, and this high-performance library can be used to index and search virtually any kind of text. The Lucene library provides the core operations required by any search application: indexing and searching.

How Does a Search Application Work?

A search application performs all or a few of the following operations −

1. Acquire Raw Content: The first step of any search application is to collect the target contents on which the search is to be conducted.
2. Build the Document: The next step is to build the document(s) from the raw content, in a form which the search application can understand and interpret easily.
3. Analyze the Document: Before the indexing process starts, the document is analyzed to determine which parts of the text are candidates to be indexed.
4. Index the Document: Once documents are built and analyzed, the next step is to index them so that a document can be retrieved based on certain keys instead of its entire content. An index is similar to the index at the end of a book, where common words are shown with their page numbers so that these words can be found quickly instead of searching the complete book.
5. User Interface for Search: Once a database of indexes is ready, the application can perform searches. To let a user make a search, the application must provide a user interface where the user can enter text and start the search process.
6. Build Query: Once a user makes a request to search for a text, the application should prepare a Query object using that text, which can be used to query the index database for the relevant details.
7. Search Query: Using the query object, the index database is then checked to get the relevant details and the content documents.
8. Render Results: Once the result is received, the application should decide how to show the results to the user through the user interface, for example how much information is to be shown at first look.

Apart from these basic operations, a search application can also provide an administration user interface that helps administrators control the level of search based on user profiles. Analytics of search results is another important and advanced aspect of any search application.

Lucene's Role in a Search Application

Lucene plays a role in steps 2 to 7 mentioned above and provides classes to do the required operations. In a nutshell, Lucene is the heart of any search application and provides the vital operations pertaining to indexing and searching. Acquiring contents and displaying the results is left for the application to handle.

In the next chapter, we will build a simple search application using the Lucene search library.

Lucene – Environment Setup

This tutorial will guide you on how to prepare a development environment to start your work with Lucene. It will also teach you how to set up the JDK and Eclipse on your machine before you set up the Lucene libraries −

Step 1 – Java Development Kit (JDK) Setup

You can download the latest version of the JDK from Oracle's Java site: Java SE Downloads. You will find instructions for installing the JDK in the downloaded files; follow the given instructions to install and configure the setup. Finally, set the PATH and JAVA_HOME environment variables to refer to the directory that contains java and javac, typically java_install_dir/bin and java_install_dir respectively.

If you are running Windows and installed the JDK in C:\jdk1.6.0_15, you would have to put the following lines in your C:\autoexec.bat file.
    set PATH=C:\jdk1.6.0_15\bin;%PATH%
    set JAVA_HOME=C:\jdk1.6.0_15

Alternatively, on Windows NT/2000/XP, you could also right-click on My Computer, select Properties, then Advanced, then Environment Variables. Then, you would update the PATH value and press the OK button.

On Unix (Solaris, Linux, etc.), if the JDK is installed in /usr/local/jdk1.6.0_15 and you use the C shell, you would put the following into your .cshrc file.

    setenv PATH /usr/local/jdk1.6.0_15/bin:$PATH
    setenv JAVA_HOME /usr/local/jdk1.6.0_15

Alternatively, if you use an Integrated Development Environment (IDE) like Borland JBuilder, Eclipse, IntelliJ IDEA, or Sun ONE Studio, compile and run a simple program to confirm that the IDE knows where you installed Java; otherwise, do the proper setup as described in the IDE's documentation.

Step 2 – Eclipse IDE Setup

All the examples in this tutorial have been written using the Eclipse IDE, so we suggest that you have the latest version of Eclipse installed on your machine.

To install the Eclipse IDE, download the latest Eclipse binaries from https://www.eclipse.org/downloads/. Once you have downloaded the installation, unpack the binary distribution into a convenient location, for example C:\eclipse on Windows or /usr/local/eclipse on Linux/Unix, and finally set the PATH variable appropriately.

Eclipse can be started by executing the following command on a Windows machine, or you can simply double-click on eclipse.exe.

    %C:\eclipse\eclipse.exe

Eclipse can be started by executing the following command on a Unix (Solaris, Linux, etc.) machine −

    $/usr/local/eclipse/eclipse

After a successful startup, it should display the following result −

Step 3 – Setup Lucene Framework Libraries

If the startup is successful, then you can proceed to set up your Lucene framework. Following are the simple steps to download and install the framework on your machine.

https://archive.apache.org/dist/lucene/java/3.6.2/

Make a choice whether you want to install Lucene on Windows or Unix, and then proceed to the next step to download the .zip file for Windows or the .tgz file for Unix.

Download the suitable version of the Lucene framework binaries from https://archive.apache.org/dist/lucene/java/.

At the time of writing this tutorial, I downloaded lucene-3.6.2.zip on my Windows machine and when you unzip the downloaded file it will give you the directory structure inside