OpenNLP – Environment

In this chapter, we will discuss how you can set up the OpenNLP environment on your system. Let's start with the installation process.

Installing OpenNLP

Following are the steps to download the Apache OpenNLP library to your system.

Step 1 − Open the homepage of Apache OpenNLP.

Step 2 − Click the Downloads link. You will be directed to a page listing various mirrors, which redirect you to the Apache Software Foundation Distribution directory.

Step 3 − This page lists links to the various Apache distributions. Browse through them, find the OpenNLP distribution, and click it.

Step 4 − You will be redirected to the directory that holds the index of the OpenNLP distribution. Click the latest version among the available distributions.

Step 5 − Each distribution provides source and binary files of the OpenNLP library in various formats. Download the binary and source files, apache-opennlp-1.6.0-bin.zip and apache-opennlp-1.6.0-src.zip (for Windows).

Setting the Classpath

After downloading the OpenNLP library, you need to add its bin directory to the Path variable. Assume that you have downloaded the OpenNLP library to the E drive of your system. Now, follow the steps given below.

Step 1 − Right-click "My Computer" and select "Properties".

Step 2 − Click the "Environment Variables" button under the "Advanced" tab.

Step 3 − Select the Path variable and click the Edit button.

Step 4 − In the Edit Environment Variable window, click the New button, add the path of the OpenNLP bin directory, E:\apache-opennlp-1.6.0\bin, and click the OK button.

Eclipse Installation

You can set up the Eclipse environment for the OpenNLP library either by setting the build path to the JAR files or by using pom.xml.
Setting Build Path to the JAR Files

Follow the steps given below to install OpenNLP in Eclipse.

Step 1 − Make sure that you have the Eclipse environment installed on your system.

Step 2 − Open Eclipse. Click File → New → Project to open a new project.

Step 3 − You will get the New Project wizard. In this wizard, select Java Project and proceed by clicking the Next button.

Step 4 − Next, you will get the New Java Project wizard. Here, enter the details of the new project and click the Next button.

Step 5 − After creating the project, right-click on it, select Build Path, and click Configure Build Path.

Step 6 − Next, you will get the Java Build Path wizard. Here, click the Add External JARs button.

Step 7 − Select the JAR files opennlp-tools-1.6.0.jar and opennlp-uima-1.6.0.jar located in the lib folder of the apache-opennlp-1.6.0 folder.

On clicking the Open button, the selected files are added to your library. On clicking OK, the required JAR files are added to the current project. You can verify the added libraries by expanding Referenced Libraries in the project.

Using pom.xml

Convert the project into a Maven project and add the following code to its pom.xml.
<project xmlns="http://maven.apache.org/POM/4.0.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <groupId>myproject</groupId>
   <artifactId>myproject</artifactId>
   <version>0.0.1-SNAPSHOT</version>
   <build>
      <sourceDirectory>src</sourceDirectory>
      <plugins>
         <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.5.1</version>
            <configuration>
               <source>1.8</source>
               <target>1.8</target>
            </configuration>
         </plugin>
      </plugins>
   </build>
   <dependencies>
      <dependency>
         <groupId>org.apache.opennlp</groupId>
         <artifactId>opennlp-tools</artifactId>
         <version>1.6.0</version>
      </dependency>
      <dependency>
         <groupId>org.apache.opennlp</groupId>
         <artifactId>opennlp-uima</artifactId>
         <version>1.6.0</version>
      </dependency>
   </dependencies>
</project>
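Whichever route you take (external JARs or pom.xml), a quick way to confirm that the library is visible to your project is to try loading one of its classes by name. The following is a minimal sketch, not part of the OpenNLP distribution: the class ClasspathCheck and the method isOnClasspath are hypothetical names introduced here for illustration, and opennlp.tools.namefind.NameFinderME is simply the class used in the Named Entity Recognition chapter.

```java
// Hypothetical helper (not part of OpenNLP): reports whether a class can be loaded.
class ClasspathCheck {

   // Returns true if the named class is visible on the current classpath.
   static boolean isOnClasspath(String className) {
      try {
         Class.forName(className);
         return true;
      } catch (ClassNotFoundException | LinkageError e) {
         // The class (or something it depends on) could not be loaded.
         return false;
      }
   }

   public static void main(String[] args) {
      // Assumption: this class ships in opennlp-tools, as used later in the tutorial.
      String name = "opennlp.tools.namefind.NameFinderME";
      if (isOnClasspath(name)) {
         System.out.println(name + " found");
      } else {
         System.out.println(name + " NOT found; check the build path or pom.xml");
      }
   }
}
```

Run this class from inside the Eclipse project. If it reports that the class was not found, revisit the build path or the dependency entries in pom.xml.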
OpenNLP – Named Entity Recognition

The process of finding the names of people, places, and other entities in a given text is known as Named Entity Recognition (NER). In this chapter, we will discuss how to carry out NER through a Java program using the OpenNLP library.

Named Entity Recognition using OpenNLP

To perform various NER tasks, OpenNLP uses different predefined models, namely en-ner-date.bin, en-ner-location.bin, en-ner-organization.bin, en-ner-person.bin, and en-ner-time.bin. Each of these models is trained to detect the respective kind of entity in a given raw text.

The opennlp.tools.namefind package contains the classes and interfaces that are used to perform the NER task. To perform an NER task using the OpenNLP library, you need to:

Load the respective model using the TokenNameFinderModel class.
Instantiate the NameFinderME class.
Find the names and print them.

Following are the steps to write a program which detects the named entities in a given raw text.

Step 1: Loading the model

The model for name finding is represented by the class named TokenNameFinderModel, which belongs to the package opennlp.tools.namefind.

To load an NER model:

Create an InputStream object of the model (instantiate FileInputStream and pass the path of the appropriate NER model, in String format, to its constructor).
Instantiate the TokenNameFinderModel class and pass the InputStream object of the model as a parameter to its constructor, as shown in the following code block.

//Loading the NER-person model
InputStream inputStreamNameFinder = new FileInputStream("…/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);

Step 2: Instantiating the NameFinderME class

The NameFinderME class of the package opennlp.tools.namefind contains methods to perform the NER tasks. This class uses the Maximum Entropy model to find the named entities in the given raw text.
Instantiate this class and pass the model object created in the previous step, as shown below.

//Instantiating the NameFinderME class
NameFinderME nameFinder = new NameFinderME(model);

Step 3: Finding the names in the sentence

The find() method of the NameFinderME class is used to detect the names in the text passed to it. This method accepts a String array as a parameter, where each element is one token of the sentence. Invoke this method by passing the tokenized sentence to it.

//Finding the names in the sentence
Span nameSpans[] = nameFinder.find(sentence);

Step 4: Printing the spans of the names in the sentence

The find() method of the NameFinderME class returns an array of objects of the type Span. The class named Span of the opennlp.tools.util package stores the start and end indices of a range; here, these are indices into the token array. You can store the spans returned by the find() method in a Span array and print them, as shown in the following code block.

//Printing the spans of the names in the sentence
for (Span s : nameSpans)
   System.out.println(s.toString());

NER Example

Following is the program which reads the given sentence and recognizes the spans of the names of the persons in it. Save this program in a file with the name NameFinderME_Example.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NameFinderME_Example {
   public static void main(String args[]) throws Exception {
      //Loading the NER-person model
      InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-ner-person.bin");
      TokenNameFinderModel model = new TokenNameFinderModel(inputStream);

      //Instantiating the NameFinderME class
      NameFinderME nameFinder = new NameFinderME(model);

      //Getting the sentence in the form of a String array
      String[] sentence = new String[]{
         "Mike",
         "and",
         "Smith",
         "are",
         "good",
         "friends"
      };

      //Finding the names in the sentence
      Span nameSpans[] = nameFinder.find(sentence);

      //Printing the spans of the names in the sentence
      for (Span s : nameSpans)
         System.out.println(s.toString());
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac NameFinderME_Example.java
java NameFinderME_Example

On executing, the above program reads the given tokens, detects the names of the persons among them, and displays their positions (spans), as shown below.

[0..1) person
[2..3) person

Names along with their Positions

Each span holds indices into the token array, so the start index of a span can be used to look up the corresponding token. We can use this to print the names and their spans (positions) together, as shown in the following code block. Note that tokens[s.getStart()] is only the first token of the span.

for (Span s : nameSpans)
   System.out.println(s.toString() + " " + tokens[s.getStart()]);

Following is the program to detect the names in the given raw text and display them along with their positions. Save this program in a file with the name NameFinderSentences.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class NameFinderSentences {
   public static void main(String args[]) throws Exception {
      //Loading the tokenizer model
      InputStream inputStreamTokenizer = new FileInputStream("C:/OpenNLP_models/en-token.bin");
      TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);

      //Instantiating the TokenizerME class
      TokenizerME tokenizer = new TokenizerME(tokenModel);

      //Tokenizing the sentence into a String array
      String sentence = "Mike is senior programming manager and Rama is a clerk both are working at Tutorialspoint";
      String tokens[] = tokenizer.tokenize(sentence);

      //Loading the NER-person model
      InputStream inputStreamNameFinder = new FileInputStream("C:/OpenNLP_models/en-ner-person.bin");
      TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);

      //Instantiating the NameFinderME class
      NameFinderME nameFinder = new NameFinderME(model);

      //Finding the names in the sentence
      Span nameSpans[] = nameFinder.find(tokens);

      //Printing the names and their spans in the sentence
      for (Span s : nameSpans)
         System.out.println(s.toString() + " " + tokens[s.getStart()]);
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac NameFinderSentences.java
java NameFinderSentences

On executing, the above program reads the given String (raw text), detects the names of the persons in it, and displays them along with their positions (spans), as shown below.

[0..1) person Mike

Finding the Names of the Location

By loading different models, you can detect different kinds of named entities. Following is a Java program which loads the en-ner-location.bin model and detects the location names in the given sentence. Save this program in a file with the name LocationFinder.java.
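import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class LocationFinder {
   public static void main(String args[]) throws Exception {
      //Loading the tokenizer model
      InputStream inputStreamTokenizer = new FileInputStream("C:/OpenNLP_models/en-token.bin");
      TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);

      //String paragraph = "Mike and Smith are classmates";
      String paragraph = "Tutorialspoint is located in Hyderabad";

      //Instantiating the TokenizerME class
      TokenizerME tokenizer = new TokenizerME(tokenModel);
      String tokens[] = tokenizer.tokenize(paragraph);

      //Loading the NER-location model
      //(assumption: the steps from here on mirror NameFinderSentences, and the
      //model is assumed to sit in the same C:/OpenNLP_models directory)
      InputStream inputStreamNameFinder = new FileInputStream("C:/OpenNLP_models/en-ner-location.bin");
      TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);

      //Instantiating the NameFinderME class
      NameFinderME nameFinder = new NameFinderME(model);

      //Finding the names of the locations in the sentence
      Span nameSpans[] = nameFinder.find(tokens);

      //Printing the spans of the locations along with their first token
      for (Span s : nameSpans)
         System.out.println(s.toString() + " " + tokens[s.getStart()]);
   }
}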
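The programs in this chapter print only the first token of each span. Since a span such as [0..1) covers the tokens from its start index up to, but not including, its end index, a name made of several tokens can be rebuilt by joining that whole range. The following is a minimal sketch of this idea that runs without OpenNLP: the class SpanText and the method covered are hypothetical names, and the plain int pairs stand in for the values that Span.getStart() and Span.getEnd() would return.

```java
import java.util.Arrays;

// Hypothetical helper (not part of OpenNLP): rebuilds the text covered by a token span.
class SpanText {

   // start is inclusive and end is exclusive, matching the "[0..1)" notation in the output.
   static String covered(String[] tokens, int start, int end) {
      return String.join(" ", Arrays.copyOfRange(tokens, start, end));
   }

   public static void main(String[] args) {
      String[] tokens = {"Mike", "and", "Smith", "are", "good", "friends"};

      // Stand-ins for Span.getStart() / Span.getEnd() of the spans [0..1) and [2..3).
      int[][] spans = {{0, 1}, {2, 3}};

      for (int[] s : spans) {
         System.out.println("[" + s[0] + ".." + s[1] + ") " + covered(tokens, s[0], s[1]));
      }
   }
}
```

With real Span objects, the same helper could be called as covered(tokens, s.getStart(), s.getEnd()) inside the printing loop, in place of tokens[s.getStart()].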
OpenNLP Tutorial

Apache OpenNLP is an open source Java library which is used to process natural language text. OpenNLP provides services such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and co-reference resolution. In this tutorial, we will understand how to use the OpenNLP library to build an efficient text processing service.

Audience

This tutorial has been prepared for beginners, to help them understand how to use the OpenNLP library and build text processing services with it.

Prerequisites

For this tutorial, it is assumed that the readers have prior knowledge of the Java programming language.