Parse XML Document


Java DOM Parser – Parse XML Document





The Java DOM parser is an API for parsing XML documents. Using its methods, we can retrieve the root element, its sub elements and their attributes.

In this tutorial, we use the getTagName() method to retrieve the tag name of an element, getFirstChild() to retrieve the first child of an element and getTextContent() to get the text content of an element.

Parse XML Using Java DOM parser

Having discussed the various XML parsers available in Java, let us now see how to use the DOM parser to parse an XML file. The parse() method does the actual parsing. Before jumping into the example, let us look at the steps to parse an XML document using the Java DOM parser −

  • Step 1: Creating a DocumentBuilder Object
  • Step 2: Reading the XML
  • Step 3: Parsing the XML Document
  • Step 4: Retrieving the Elements

Step 1: Creating a DocumentBuilder Object

DocumentBuilderFactory is a factory API for obtaining a parser that builds DOM trees from XML documents. Its newDocumentBuilder() method creates an instance of the DocumentBuilder class. A DocumentBuilder accepts input in the form of streams, files, URLs and SAX InputSources.


DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();

Step 2: Reading the XML

The input can be a file or a stream. To read an XML file, create a File object and pass the file path as the argument.


File xmlFile = new File("input.xml");

To supply the input as a stream, we append the XML string to a StringBuilder, convert it into bytes and wrap the bytes in a ByteArrayInputStream. This stream is then given to the parser as input.


StringBuilder xmlBuilder = new StringBuilder();
xmlBuilder.append("<?xml version=\"1.0\"?> <rootElement></rootElement>");
ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));
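
The XML can also be supplied as characters instead of bytes. The following is a minimal sketch, assuming the XML is already held in a String; the class name StringSourceSketch and the helper parseXml() are illustrative names that are not part of this tutorial. It wraps a StringReader in an org.xml.sax.InputSource, which is one of the input types DocumentBuilder accepts.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for illustration only
class StringSourceSketch {

   // Parses XML held in a String by wrapping it in a character-based InputSource
   static Document parseXml(String xml) {
      try {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = factory.newDocumentBuilder();
         return docBuilder.parse(new InputSource(new StringReader(xml)));
      } catch (ParserConfigurationException | SAXException | IOException e) {
         // Checked exceptions are wrapped so that callers need no try/catch
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      Document xmldoc = parseXml("<?xml version=\"1.0\"?><rootElement></rootElement>");
      System.out.println("Root element name is " + xmldoc.getDocumentElement().getTagName());
   }
}
```

Because the parser receives characters here, no byte encoding has to be chosen when the XML already exists as a String.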

Step 3: Parsing the XML Document

The DocumentBuilder created in the above steps is used to parse the input XML. Its parse() method accepts a file or an input stream as a parameter and returns a DOM Document object. If the given file or input stream is null, this method throws an IllegalArgumentException.


Document xmldoc = docBuilder.parse(input);
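
The null check described above can be observed with a small sketch. The class name NullInputSketch and the method rejectsNullStream() are hypothetical names used only for this illustration; the sketch passes a null InputStream to parse() and reports whether an IllegalArgumentException was thrown.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, named for illustration only
class NullInputSketch {

   // Returns true if parse() rejects a null stream with IllegalArgumentException
   static boolean rejectsNullStream() {
      try {
         DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         docBuilder.parse((InputStream) null);
         return false; // not reached if the null check fires
      } catch (IllegalArgumentException e) {
         return true;
      } catch (Exception e) {
         // Any other failure (for example a parser configuration problem)
         return false;
      }
   }

   public static void main(String[] args) {
      System.out.println("Null stream rejected : " + rejectsNullStream());
   }
}
```

The cast to InputStream is needed because parse() is overloaded and a bare null would be ambiguous to the compiler.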

Step 4: Retrieving the Elements

The Node and Element interfaces of the org.w3c.dom package provide various methods to retrieve information about elements from an XML document. This information includes an element's name, text content, attributes and their values.

Retrieving Root Element Name

An XML document consists of many elements. In Java, the whole XML/HTML document is represented by the Document interface, and each element within it by the Element interface. The Element interface provides various methods to retrieve, add and modify the contents of an element.

We can retrieve the name of the root element using the getTagName() method of the Element interface. It returns the name of the element as a String.

Since Element is an interface, we cannot instantiate it directly. Instead, we call the getDocumentElement() method of the Document, which returns the root element as an Element object.

Example

In the following example, we pass a simple XML document with just one root element, "college", using the StringBuilder class. Then, we retrieve its name and print it on the console.


import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;

public class RetrieveRootElementName {
   public static void main(String[] args) {
      try {      
         //Creating a DocumentBuilder Object
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = factory.newDocumentBuilder();

         //Reading the XML
         StringBuilder xmlBuilder = new StringBuilder();
         xmlBuilder.append("<college></college>");

         //Parsing the XML Document
         ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));
         Document xmldoc = docBuilder.parse(input);

         //Retrieving the Root Element Name
         Element element = xmldoc.getDocumentElement();
         System.out.println("Root element name is " + element.getTagName());
	    	  
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

Output

The element name, "college", is displayed on the output screen as shown below −


Root element name is college

Parsing Single Sub Element in XML

We can parse a simple XML document with a single element inside the root element. So far, we have seen how to retrieve the root element. Now, let us see how to get the sub element inside the root element.

Since we have only one sub element, we use the getFirstChild() method to retrieve it. This method is called on the root element to get its first child, and it returns the child as a Node object. Note that whitespace between tags also counts as a child (a text node), so this approach relies on the sub element immediately following the opening tag of the root.

After retrieving the child node, the getNodeName() method is used to get the name of the node. It returns the node name as a String.

To get the text content, we use the getTextContent() method. It returns the text content as a String.
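
When the XML is indented, the first child of the root is usually a whitespace text node rather than the sub element. The following is a minimal sketch of one way to handle this, assuming we only want the first child that is an element; the class FirstChildElementSketch and its helpers firstChildElement() and parseRoot() are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, named for illustration only
class FirstChildElementSketch {

   // Walks the siblings starting at getFirstChild() until an element node is found
   static Element firstChildElement(Element parent) {
      for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
         if (n.getNodeType() == Node.ELEMENT_NODE) {
            return (Element) n;
         }
      }
      return null; // the parent has no child elements
   }

   // Parses an XML string and returns its root element
   static Element parseRoot(String xml) {
      try {
         DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         Document doc = docBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
         return doc.getDocumentElement();
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      Element root = parseRoot("<college>\n   <department>Computer Science</department>\n</college>");
      // The line break and indentation after <college> form a text node named #text
      System.out.println("getFirstChild() : " + root.getFirstChild().getNodeName());
      System.out.println("First child element : " + firstChildElement(root).getNodeName());
   }
}
```

With the indented input used in main(), getFirstChild() reports "#text", while the helper reports "department".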

Example

Let us see the following example, where we have one root element and one sub element. Here, "college" is the root element with "department" as its sub element. The "department" element has the text content "Computer Science". We retrieve the name and text content of the sub element.


import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;

public class SingleSubElement {
   public static void main(String[] args) {
	
      try {

         //Creating a DocumentBuilder Object
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = factory.newDocumentBuilder();

         //Reading the XML
         StringBuilder xmlBuilder = new StringBuilder();
         xmlBuilder.append("<college><department>Computer Science</department></college>");

         //Parsing the XML Document
         ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));
         Document xmldoc = docBuilder.parse(input);

         //Retrieving the Root Element
         Element element = xmldoc.getDocumentElement();

         //Retrieving the Child Node
         Node childNode = element.getFirstChild();
         String childNodeName = childNode.getNodeName();
         System.out.println("Sub Element name : " + childNodeName);

         //Retrieving Text Content of the Child Node
         System.out.println("Text content of Sub Element : " + childNode.getTextContent());

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The output window displays the sub element name and its text content.


Sub Element name : department
Text content of Sub Element : Computer Science

Parsing Multiple Elements in XML

To parse an XML document with multiple elements, we need to use loops. The getChildNodes() method retrieves all the child nodes of an element and returns them as a NodeList. Besides elements, this list also contains the whitespace between tags as text nodes, so we check the node type before treating a node as an element. We loop through the NodeList and retrieve the desired information about each element as we did in the previous sections.
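
The loop over a NodeList can be packaged into a small helper. The following is a sketch under the assumption that only element children are of interest; the class ChildElementsSketch and its methods childElements() and parseRoot() are hypothetical names, not part of the DOM API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for illustration only
class ChildElementsSketch {

   // Copies the element children of a parent into a List, skipping text and other nodes
   static List<Element> childElements(Element parent) {
      List<Element> result = new ArrayList<>();
      NodeList children = parent.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
         Node child = children.item(i);
         if (child.getNodeType() == Node.ELEMENT_NODE) {
            result.add((Element) child);
         }
      }
      return result;
   }

   // Parses an XML string and returns its root element
   static Element parseRoot(String xml) {
      try {
         DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         return docBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
               .getDocumentElement();
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      Element root = parseRoot(
         "<college>\n   <department>Mechanical</department>\n   <department>Civil</department>\n</college>");
      System.out.println("Child nodes : " + root.getChildNodes().getLength());
      System.out.println("Child elements : " + childElements(root).size());
      for (Element department : childElements(root)) {
         System.out.println("Department : " + department.getTextContent());
      }
   }
}
```

For indented input, the count of child nodes is larger than the count of child elements, because each run of whitespace between tags is a node of its own.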

Example

Now, let us add two more departments to the XML file (multipleElements.xml). Let us try to retrieve all the department names and staff count.


<college>
   <department>
      <name>Computer Science</name>
      <staffCount>20</staffCount>
   </department>
   <department>
      <name>Electrical and Electronics</name>
      <staffCount>23</staffCount>
   </department>
   <department>
      <name>Mechanical</name>
      <staffCount>15</staffCount>
   </department>
</college>

In the following program, we retrieve the list of department elements into a NodeList and iterate all the departments to get the department name and staff count.


import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class MultipleElementsXmlParsing {
   public static void main(String[] args) {      		
      try {

         //Input the XML file
         File inputXmlFile = new File("src/multipleElements.xml");

         //creating DocumentBuilder
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = dbFactory.newDocumentBuilder();
         Document xmldoc = docBuilder.parse(inputXmlFile);

         //Retrieving the Root Element
         Element element = xmldoc.getDocumentElement();
         System.out.println("Root element :" + element.getTagName());

         //Getting the list of child nodes
         NodeList nList = element.getChildNodes();

         //Iterating through all the child nodes of the root
         for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);

            //Skipping the whitespace text nodes between the elements
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
               System.out.println("\nCurrent Element :" + nNode.getNodeName());
               Element eElement = (Element) nNode;
               System.out.println("Name of the department : " + eElement.getElementsByTagName("name").item(0).getTextContent());
               System.out.println("Staff Count of the department : " + eElement.getElementsByTagName("staffCount").item(0).getTextContent());
            }
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

All three departments are displayed with their names and staff counts.


Root element :college

Current Element :department
Name of the department : Computer Science
Staff Count of the department : 20

Current Element :department
Name of the department : Electrical and Electronics
Staff Count of the department : 23

Current Element :department
Name of the department : Mechanical
Staff Count of the department : 15

Parsing Attributes in XML

XML elements can have attributes, and these can be retrieved using the getAttribute() method. This method takes the attribute name as a parameter and returns the corresponding attribute value as a String. It returns an empty string if the specified attribute has neither a value nor a default value.
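
Since getAttribute() returns an empty string for an attribute that is not present, a missing attribute cannot be told apart from one whose value is empty by the return value alone. The sketch below uses hasAttribute() of the Element interface to make that distinction. The class AttributeLookupSketch, the helper attributeOrDefault() and the attribute name "floor" are assumptions made only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper class, named for illustration only
class AttributeLookupSketch {

   // Parses an XML string and returns its root element
   static Element parseRoot(String xml) {
      try {
         DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         return docBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
               .getDocumentElement();
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   // Returns the attribute value, or the fallback when the attribute is absent
   static String attributeOrDefault(Element element, String name, String fallback) {
      return element.hasAttribute(name) ? element.getAttribute(name) : fallback;
   }

   public static void main(String[] args) {
      Element dept = parseRoot("<department deptcode=\"DEP_CS23\"/>");
      System.out.println("deptcode : " + dept.getAttribute("deptcode"));
      // "floor" is not present, so getAttribute() yields an empty string
      System.out.println("floor is empty : " + dept.getAttribute("floor").isEmpty());
      System.out.println("floor with fallback : " + attributeOrDefault(dept, "floor", "unknown"));
   }
}
```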

Example

Now, let us add an attribute, "deptcode", to all the department elements in the "attributesParsing.xml" file.


<?xml version = "1.0"?>
<college>
   <department deptcode = "DEP_CS23">
      <name>Computer Science</name>
      <staffCount>20</staffCount>
   </department>
   
   <department deptcode = "DEP_EC34">
      <name>Electrical and Electronics</name>
      <staffCount>23</staffCount>
   </department>
   
   <department deptcode = "DEP_MC89">
      <name>Mechanical</name>
      <staffCount>15</staffCount>
   </department>
</college>   

In the following program, we are retrieving deptcode along with name and staff count for each department.


import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class AttributesXmlParsing {
   public static void main(String[] args) {	
      try {      	  
         //Input the XML file
         File inputXmlFile = new File("attributesParsing.xml");

         //creating DocumentBuilder
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = dbFactory.newDocumentBuilder();
         Document xmldoc = docBuilder.parse(inputXmlFile);

         //Getting the root element
         System.out.println("Root element :" + xmldoc.getDocumentElement().getNodeName());
         NodeList nList = xmldoc.getElementsByTagName("department");

         //Iterating through all the department elements
         for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);
            System.out.println("\nCurrent Element :" + nNode.getNodeName());

            if (nNode.getNodeType() == Node.ELEMENT_NODE) {     
               Element eElement = (Element) nNode;
               System.out.println("Department Code : " + eElement.getAttribute("deptcode"));
               System.out.println("Name of the department : " + eElement.getElementsByTagName("name").item(0).getTextContent());
               System.out.println("Staff Count of the department : " + eElement.getElementsByTagName("staffCount").item(0).getTextContent());
            }
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The three departments are displayed with their corresponding department code, name and staff count.


Root element :college

Current Element :department
Department Code : DEP_CS23
Name of the department : Computer Science
Staff Count of the department : 20

Current Element :department
Department Code : DEP_EC34
Name of the department : Electrical and Electronics
Staff Count of the department : 23

Current Element :department
Department Code : DEP_MC89
Name of the department : Mechanical
Staff Count of the department : 15


Parse XML Document


Java SAX Parser – Parse XML Document




Java SAX (Simple API for XML) parser is an API in Java for parsing XML documents. The SAX parser is event based and uses a handler class to handle the events. Callback methods such as startElement(), characters() and endElement() are implemented inside the handler class to obtain the details of elements and their attributes. The parser calls these methods when it encounters the respective events.

Parse XML Using Java SAX parser

Following are the steps we need to follow to parse an XML document in Java using SAX parser −

  • Step 1: Implementing a Handler class
  • Step 2: Creating a SAXParser Object
  • Step 3: Reading the XML
  • Step 4: Creating object for Handler class
  • Step 5: Parsing the XML Document
  • Step 6: Retrieving the Elements

Step 1: Implementing a Handler class

The application must implement a handler class to handle the events raised while the XML document is read. The handler must then be registered with the SAX parser.

As discussed in the previous chapter, the DefaultHandler class implements the ContentHandler interface. It provides the startDocument(), endDocument(), startElement(), endElement() and characters() methods that help us parse XML documents. We write code inside these methods according to our requirements.

class UserHandler extends DefaultHandler {

   public void startDocument() {
      ...
   }

   public void startElement(String uri, String localName, String qName, Attributes attributes) {
      ...
   }

   public void characters(char[] ch, int start, int length) {
      ...
   }

   public void endElement(String uri, String localName, String qName) {
      ...
   }

   public void endDocument() {
      ...
   }
}
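
To see the order in which the parser invokes these callbacks, the following sketch records each event in a list instead of printing it. The class EventOrderSketch, the nested RecordingHandler and the record() helper are hypothetical names chosen for this illustration; the callback signatures are those of DefaultHandler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only
class EventOrderSketch {

   // Records the name of each callback in the order the parser invokes it
   static class RecordingHandler extends DefaultHandler {
      final List<String> events = new ArrayList<>();

      @Override
      public void startDocument() {
         events.add("startDocument");
      }

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         events.add("startElement:" + qName);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
         events.add("endElement:" + qName);
      }

      @Override
      public void endDocument() {
         events.add("endDocument");
      }
   }

   // Parses the XML string and returns the recorded sequence of events
   static List<String> record(String xml) {
      try {
         SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
         RecordingHandler handler = new RecordingHandler();
         saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
         return handler.events;
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      for (String event : record("<college><department/></college>")) {
         System.out.println(event);
      }
   }
}
```

The events arrive in document order: the start of an element is always reported before the events of its children, and its end after them.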

Step 2: Creating a SAXParser Object

The SAXParserFactory class is used to create a new factory instance which in turn is used to create the SAXParser object as follows −

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

Step 3: Reading the XML

Read the XML file by specifying the proper file path as follows −

File xmlFile = new File("input.xml");

Instead of reading files, we can create an InputStream of the XML content as follows −

StringBuilder xmlBuilder = new StringBuilder();
xmlBuilder.append("<?xml version=\"1.0\"?> <rootElement></rootElement>");
ByteArrayInputStream inputStream = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));

Step 4: Creating object for Handler class

Create an object of the UserHandler class implemented in the first step as follows −

UserHandler userHandler = new UserHandler();

Step 5: Parsing the XML Document

The SAXParser class has a parse() method that takes two arguments: the file and the DefaultHandler object. It parses the given file as an XML document and invokes the methods implemented in the handler.

saxParser.parse(xmlFile, userHandler);

The SAXParser class also has an overload of parse() that takes the content as an InputStream −

saxParser.parse(inputStream, userHandler);

Step 6: Retrieving the Elements

After following the above five steps, we can retrieve the required information about the elements. The code for this goes inside the methods of the handler class from the first step. All the methods of the ContentHandler interface were discussed in the previous chapter; in this chapter, we implement them to retrieve basic information about elements such as the element name, text content and attributes.

Retrieving Element Name

The element name can be obtained in the startElement() method of the ContentHandler interface. The third argument of this method, qName, is the qualified name of the element as a String. We can implement this method in our handler class to get the name of an element.

Example

In the following example, we take the XML content as a String built with the StringBuilder class, convert it into bytes and wrap the bytes in a ByteArrayInputStream.

In the UserHandler class, we implement the startElement() method and print the name of the element. Since the XML content has only a single element, it is the root element of the document.

import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

//Implementing UserHandler Class
class UserHandler extends DefaultHandler{
   public void startElement(String uri, String localName, String qName, Attributes attributes)
      throws SAXException {
      System.out.println("Root element is " + qName);
   }
}

public class RetrieveElementName {
   public static void main(String args[]) {
      try {
    	  
         //Creating a SAXParser Object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         //Reading the XML
         StringBuilder xmlBuilder = new StringBuilder();
         xmlBuilder.append("<college>XYZ College</college>");
         ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));

         //Creating UserHandler object
         UserHandler userhandler = new UserHandler();

         //Parsing the XML Document
         saxParser.parse(input, userhandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
} 

The root element name, "college", is printed on the output screen.

Root element is college

Retrieving TextContent

To retrieve the text content of an element, we use the characters() method of the ContentHandler interface. It takes three arguments: a character array, a start index and a length. The parser calls this method when it reads character data between tags. The start argument is the index in the array at which the data begins, and length is the number of characters that belong to it. A parser is free to deliver one run of text in several calls, so each call may carry only part of an element's text.
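
Because a SAX parser may deliver the text of one element across several characters() calls, a common approach is to append each chunk to a buffer and read the buffer in endElement(). The following is a minimal sketch of that approach, assuming we only care about elements that contain text and no child elements; the class TextCollectorSketch, the nested TextHandler and the collect() helper are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only
class TextCollectorSketch {

   // Buffers character data and emits "name=text" when an element ends
   static class TextHandler extends DefaultHandler {
      private final StringBuilder buffer = new StringBuilder();
      final List<String> texts = new ArrayList<>();

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         buffer.setLength(0); // discard whitespace seen before this element
      }

      @Override
      public void characters(char[] ch, int start, int length) {
         buffer.append(ch, start, length); // may be called more than once per text run
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
         String text = buffer.toString().trim();
         if (!text.isEmpty()) {
            texts.add(qName + "=" + text);
         }
         buffer.setLength(0);
      }
   }

   // Parses the XML string and returns the collected "name=text" entries
   static List<String> collect(String xml) {
      try {
         SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
         TextHandler handler = new TextHandler();
         saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
         return handler.texts;
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      System.out.println(collect("<college>\n   <department>Computer Science</department>\n</college>"));
   }
}
```

Trimming the buffer also removes the whitespace-only text that surrounds nested elements, so only elements with real text content are reported.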

Example

The following college.xml file has a single sub element, “department” with text content “Computer Science”. Let us write a Java program to retrieve this text content along with element names using SAX API.

<college>
   <department>Computer Science</department>
</college>

The UserHandler class extends DefaultHandler and implements the startElement(), endElement() and characters() methods. When the parser reads the text content inside the department element, the characters() method is called and we print the text on the console.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

//Implementing UserHandler Class
class UserHandler extends DefaultHandler {
   public void startElement( String uri, String localName, String qName, Attributes attributes)
      throws SAXException { 
      System.out.println("Start Element : " + qName);
   }
	
   public void endElement(String uri, String localName, String qName) {
      System.out.println("End Element : " + qName);
   }

   public void characters(char[] ch, int start, int length) throws SAXException {
      System.out.println("Text Content : " + new String(ch, start, length));
   }
}

public class RetrieveTextContent {
   public static void main(String args[]) {
      try {

         //Creating a SAXParser Object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         //Reading the XML
         File xmlFile = new File("college.xml");

         //Creating UserHandler object
         UserHandler userHandler = new UserHandler();

         //Parsing the XML Document
         saxParser.parse(xmlFile, userHandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The text content of the department element is displayed. The other “Text Content” lines contain only the whitespace (line breaks and indentation) between the tags, because the “college” element has no text of its own.

Start Element : college
Text Content : 
	
Start Element : department
Text Content : Computer Science
End Element : department
Text Content : 

End Element : college

Retrieving Attributes

The last argument of the startElement() method is an Attributes object, which holds the list of attributes of the current element. The getValue("attr_name") method of the Attributes interface returns the value of the specified attribute.
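
When the attribute names are not known in advance, the Attributes object can also be traversed by index using getLength(), getQName(int) and getValue(int). The sketch below collects the attributes into a map. It assumes a document with a single element, since later elements would overwrite entries with the same name; the class AttributeListSketch, the nested AttrHandler, the helper attributesOf() and the attribute "floor" are hypothetical. For a name that is not in the list, getValue(String) returns null.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only
class AttributeListSketch {

   // Copies every attribute reported by the parser into a map of name to value
   static class AttrHandler extends DefaultHandler {
      final Map<String, String> attrs = new LinkedHashMap<>();

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         for (int i = 0; i < attributes.getLength(); i++) {
            attrs.put(attributes.getQName(i), attributes.getValue(i));
         }
      }
   }

   // Parses the XML string and returns the collected attributes
   static Map<String, String> attributesOf(String xml) {
      try {
         SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
         AttrHandler handler = new AttrHandler();
         saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
         return handler.attrs;
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
   }

   public static void main(String[] args) {
      Map<String, String> attrs = attributesOf("<department deptcode=\"DEP_CS23\" floor=\"2\"/>");
      for (Map.Entry<String, String> entry : attrs.entrySet()) {
         System.out.println(entry.getKey() + " = " + entry.getValue());
      }
   }
}
```

The order in which a parser reports attributes is not guaranteed, so code should look attributes up by name rather than rely on their position.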

Example

We have added a few more department elements to our “college.xml” file and also added an attribute, “deptcode”, to each department. Let us write a Java program to retrieve all the elements along with their attributes.

<?xml version = "1.0"?>
<college>
   <department deptcode = "DEP_CS23">
      <name>Computer Science</name>
      <staffCount>20</staffCount>
   </department> 
   <department deptcode = "DEP_EC34">
      <name>Electrical and Electronics</name>
      <staffCount>23</staffCount>
   </department> 
   <department deptcode = "DEP_MC89">
      <name>Mechanical</name>
      <staffCount>15</staffCount>
   </department>
</college>

The following Java program implements the startElement() and characters() methods in the UserHandler class. The deptcode attribute is read directly in startElement(). We use two boolean flags to remember that a name or staffCount element has just started, so that the characters() method can print its text content.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

//Implementing UserHandler Class
class UserHandler extends DefaultHandler {

   //Flags set in startElement() and consumed in characters()
   boolean hasDeptName = false;
   boolean hasStaffCount = false;

   public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {

      if (qName.equals("college")) {
         System.out.println("Root Element : " + qName + "\n");
      }
      if (qName.equals("department")) {
         System.out.println("Current Element : " + qName);
         System.out.println("Department code : " + attributes.getValue("deptcode"));
      }
      if (qName.equals("name")) {
         hasDeptName = true;
      }
      if (qName.equals("staffCount")) {
         hasStaffCount = true;
      }
   }

   public void characters(char[] ch, int start, int length) throws SAXException {

      if (hasDeptName) {
         System.out.println("Department Name : " + new String(ch, start, length));
         hasDeptName = false;
      }
      if (hasStaffCount) {
         System.out.println("Staff Count : " + new String(ch, start, length) + "\n");
         hasStaffCount = false;
      }
   }
}

public class ParseAttributesSAX {
   public static void main(String args[]) {
      try {

         //Creating a SAXParser Object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         //Reading the XML
         File xmlFile = new File("college.xml");

         //Creating UserHandler object
         UserHandler userHandler = new UserHandler();

         //Parsing the XML Document
         saxParser.parse(xmlFile, userHandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The output window displays the name of each element along with its attributes.

Root Element : college

Current Element : department
Department code : DEP_CS23
Department Name : Computer Science
Staff Count : 20

Current Element : department
Department code : DEP_EC34
Department Name : Electrical and Electronics
Staff Count : 23

Current Element : department
Department code : DEP_MC89
Department Name : Mechanical
Staff Count : 15
