Parsing XML using DOM, SAX and StAX Parser in Java

I recently read through a chapter on XML parsing and building APIs in Java, and tried out the different parsers on a sample XML document. I am sharing the code here so that I have a reference for it, and so does anyone else reading this. In this post I parse the same XML with each parser to perform the same operation: populating objects with the XML content and adding those objects to a list.

The sample XML considered in the examples is:

<employees>
  <employee id="111">
    <firstName>Rakesh</firstName>
    <lastName>Mishra</lastName>
    <location>Bangalore</location>
  </employee>
  <employee id="112">
    <firstName>John</firstName>
    <lastName>Davis</lastName>
    <location>Chennai</location>
  </employee>
  <employee id="113">
    <firstName>Rajesh</firstName>
    <lastName>Sharma</lastName>
    <location>Pune</location>
  </employee>
</employees>

And the object into which the XML content is to be extracted is defined as below:

class Employee{
  String id;
  String firstName;
  String lastName;
  String location;

  @Override
  public String toString() {
    return firstName+" "+lastName+"("+id+")"+location;
  }
}

There are 3 main parsers for which I have given sample code: DOM, SAX and StAX.

Using DOM Parser

I am making use of the DOM parser implementation that comes with the JDK; in my example I am using JDK 7. The DOM parser loads the complete XML content into a tree structure, and we iterate through the Node and NodeList objects to get the content of the XML. The code for XML parsing using the DOM parser is given below.

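import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMParserDemo {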

  public static void main(String[] args) throws Exception {
    //Get the DOM Builder Factory
    DocumentBuilderFactory factory = 
        DocumentBuilderFactory.newInstance();

    //Get the DOM Builder
    DocumentBuilder builder = factory.newDocumentBuilder();

    //Load and Parse the XML document
    //document contains the complete XML as a Tree.
    Document document = 
      builder.parse(
        ClassLoader.getSystemResourceAsStream("xml/employee.xml"));


    List<Employee> empList = new ArrayList<>();

    //Iterating through the nodes and extracting the data.
    NodeList nodeList = document.getDocumentElement().getChildNodes();

    for (int i = 0; i < nodeList.getLength(); i++) {

      Node node = nodeList.item(i);
      //Skip whitespace text nodes; an Element here is an <employee> tag.
      if (node instanceof Element) {
        Employee emp = new Employee();
        emp.id = node.getAttributes().
            getNamedItem("id").getNodeValue();

        NodeList childNodes = node.getChildNodes();
        for (int j = 0; j < childNodes.getLength(); j++) {
          Node cNode = childNodes.item(j);

          //Identifying the child tag of employee encountered.
          if (cNode instanceof Element) {
            //getTextContent() also works for an empty element,
            //where getLastChild() would return null.
            String content = cNode.getTextContent().trim();
            switch (cNode.getNodeName()) {
              case "firstName":
                emp.firstName = content;
                break;
              case "lastName":
                emp.lastName = content;
                break;
              case "location":
                emp.location = content;
                break;
            }
          }
        }
        empList.add(emp);
      }

    }

    //Printing the Employee list populated.
    for (Employee emp : empList) {
      System.out.println(emp);
    }

  }
}

class Employee{
  String id;
  String firstName;
  String lastName;
  String location;

  @Override
  public String toString() {
    return firstName+" "+lastName+"("+id+")"+location;
  }
}

The output for the above will be:

Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune
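
When only a few tags are needed, the nested loops above can be shortened with getElementsByTagName(), which returns all descendant elements with a given tag name in document order. The following is a minimal sketch rather than a replacement for the listing above: the class name DomLookupSketch and the helper textOf are made up for illustration, and the XML is passed in as a string instead of being read from xml/employee.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomLookupSketch {

  //Hypothetical helper: text of the first <tag> descendant of parent,
  //or an empty string when there is no such element.
  static String textOf(Element parent, String tag) {
    NodeList matches = parent.getElementsByTagName(tag);
    if (matches.getLength() == 0) {
      return "";
    }
    return matches.item(0).getTextContent().trim();
  }

  //Returns one "firstName lastName(id)location" line per <employee>.
  static List<String> parse(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document document = builder.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

      List<String> lines = new ArrayList<>();
      NodeList employees = document.getElementsByTagName("employee");
      for (int i = 0; i < employees.getLength(); i++) {
        //Every item is an Element because we searched by tag name.
        Element e = (Element) employees.item(i);
        lines.add(textOf(e, "firstName") + " " + textOf(e, "lastName")
            + "(" + e.getAttribute("id") + ")" + textOf(e, "location"));
      }
      return lines;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName>"
        + "<lastName>Mishra</lastName><location>Bangalore</location>"
        + "</employee></employees>";
    for (String line : parse(xml)) {
      System.out.println(line);
    }
  }
}
```

Searching by tag name avoids the instanceof checks, because whitespace text nodes never match. The trade-off is that getElementsByTagName() looks at all descendants, not just direct children, so it suits flat documents like this one better than deeply nested ones with repeated tag names.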

Using SAX Parser

The SAX parser differs from the DOM parser in that it doesn’t load the complete XML into memory. Instead it reads the XML sequentially from start to end, triggering events as it encounters opening tags, closing tags, character data and so on. This is why the SAX parser is called an event-based parser.

Along with the XML source file, we also register a handler which extends the DefaultHandler class. The DefaultHandler class provides different callbacks out of which we would be interested in:

  • startElement() – called when the start of a tag is encountered.
  • endElement() – called when the end of a tag is encountered.
  • characters() – called when text data is encountered; the parser may deliver the text of one element in several calls.
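
To see the order in which these callbacks fire, the sketch below records each event for a tiny inline document. The class name SaxEventTrace, the nested TraceHandler and the "start:/text:/end:" trace format are made up for illustration. Text is buffered and emitted as a single entry per element, and whitespace-only text is ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

  //Records the order of SAX callbacks as readable strings.
  static class TraceHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
      flushText();
      events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      flushText();
      events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      //May be called several times for one run of text, so buffer it.
      text.append(ch, start, length);
    }

    //Emits the buffered text as one entry, ignoring pure whitespace.
    private void flushText() {
      String s = text.toString().trim();
      if (!s.isEmpty()) {
        events.add("text:" + s);
      }
      text.setLength(0);
    }
  }

  //Parses the given XML string and returns the recorded events.
  static List<String> trace(String xml) {
    try {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      TraceHandler handler = new TraceHandler();
      parser.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
          handler);
      return handler.events;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<employee id=\"111\">"
        + "<firstName>Rakesh</firstName></employee>";
    //Prints start:employee, start:firstName, text:Rakesh,
    //end:firstName, end:employee (one per line).
    for (String event : trace(xml)) {
      System.out.println(event);
    }
  }
}
```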

The code for parsing the XML using SAX Parser is given below:

import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {

  public static void main(String[] args) throws Exception {
    SAXParserFactory parserFactory = SAXParserFactory.newInstance();
    SAXParser parser = parserFactory.newSAXParser();
    SAXHandler handler = new SAXHandler();
    parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"), 
                 handler);
    
    //Printing the list of employees obtained from XML
    for ( Employee emp : handler.empList){
      System.out.println(emp);
    }
  }
}
/**
 * The Handler for SAX Events.
 */
class SAXHandler extends DefaultHandler {

  List<Employee> empList = new ArrayList<>();
  Employee emp = null;
  //Accumulates text, because characters() may be called
  //more than once for the text of a single element.
  StringBuilder content = new StringBuilder();

  //Triggered when the start of tag is found.
  @Override
  public void startElement(String uri, String localName, 
                           String qName, Attributes attributes) 
                           throws SAXException {
    //Discard any text (such as indentation) seen before this tag.
    content.setLength(0);
    switch(qName){
      //Create a new Employee object when the start tag is found
      case "employee":
        emp = new Employee();
        emp.id = attributes.getValue("id");
        break;
    }
  }

  //Triggered when the end of tag is found.
  @Override
  public void endElement(String uri, String localName, 
                         String qName) throws SAXException {
    String text = content.toString().trim();
    switch(qName){
      //Add the employee to list once end tag is found
      case "employee":
        empList.add(emp);
        break;
      //For all other end tags the employee has to be updated.
      case "firstName":
        emp.firstName = text;
        break;
      case "lastName":
        emp.lastName = text;
        break;
      case "location":
        emp.location = text;
        break;
    }
    content.setLength(0);
  }

  //Triggered for text data; appends instead of overwriting.
  @Override
  public void characters(char[] ch, int start, int length) 
          throws SAXException {
    content.append(ch, start, length);
  }
    
}

class Employee {

  String id;
  String firstName;
  String lastName;
  String location;

  @Override
  public String toString() {
    return firstName + " " + lastName + "(" + id + ")" + location;
  }
}

The output for the above would be:

Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune

Using StAX Parser

StAX stands for Streaming API for XML. Like the SAX parser, the StAX parser differs from DOM in that it does not load the whole document into memory. It also differs from the SAX parser in a subtle way:

  • The SAX parser pushes data to the application, whereas the application pulls the data it needs from the StAX parser.
  • The StAX parser maintains a cursor at the current position in the document and lets the application extract the content at the cursor, whereas the SAX parser issues events as and when certain data is encountered.

XMLInputFactory and XMLStreamReader are the two types used to load an XML file. As we read through the XML file using XMLStreamReader, each call to next() returns an event as an integer value, which is then compared with the constants in XMLStreamConstants. The code below shows how to parse XML using the StAX parser:

import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxParserDemo {
  public static void main(String[] args) throws XMLStreamException {
    List<Employee> empList = null;
    Employee currEmp = null;
    String tagContent = null;
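    XMLInputFactory factory = XMLInputFactory.newInstance();
    //Merge adjacent character data so that the text of an element
    //is not split across several CHARACTERS events.
    factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);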
    XMLStreamReader reader = 
        factory.createXMLStreamReader(
        ClassLoader.getSystemResourceAsStream("xml/employee.xml"));
        
    while(reader.hasNext()){
      int event = reader.next();
      
      switch(event){
        case XMLStreamConstants.START_ELEMENT: 
          if ("employee".equals(reader.getLocalName())){
            currEmp = new Employee();
            //Look the attribute up by name rather than by position.
            currEmp.id = reader.getAttributeValue(null, "id");
          }
          if("employees".equals(reader.getLocalName())){
            empList = new ArrayList<>();
          }
          break;
          
        case XMLStreamConstants.CHARACTERS:
          tagContent = reader.getText().trim();
          break;
          
        case XMLStreamConstants.END_ELEMENT:
          switch(reader.getLocalName()){
            case "employee":
              empList.add(currEmp);
              break;
            case "firstName":
              currEmp.firstName = tagContent;
              break;
            case "lastName":
              currEmp.lastName = tagContent;
              break;
            case "location":
              currEmp.location = tagContent;
              break;
          }
          break;
            
        case XMLStreamConstants.START_DOCUMENT:
          empList = new ArrayList<>();
          break;
      }

    }
    
    //Print the employee list populated from XML
    for ( Employee emp : empList){
      System.out.println(emp);
    }
      
  }
}

class Employee{
  String id;
  String firstName;
  String lastName;
  String location;
  
  @Override
  public String toString(){
    return firstName+" "+lastName+"("+id+") "+location;
  }
}

The output for the above is:

Rakesh Mishra(111) Bangalore
John Davis(112) Chennai
Rajesh Sharma(113) Pune
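
Because the application drives a StAX parser, it can stop reading as soon as it has what it needs, which is awkward to do from inside SAX callbacks. The sketch below returns the first name of the employee with a given id and stops pulling events once it is found. The class StaxLookupSketch and the method findFirstName are hypothetical, and the XML is passed as a string to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLookupSketch {

  //Hypothetical helper: first name of the employee with the given id,
  //or null when no such employee (or no firstName) is found.
  static String findFirstName(String xml, String id) {
    try {
      XMLStreamReader reader = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      try {
        boolean inMatch = false;
        while (reader.hasNext()) {
          int event = reader.next();
          if (event != XMLStreamConstants.START_ELEMENT) {
            continue;
          }
          String name = reader.getLocalName();
          if ("employee".equals(name)) {
            //Remember whether the current <employee> is the wanted one.
            inMatch = id.equals(reader.getAttributeValue(null, "id"));
          } else if (inMatch && "firstName".equals(name)) {
            //getElementText() reads the text up to the matching end
            //tag; returning here leaves the rest of the XML unread.
            return reader.getElementText().trim();
          }
        }
        return null;
      } finally {
        reader.close();
      }
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName></employee>"
        + "<employee id=\"112\"><firstName>John</firstName></employee>"
        + "<employee id=\"113\"><firstName>Rajesh</firstName></employee>"
        + "</employees>";
    //Prints "John"; the third employee is never parsed.
    System.out.println(findFirstName(xml, "112"));
  }
}
```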

With this I have covered parsing the same XML document and performing the same task of populating the list of Employee objects using all three parsers, namely DOM, SAX and StAX.

  1. July 22nd, 2013 at 13:06 | #1

    If you have a simple bean like the one you’ve shown, why not just use JAXB? All you have to do is add some annotations to your bean and run a little unmarshal call

    • July 22nd, 2013 at 13:35 | #2

      Yes I agree that the above example calls for the use of JAXB. I took a simpler example to compare different parsing techniques available.

      Will JAXB not require a XSD to be defined?

      • July 22nd, 2013 at 15:06 | #3

        I don’t think you need an XSD with JAXB. People tend to use XSDs to define the XML formats and then generate annotated Java beans using XJC, though. XSDs are probably simpler to write than tons of correctly annotated POJOs

  2. sowmiya
    January 30th, 2014 at 18:11 | #4

    Hi All,

    My XML file is,

    1
    sowmi

    2
    sowmi

    I want to process this file using the StAX parser and transform the XML content into separate XML files. The XML file name should be the id of the student, like 1.xml, 2.xml. The XML file should contain the details as

    1.xml

    1
    sowmi

    Can you please send coding regarding the above objective?
    Can you please guide us?

    Thank you,
    Sowmiya

  3. Andrew
    August 14th, 2014 at 22:05 | #5

    You need to account for the fact that CHARACTERS can be called multiple times per element.

    • Samra
      January 9th, 2015 at 16:09 | #6

      Yes It can be called multiple times. Need to maintain StringBuilder variable at class level

  4. August 25th, 2014 at 12:03 | #7

    Good article

  5. Xyz Def
    September 24th, 2014 at 19:13 | #8

    Nice, good work

  6. October 14th, 2014 at 02:16 | #9

    good job!

  7. January 29th, 2015 at 04:22 | #10

    Hello All,
    This is a great thread.
    If multiple XML files need to be read and the output is produced by grabbing some values from each of them, how can we do it?

  8. February 13th, 2015 at 00:56 | #11

    Hi All
    Trying to push the below values to a text file.

    Rakesh Mishra(111)Bangalore
    John Davis(112)Chennai
    Rajesh Sharma(113)Pune

    code:
    class Employee {

    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
    return firstName + " " + lastName + "(" + id + ")" + location;
    String line = firstName + " " + lastName + "(" + id + ")" + location;
    WriteToFileExample(line);
    }
    }
    public void WriteToFileExample(String line) {

    try {

    String content = name + " " + "(" + DebugMode + ")" + mAdminState;

    File file = new File("config/config.properties");

    // if file doesnt exists, then create it
    if (!file.exists()) {
    file.createNewFile();
    }
    // To append the values
    FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
    //FileWriter fw = new FileWriter(file.getAbsoluteFile());
    //System.out.println(file.getAbsoluteFile());
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write(content);
    bw.close();

    //fw.write(content);
    //fw.close();

    /*
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write(content);
    bw.close();
    */

    System.out.println("Done");

    } catch (IOException e) {
    e.printStackTrace();
    }

    }

    But whenever
    FileWriter fw = new FileWriter(file.getAbsoluteFile()); –> will write only one last line i.e. Rajesh Sharma(113)Pune

    FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
    writes all the lines. But the issue is when we re-run the program it’ll retain the previous line i.e. 3 lines and add 3 more on top of it… How to avoid this and get the 3 lines

  9. Gabriel
    May 15th, 2015 at 00:02 | #12

    Thank you man! Really helpful!
