Thursday, August 6, 2015

How to split a big XML file into small files with a certain number of items?

I have a huge XML file with over 2000 PO receipts. To speed up data processing, I would like to split this big file into smaller XML files.

This is the structure of the original XML file.


This is the split file I would like to have. Each split file carries the same header information as the original one.



This is the code I wrote in Java.

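 import java.io.File;
 import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.transform.Transformer;
 import javax.xml.transform.TransformerFactory;
 import javax.xml.transform.dom.DOMSource;
 import javax.xml.transform.stream.StreamResult;
 import javax.xml.xpath.XPath;
 import javax.xml.xpath.XPathConstants;
 import javax.xml.xpath.XPathFactory;
 import org.w3c.dom.Document;
 import org.w3c.dom.Node;
 import org.w3c.dom.NodeList;

 public class XmlSplit {
   public static void main(String[] args) throws Exception {
     File input = new File("input.xml");
     DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
     Document doc = dbf.newDocumentBuilder().parse(input);
     XPath xpath = XPathFactory.newInstance().newXPath();
     NodeList headerNodes = (NodeList) xpath.evaluate("//PurchaseReceiptMessage/Header", doc, XPathConstants.NODESET);
     NodeList nodes = (NodeList) xpath.evaluate("//PurchaseReceiptMessage/PurchaseReceipt", doc, XPathConstants.NODESET);
     int itemsPerFile = 5;
     int fileNumber = 0;
     // Scratch document that owns the nodes of every output file
     Document currentDoc = dbf.newDocumentBuilder().newDocument();
     Node rootNode = newRoot(currentDoc, headerNodes);
     for (int i = 1; i <= nodes.getLength(); i++) {
       Node imported = currentDoc.importNode(nodes.item(i - 1), true);
       rootNode.appendChild(imported);
       // Write the file when the chunk is full or the last receipt has been added,
       // so no empty trailing file is produced
       if (i % itemsPerFile == 0 || i == nodes.getLength()) {
         writeToFile(rootNode, new File((fileNumber++) + ".xml"));
         rootNode = newRoot(currentDoc, headerNodes);
       }
     }
   }

   // Creates a fresh root element and copies the header nodes into it,
   // so that every split file (not only the first) carries the header
   private static Node newRoot(Document target, NodeList headerNodes) {
     Node root = target.createElement("PurchaseReceiptMessage");
     for (int i = 0; i < headerNodes.getLength(); i++) {
       root.appendChild(target.importNode(headerNodes.item(i), true));
     }
     return root;
   }

   private static void writeToFile(Node node, File file) throws Exception {
     Transformer transformer = TransformerFactory.newInstance().newTransformer();
     // StreamResult(File) lets the transformer open and close the output itself
     transformer.transform(new DOMSource(node), new StreamResult(file));
   }
 }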
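
To try the same chunking idea without touching the file system, here is a minimal in-memory sketch. It assumes the root element is PurchaseReceiptMessage with Header and PurchaseReceipt as direct children, as in the XPath expressions above. The class name XmlChunker, the helper sample(), and the Sender and Id elements are hypothetical and exist only for illustration; they are not part of the real receipt format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlChunker {

  // Splits the receipts in `xml` into documents holding at most `itemsPerFile`
  // receipts each; every Header child of the root is copied into each document.
  static List<String> split(String xml, int itemsPerFile) throws Exception {
    if (itemsPerFile <= 0) {
      throw new IllegalArgumentException("itemsPerFile must be positive");
    }
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document source = builder.parse(new InputSource(new StringReader(xml)));

    // Collect the direct children of the root by element name
    List<Node> headers = new ArrayList<>();
    List<Node> receipts = new ArrayList<>();
    for (Node n = source.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() != Node.ELEMENT_NODE) continue;
      if ("Header".equals(n.getNodeName())) headers.add(n);
      else if ("PurchaseReceipt".equals(n.getNodeName())) receipts.add(n);
    }

    List<String> parts = new ArrayList<>();
    for (int start = 0; start < receipts.size(); start += itemsPerFile) {
      // One fresh document per chunk: root, then headers, then the receipts
      Document part = builder.newDocument();
      Element root = part.createElement("PurchaseReceiptMessage");
      part.appendChild(root);
      for (Node h : headers) root.appendChild(part.importNode(h, true));
      int end = Math.min(start + itemsPerFile, receipts.size());
      for (Node r : receipts.subList(start, end)) root.appendChild(part.importNode(r, true));
      parts.add(serialize(part));
    }
    return parts;
  }

  // Serializes a document to a string without the XML declaration
  static String serialize(Document doc) throws Exception {
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  // Builds hypothetical sample input; Sender and Id are made-up element names
  static String sample(int receipts) {
    StringBuilder sb = new StringBuilder("<PurchaseReceiptMessage><Header><Sender>demo</Sender></Header>");
    for (int i = 1; i <= receipts; i++) {
      sb.append("<PurchaseReceipt><Id>").append(i).append("</Id></PurchaseReceipt>");
    }
    return sb.append("</PurchaseReceiptMessage>").toString();
  }

  public static void main(String[] args) throws Exception {
    List<String> parts = split(sample(5), 2);
    for (int i = 0; i < parts.size(); i++) {
      System.out.println(i + ".xml: " + parts.get(i));
    }
  }
}
```

Building a separate Document for each chunk keeps the header copy and the receipts together under one root, and stepping through the receipts in blocks of itemsPerFile avoids an empty last chunk when the receipt count is an exact multiple of the chunk size.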