Split XML into smaller chunks based on the id of the grandchild

Question

I have an XML file that should be split into smaller chunks by unique BookId node. Basically, I need to filter each book out into a separate XML file that has the same structure as the initial XML.

The purpose is to validate each smaller XML file against an XSD to determine which Book/PendingBook is invalid.

Note that the Books node can contain both Book and PendingBook nodes.
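Since the stated goal is to validate each smaller file against an XSD, that step can be sketched with the standard javax.xml.validation API. This is a minimal sketch: the class and method names (XsdCheck, compile, firstError) are hypothetical, and the real XSD is not shown in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XsdCheck {

    // Compiles the XSD once; a Schema is immutable and can be reused for every file.
    static Schema compile(String xsdText) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsdText)));
    }

    // Returns null when xmlText is valid, otherwise the first error message.
    // Without a custom ErrorHandler, the Validator throws on the first error.
    static String firstError(Schema schema, String xmlText) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xmlText)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```

For files on disk, `new StreamSource(new File(...))` can be passed to `validate` instead of a StringReader. Compiling the Schema once and checking each split file tells you which Book/PendingBook fails.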

Initial XML:

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2021</Year>
                    <BookId>001</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>002</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <PendingBook>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>003</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </PendingBook>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

The result should look like the following XML files:

Book_001.xml (BookId = 001):

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2021</Year>
                    <BookId>001</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

Book_002.xml (BookId = 002):

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>002</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

PendingBook_003.xml (BookId = 003):

<Main xmlns="http://some/url/name">
    <Books>
        <PendingBook>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>003</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </PendingBook>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

So far I have only managed to extract each ID node into a smaller XML file, and I create the root element manually.

Ideally, I want to copy all elements from the initial XML and put a single Book/PendingBook node into the Books node.
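One possible way to keep the whole initial structure (the root element with its namespace, Books, and OtherInfo) is to copy the entire document and then delete the Book/PendingBook elements whose BookId does not match. This is a sketch under stated assumptions: element names are unprefixed, Books is a direct child of the root, and the class name SplitByClone is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class SplitByClone {

    // Copies the whole source document, then drops every Book/PendingBook under
    // Books whose BookId differs from bookId. Main (with its xmlns), Books and
    // siblings such as OtherInfo are kept as in the input; source is not modified.
    static Document extract(Document source, String bookId) throws ParserConfigurationException {
        Document copy = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        copy.appendChild(copy.importNode(source.getDocumentElement(), true));

        List<Node> toRemove = new ArrayList<>();
        for (Node books = copy.getDocumentElement().getFirstChild(); books != null; books = books.getNextSibling()) {
            if (!"Books".equals(localName(books))) {
                continue;
            }
            for (Node child = books.getFirstChild(); child != null; child = child.getNextSibling()) {
                String name = localName(child);
                boolean isBook = "Book".equals(name) || "PendingBook".equals(name);
                if (isBook && !bookId.equals(bookIdOf((Element) child))) {
                    toRemove.add(child);
                }
            }
        }
        // Remove after iterating so the sibling traversal above is not disturbed.
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }
        return copy;
    }

    // Local name for namespace-aware DOMs, tag name otherwise; null for non-elements.
    private static String localName(Node n) {
        if (n.getNodeType() != Node.ELEMENT_NODE) {
            return null;
        }
        return n.getLocalName() != null ? n.getLocalName() : n.getNodeName();
    }

    // Text of the first BookId descendant (assumes it is written without a prefix).
    private static String bookIdOf(Element book) {
        NodeList ids = book.getElementsByTagName("BookId");
        return ids.getLength() > 0 ? ids.item(0).getTextContent().trim() : null;
    }
}
```

Serializing the returned Document with the Transformer code from the question writes one file per ID, and the name of the remaining child (Book or PendingBook) can supply the Book_001.xml style prefix.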

My Java sample:

package com.main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXmls {
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception
    {
        String inputFile = "C:/pathToXML/Main.xml";
        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this! (note: `factory` is not used below; `doc` comes from the non-namespace-aware dbFactory)
        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
        NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);
        //Save all the products
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < bookIdNodes.getLength(); ++i) {
            Node bookId = bookIdNodes.item(i);
            System.out.println(bookId.getTextContent());
            bookIds.add(bookId.getTextContent());
        }
        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //With such query I can find node based on bookId
            String xpathQuery = "//ID[BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
            //We store the new XML file in bookId.xml e.g. 001.xml
            Document aamcIdXml = dBuilder.newDocument();
            Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
            aamcIdXml.appendChild(root);
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                root.appendChild(copyNode);
            }
            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);
            StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
            transformer.transform(source, result);
            System.out.println("Done for " + bookId);
        }
    }
}
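In the sample above, setNamespaceAware(true) is called on `factory`, which is never used; `doc` is parsed by `dbFactory`, which is not namespace-aware. With the JDK's built-in XPath, that is why the unprefixed query //Books/... still matches elements in the default namespace http://some/url/name. If you switch to a namespace-aware parser, unprefixed names stop matching, and a common fix is to bind a prefix through a NamespaceContext. A minimal sketch; the class name NsXPath and the prefix b are arbitrary choices.

```java
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class NsXPath {
    // The default namespace of the input, bound here to the arbitrary prefix "b".
    static final String NS = "http://some/url/name";

    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "b".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return NS.equals(namespaceURI) ? "b" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return NS.equals(namespaceURI)
                        ? Collections.singletonList("b").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xpath;
    }

    // Same query as in the question, with every element name qualified by b:.
    static NodeList bookIds(Document namespaceAwareDoc) throws XPathExpressionException {
        return (NodeList) newXPath().evaluate(
                "//b:Books/*/b:IdentifyingInformation/b:ID/b:BookId/text()",
                namespaceAwareDoc, XPathConstants.NODESET);
    }
}
```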

Answer 1

Score: 1

You almost have it working. In the loop that iterates over the book IDs, change your XPath so that it selects the Book or PendingBook element, and import that element instead. You also need to create a Books element in addition to Main and append the Book or PendingBook element to the newly created Books element.

The XPath is: //ancestor::*[IdentifyingInformation/ID/BookId=bookId], where bookId is the ID of the current iteration.

It selects the element whose IdentifyingInformation/ID/BookId matches the ID in the current iteration, i.e. the Book or PendingBook element.

//Now we create and save split XMLs
for (String bookId : bookIds)
{
    //With such a query I can find the Book/PendingBook element based on bookId.
    //The ID is quoted so it is compared as a string; unquoted, 001 is compared as the number 1
    //and a non-numeric ID would not match at all.
    String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId='" + bookId + "']";
    xpath = xfactory.newXPath();
    XPathExpression query = xpath.compile(xpathQuery);
    NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
    //We store the new XML file as <elementName>_<bookId>.xml, e.g. Book_001.xml
    Document aamcIdXml = dBuilder.newDocument();
    //The Main root element is still created by hand, with a Books element under it
    Element root = aamcIdXml.createElement("Main");
    Element booksNode = aamcIdXml.createElement("Books");
    root.appendChild(booksNode);
    aamcIdXml.appendChild(root);
    String bookName = "";
    for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
        Node node = bookIdNodesFiltered.item(i);
        Node copyNode = aamcIdXml.importNode(node, true);
        bookName = copyNode.getNodeName();
        booksNode.appendChild(copyNode);
    }
    //At the end, we save the file XML on disk
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    DOMSource source = new DOMSource(aamcIdXml);
    StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
    transformer.transform(source, result);
    System.out.println("Done for " + bookId);
}

I also modified the code to name each file as you required, e.g. Book_001.xml.
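The XPath in this answer is built by string concatenation. A hedged alternative is to pass the ID as an XPath variable through XPathVariableResolver, so it is always compared as a string and quote characters in an ID cannot break the expression. This sketch also uses the shorter //* in place of //ancestor::*, which selects the same Book/PendingBook elements for this input. The class and method names are hypothetical and, like the answer, it assumes the document was parsed without namespace awareness.

```java
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class BookLookup {

    // Returns the element (Book or PendingBook) whose
    // IdentifyingInformation/ID/BookId equals bookId, or null if there is none.
    static Element findBook(Document doc, String bookId) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // $id resolves to the Java string, so "001" stays the string "001".
        xpath.setXPathVariableResolver(
                (QName name) -> "id".equals(name.getLocalPart()) ? bookId : null);
        XPathExpression expr = xpath.compile("//*[IdentifyingInformation/ID/BookId = $id]");
        return (Element) expr.evaluate(doc, XPathConstants.NODE);
    }
}
```

Because the comparison is string-based, an ID of "1" no longer matches a BookId of 001, which is usually what you want when IDs are identifiers rather than numbers.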

Answer 2

Score: 1

Consider XSLT, the special-purpose language designed to transform XML files, including extracting the nodes you need. You can also pass parameters from the application layer (here, Java) into XSLT, much like query parameters in SQL.

Specifically, iterate over the BookIds that Java retrieved with XPath and pass each one into a named XSLT parameter. No extensive refactoring is needed, since you already have a Transformer set up, and it can run XSLT as well.

XSLT (save as a .xsl file, which is itself XML)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- INITIALIZE PARAMETER  -->
  <xsl:param name="param_bookId"/>

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Books">
    <xsl:copy>
      <xsl:apply-templates select="Book[descendant::BookId = $param_bookId] |
                                   PendingBook[descendant::BookId = $param_bookId]"/>
      <xsl:apply-templates select="OtherInfo"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Java (no rebuild of trees)

// ... same code as reading XML input    ...
// ... same code as creating bookIdNodes ...
// additional imports needed: javax.xml.transform.Source, javax.xml.transform.stream.StreamSource

String curr_bookId = null;
String outputXML = null;

String xslFile = "C:/Path/To/XSL/Style.xsl";
Source xslt = new StreamSource(new File(xslFile));

// ITERATE THROUGH EACH BOOK ID
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
    Node bookId = bookIdNodes.item(i);

    System.out.println(bookId.getTextContent());
    curr_bookId = bookId.getTextContent();

    // CONFIGURE TRANSFORMER
    TransformerFactory prettyPrint = TransformerFactory.newInstance();
    Transformer transformer = prettyPrint.newTransformer(xslt);

    transformer.setParameter("param_bookId", curr_bookId);   // PASS PARAM
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    // TRANSFORM AND OUTPUT FILE TO DISK
    outputXML = "C:/Path/To/XML/BookId_" + curr_bookId + ".xml";

    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(new File(outputXML));
    transformer.transform(source, result);
}
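Two refinements to the loop above, sketched under assumptions: the stylesheet can be compiled once into a Templates object and reused for every ID, and its element names can be bound to the input's default namespace http://some/url/name. The second matters when the input is read namespace-aware, as it is through a StreamSource: in XSLT 1.0 an unprefixed pattern such as match="Books" matches only elements in no namespace. The class name XsltSplit and the prefix b are hypothetical; the BookId_ file naming follows the answer.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSplit {
    // The answer's stylesheet with prefix b bound to the input's default namespace.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:b='http://some/url/name' exclude-result-prefixes='b'>"
      + "<xsl:output indent='yes' encoding='UTF-8'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:param name='param_bookId'/>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='b:Books'><xsl:copy>"
      + "<xsl:apply-templates select='b:Book[descendant::b:BookId = $param_bookId]"
      + " | b:PendingBook[descendant::b:BookId = $param_bookId]'/>"
      + "<xsl:apply-templates select='b:OtherInfo'/>"
      + "</xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet once (Templates is reusable and thread-safe) and
    // writes one BookId_<id>.xml per ID into outDir, returning the written files.
    static List<File> split(File inputXml, List<String> bookIds, File outDir) throws TransformerException {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        List<File> written = new ArrayList<>();
        for (String id : bookIds) {
            Transformer t = templates.newTransformer(); // cheap: no recompilation
            t.setParameter("param_bookId", id.trim());
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            File out = new File(outDir, "BookId_" + id.trim() + ".xml");
            t.transform(new StreamSource(inputXml), new StreamResult(out));
            written.add(out);
        }
        return written;
    }
}
```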
huangapple
  • Posted on 2020-07-31 03:15:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63179870.html