July 31, 2020, 03:15:20

Split XML into smaller chunks based on the id of the grandchild

Question

I have an XML file that needs to be split into smaller chunks, one per unique BookId node. Essentially, I need to extract each book into a separate XML file that keeps the same structure as the initial XML.

The purpose is to validate each smaller XML against an XSD to determine which Book/PendingBook is not valid.

Note that the Books node can contain both Book and PendingBook nodes.
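For context, the validation step applied to each split file could look roughly like the sketch below. It uses the standard javax.xml.validation API. The inline schema is a hypothetical stand-in (the real XSD is not shown in this question), and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical stand-in schema: a <Book> holding one three-digit <BookId>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Book'><xs:complexType><xs:sequence>"
      + "<xs:element name='BookId'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:pattern value='[0-9]{3}'/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns null when the XML is valid, otherwise the first validation error message. */
    static String validate(String xml) throws IOException, SAXException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With the default error handler, the first schema violation is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<Book><BookId>001</BookId></Book>"));
        System.out.println(validate("<Book><BookId>abc</BookId></Book>"));
    }
}
```

In the real setup the schema and the XML would come from files rather than strings, and the returned message tells which split file (and therefore which Book/PendingBook) failed.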

Initial XML:

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2021</Year>
                    <BookId>001</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>002</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <PendingBook>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>003</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </PendingBook>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

The result should be the following XML files:

Book_001.xml (BookId = 001):

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2021</Year>
                    <BookId>001</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

Book_002.xml (BookId = 002):

<Main xmlns="http://some/url/name">
    <Books>
        <Book>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>002</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </Book>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

PendingBook_003.xml (BookId = 003):

<Main xmlns="http://some/url/name">
    <Books>
        <PendingBook>
            <IdentifyingInformation>
                <ID>
                    <Year>2020</Year>
                    <BookId>003</BookId>
                    <BookDateTime>2021-05-10T12:35:00</BookDateTime>
                </ID>
            </IdentifyingInformation>
        </PendingBook>
        <OtherInfo>...</OtherInfo>
    </Books>
</Main>

So far I have only managed to extract each ID node into its own smaller XML, with the root element created manually.

Ideally, I want to copy all elements from the initial XML and keep only a single Book/PendingBook node inside the Books node.

My Java sample:

package com.main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXmls {

    public static void main(String[] args) throws Exception {
        String inputFile = "C:/pathToXML/Main.xml";
        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);

        // Note: this second factory is configured but never used below,
        // so the document above is parsed without namespace awareness.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!

        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
        NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);

        // Collect all book IDs
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < bookIdNodes.getLength(); ++i) {
            Node bookId = bookIdNodes.item(i);
            System.out.println(bookId.getTextContent());
            bookIds.add(bookId.getTextContent());
        }

        // Now we create and save the split XMLs
        for (String bookId : bookIds) {
            // With such a query I can find the ID node based on bookId
            String xpathQuery = "//ID[BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

            // We store the new XML file in bookId.xml, e.g. 001.xml
            Document aamcIdXml = dBuilder.newDocument();
            // Here I'm recreating the root element (don't know if I can avoid it
            // and somehow copy the structure of the initial XML)
            Element root = aamcIdXml.createElement("Main");
            aamcIdXml.appendChild(root);
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                root.appendChild(copyNode);
            }

            // At the end, we save the XML file to disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);
            StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
            transformer.transform(source, result);
            System.out.println("Done for " + bookId);
        }
    }
}

Answer 1

Score: 1

You almost got it to work. In the loop that iterates over the book IDs, change the XPath so that it selects the Book or PendingBook element, and use that element. You also need to create a Books element in addition to Main and append the Book or PendingBook to the newly created Books element.

The XPath is: //ancestor::*[IdentifyingInformation/ID/BookId=bookId]

It selects the element whose IdentifyingInformation/ID/BookId matches the ID of the current iteration, i.e. the Book or PendingBook element.
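As a minimal, self-contained sketch of what this selection does, the snippet below runs an equivalent query against a trimmed-down in-memory document. Several things here are assumptions made for illustration: the sample omits the default namespace of the original input, the class and method names are invented, and the query uses //*[...], which for this predicate selects the same elements as //ancestor::*[...]. The ID is quoted so that it is compared as a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookXPathDemo {
    // Trimmed-down sample without a namespace declaration (assumption for brevity).
    static final String XML =
        "<Main><Books>"
      + "<Book><IdentifyingInformation><ID><BookId>001</BookId></ID></IdentifyingInformation></Book>"
      + "<PendingBook><IdentifyingInformation><ID><BookId>003</BookId></ID></IdentifyingInformation></PendingBook>"
      + "</Books></Main>";

    /** Returns the name of the single element that owns the given BookId, or null if not exactly one matches. */
    static String containerNameFor(String bookId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Quoted literal: "001" is compared as a string, so "1" does not match it.
        String query = "//*[IdentifyingInformation/ID/BookId='" + bookId + "']";
        NodeList hits = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
        return hits.getLength() == 1 ? hits.item(0).getNodeName() : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(containerNameFor("001"));
        System.out.println(containerNameFor("003"));
    }
}
```

Because the node name comes back as Book or PendingBook, the same result can drive the output file name.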

// Now we create and save the split XMLs
for (String bookId : bookIds) {
    // With such a query I can find the Book/PendingBook node based on bookId.
    // The ID is quoted so it is compared as a string; unquoted, 001 would be compared as the number 1.
    String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId='" + bookId + "']";
    xpath = xfactory.newXPath();
    XPathExpression query = xpath.compile(xpathQuery);
    NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

    // We store the new XML file in <nodeName>_<bookId>.xml, e.g. Book_001.xml
    Document aamcIdXml = dBuilder.newDocument();
    // Here I'm recreating the root element (don't know if I can avoid it
    // and somehow copy the structure of the initial XML)
    Element root = aamcIdXml.createElement("Main");
    Element booksNode = aamcIdXml.createElement("Books");
    root.appendChild(booksNode);
    aamcIdXml.appendChild(root);

    String bookName = "";
    for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
        Node node = bookIdNodesFiltered.item(i);
        Node copyNode = aamcIdXml.importNode(node, true);
        bookName = copyNode.getNodeName(); // "Book" or "PendingBook"
        booksNode.appendChild(copyNode);
    }

    // At the end, we save the XML file to disk
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    DOMSource source = new DOMSource(aamcIdXml);
    StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
    transformer.transform(source, result);
    System.out.println("Done for " + bookId);
}

I also modified the code to name each file as you requested, e.g. Book_001.xml.
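If the goal is to keep everything else from the initial XML (the namespace on Main, OtherInfo, and so on), one possible variation is to copy the whole tree and then remove the Book/PendingBook elements that do not match. This is only a sketch under stated assumptions, not part of the code above: the class name, the helper methods, and the in-memory sample are hypothetical, and it assumes BookId always sits exactly three levels below Book or PendingBook, as in the question's sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PruneSplit {
    static final String SAMPLE =
        "<Main xmlns='http://some/url/name'><Books>"
      + "<Book><IdentifyingInformation><ID><BookId>001</BookId></ID></IdentifyingInformation></Book>"
      + "<Book><IdentifyingInformation><ID><BookId>002</BookId></ID></IdentifyingInformation></Book>"
      + "<PendingBook><IdentifyingInformation><ID><BookId>003</BookId></ID></IdentifyingInformation></PendingBook>"
      + "<OtherInfo>...</OtherInfo>"
      + "</Books></Main>";

    static DocumentBuilder builder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS below
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) throws Exception {
        return builder().parse(new InputSource(new StringReader(xml)));
    }

    /** Deep-copies doc, then removes every Book/PendingBook whose BookId differs from keepId. */
    static Document keepOnly(Document doc, String keepId) throws Exception {
        Document copy = builder().newDocument();
        copy.appendChild(copy.importNode(doc.getDocumentElement(), true));

        // The NodeList is live, so collect the elements first and remove them afterwards.
        NodeList ids = copy.getElementsByTagNameNS("*", "BookId");
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            Node id = ids.item(i);
            if (!keepId.equals(id.getTextContent().trim())) {
                // BookId -> ID -> IdentifyingInformation -> Book | PendingBook (assumed depth)
                toRemove.add(id.getParentNode().getParentNode().getParentNode());
            }
        }
        for (Node node : toRemove) {
            node.getParentNode().removeChild(node);
        }
        return copy;
    }

    /** Counts elements with the given local name in any namespace. */
    static int count(Document doc, String localName) {
        return doc.getElementsByTagNameNS("*", localName).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document only003 = keepOnly(parse(SAMPLE), "003");
        System.out.println(count(only003, "PendingBook") + " PendingBook, " + count(only003, "Book") + " Book");
    }
}
```

The returned document can be written out with the same Transformer code as above. Since the original tree is never modified, the loop can call keepOnly once per book ID.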

Answer 2

Score: 1

Consider XSLT, the special-purpose language designed to transform XML files, including extracting the nodes you need. You can also pass parameters from the application layer (here, Java) into XSLT, much like parameters in a SQL query.

Specifically, iterate over the BookIds retrieved via XPath in Java and pass each one into the XSLT as a named parameter. No extensive refactoring is needed, since you already have the transformer set up to run XSLT.

XSLT (save as a .xsl file, which is itself an XML document)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- INITIALIZE PARAMETER  -->
  <xsl:param name="param_bookId"/>

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Books">
    <xsl:copy>
      <xsl:apply-templates select="Book[descendant::BookId = $param_bookId] |
                                   PendingBook[descendant::BookId = $param_bookId]"/>
      <xsl:apply-templates select="OtherInfo"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Java (no rebuild of trees)

// ... same code as reading XML input    ...
// ... same code as creating bookIdNodes ...
// Additionally requires javax.xml.transform.Source and javax.xml.transform.stream.StreamSource

String xslFile = "C:/Path/To/XSL/Style.xsl";
Source xslt = new StreamSource(new File(xslFile));
TransformerFactory transformerFactory = TransformerFactory.newInstance();

// ITERATE THROUGH EACH BOOK ID
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
    String currBookId = bookIdNodes.item(i).getTextContent();
    System.out.println(currBookId);

    // CONFIGURE TRANSFORMER (a new one per ID, built from the same stylesheet)
    Transformer transformer = transformerFactory.newTransformer(xslt);

    transformer.setParameter("param_bookId", currBookId);   // PASS PARAM
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    // TRANSFORM AND OUTPUT FILE TO DISK
    String outputXML = "C:/Path/To/XML/BookId_" + currBookId + ".xml";

    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(new File(outputXML));
    transformer.transform(source, result);
}
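The sample input in the question declares a default namespace (http://some/url/name). In XSLT 1.0, unprefixed names in patterns such as match="Books" only match elements in no namespace, so when the input is read with namespace awareness the patterns may need a prefix bound to that URI. The following is a hedged, self-contained sketch of that variant, run entirely in memory. The prefix m, the class name, and the trimmed-down sample are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSplit {
    static final String SAMPLE =
        "<Main xmlns='http://some/url/name'><Books>"
      + "<Book><IdentifyingInformation><ID><BookId>001</BookId></ID></IdentifyingInformation></Book>"
      + "<Book><IdentifyingInformation><ID><BookId>002</BookId></ID></IdentifyingInformation></Book>"
      + "<PendingBook><IdentifyingInformation><ID><BookId>003</BookId></ID></IdentifyingInformation></PendingBook>"
      + "<OtherInfo>...</OtherInfo>"
      + "</Books></Main>";

    // Same idea as the stylesheet above, with the prefix m bound to the input's namespace.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:m='http://some/url/name'>"
      + "<xsl:param name='param_bookId'/>"
      + "<xsl:template match='@*|node()'><xsl:copy>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy></xsl:template>"
      + "<xsl:template match='m:Books'><xsl:copy>"
      + "<xsl:apply-templates select='m:Book[descendant::m:BookId = $param_bookId]"
      + " | m:PendingBook[descendant::m:BookId = $param_bookId]'/>"
      + "<xsl:apply-templates select='m:OtherInfo'/>"
      + "</xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet on xml and returns the document reduced to the given book ID. */
    static String split(String xml, String bookId) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        transformer.setParameter("param_bookId", bookId);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(split(SAMPLE, "002"));
    }
}
```

Writing to a StreamResult backed by a File instead of a StringWriter gives one output file per ID, as in the loop above.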
