英文:
Split XML into smaller chunks based on the id of the grandchild
问题
Here's the translated code part:
public class ExtractXmls {
public static void main(String[] args) throws Exception {
String inputFile = "C:/pathToXML/Main.xml";
File xmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);
List<String> bookIds = new ArrayList<>();
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
bookIds.add(bookId.getTextContent());
}
for (String bookId : bookIds) {
String xpathQuery = "//ID[BookId='" + bookId + "']";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main");
aamcIdXml.appendChild(root);
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
root.appendChild(copyNode);
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
}
}
If you have any specific questions or need further assistance with this code, please feel free to ask.
英文:
I have an xml that should be split into smaller chunks by unique BookId node. Basically I need to filter out each book into separate xml having the same structure of the initial XML.
The purpose of that is - requirement to validate each smaller XML against XSD to determine which Book/PendingBook is not valid.
Note that Books node can contain both Book and PendingBook nodes.
Initial XML:
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2021</Year>
<BookId>001</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<Book>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>002</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<PendingBook>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>003</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</PendingBook>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
The result should be like next xmls:
Book_001.xml (BookId = 001):
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2021</Year>
<BookId>001</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
Book_002.xml (BookId = 002):
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>002</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
PendingBook_003.xml (BookId = 003):
<Main xmlns="http://some/url/name">
<Books>
<PendingBook>
<IdentifyingInformation>
<ID>
<Year>2021</Year>
<BookId>003</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</PendingBook>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
So far I fetched only each ID node into smaller xmls. And created root element manually.
Ideally I want to copy all elements from initial xml and put into Books node single Book/PendingBook node.
My java sample:
package com.main;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXmls {
/**
* @param args
*/
public static void main(String[] args) throws Exception
{
String inputFile = "C:/pathToXML/Main.xml";
File xmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);
//Save all the products
List<String> bookIds = new ArrayList<>();
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
bookIds.add(bookId.getTextContent());
}
//Now we create and save split XMLs
for (String bookId : bookIds)
{
//With such query I can find node based on bookId
String xpathQuery = "//ID[BookId='" + bookId + "']";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
//We store the new XML file in bookId.xml e.g. 001.xml
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
aamcIdXml.appendChild(root);
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
root.appendChild(copyNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
}
}
答案1
得分: 1
你几乎已经让它工作了。你可以在循环迭代书籍ID的时候更改XPath,以获取Book
或PendingBook
元素,然后使用它。此外,你需要创建Books
元素,然后将Book
或PendingBook
附加到新创建的Books
元素中。
XPath是://ancestor::*[IdentifyingInformation/ID/BookId=bookId]
它获取了与当前迭代中的ID匹配的bookId的元素的祖先,即Book
或PendingBook
元素。
//现在我们创建并保存分割的XML
for (String bookId : bookIds)
{
//使用这样的查询,我可以根据bookId找到节点
String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId=" + bookId + "]";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
//我们将新的XML文件存储在bookId.xml中,例如001.xml
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main");
Element booksNode = aamcIdXml.createElement("Books");
root.appendChild(booksNode);
//在这里,我重新创建了根元素(不知道是否可以避免这样做,以某种方式复制初始XML的结构)
aamcIdXml.appendChild(root);
String bookName = "";
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
bookName = copyNode.getNodeName();
booksNode.appendChild(copyNode);
}
//最后,我们将XML文件保存到磁盘上
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
我还修改了代码,以根据你的需求命名文件,例如Book_001.xml
。
英文:
You almost got it to work. You could change your XPath in your loop iterating the book IDs to get the Book
or PendingBook
Element and then use it. Also you need to create Books
element in addition to Main
and append Book
or PendingBook
to the newly created Books
Element.
The XPath is : //ancestor::*[IdentifyingInformation/ID/BookId=bookId]
It gets the ancestor of the element whose bookId matches to that of the ID in the current iteration i.e. the Book
or PendingBook
element.
//Now we create and save split XMLs
for (String bookId : bookIds)
{
//With such query I can find node based on bookId
String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId=" + bookId + "]";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
//We store the new XML file in bookId.xml e.g. 001.xml
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main");
Element booksNode = aamcIdXml.createElement("Books");
root.appendChild(booksNode);
//Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
aamcIdXml.appendChild(root);
String bookName = "";
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
bookName = copyNode.getNodeName();
booksNode.appendChild(copyNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
And also I modified code to name the file as you needed like Book_001.xml
.
答案2
得分: 1
Consider XSLT, the special purpose language designed to transform XML files including extracting needed nodes. Additionally, you can pass parameters from application layer like Java into XSLT (just like SQL)!
Specifically, iteratively passed in the XPath retrieved BookIds by Java into XSLT named param
. By the way, no extensive code re-factoring is needed since you already have the transformer
set up to run XSLT!
XSLT (save as .xsl, a special .xml)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<!-- INITIALIZE PARAMETER -->
<xsl:param name="param_bookId"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Books">
<xsl:copy>
<xsl:apply-templates select="Book[descendant::BookId = $param_bookId] |
PendingBook[descendant::BookId = $param_bookId]"/>
<xsl:apply-templates select="OtherInfo"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Java (no rebuild of trees)
// ... same code as reading XML input ...
// ... same code as creating bookIdNodes ...
String curr_bookId = null;
String outputXML = null;
String xslFile = "C:/Path/To/XSL/Style.xsl";
Source xslt = new StreamSource(new File(xslFile));
// ITERATE THROUGH EACH BOOK ID
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
curr_bookId = bookId.getTextContent();
// CONFIGURE TRANSFORMER
TransformerFactory prettyPrint = TransformerFactory.newInstance();
Transformer transformer = prettyPrint.newTransformer(xslt);
transformer.setParameter("param_bookId", curr_bookId); // PASS PARAM
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
// TRANSFORM AND OUTPUT FILE TO DISK
outputXML = "C:/Path/To/XML/BookId_" + curr_bookId + ".xml";
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(outputXML));
transformer.transform(source, result);
}
英文:
Consider XSLT, the special purpose language designed to transform XML files including extracting needed nodes. Additionally, you can pass parameters from application layer like Java into XSLT (just like SQL)!
Specifically, iteratively passed in the XPath retrieved BookIds by Java into XSLT named param
. By the way, no extensive code re-factoring is needed since you already have the transformer
set up to run XSLT!
XSLT (save as .xsl, a special .xml)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<!-- INITIALIZE PARAMETER -->
<xsl:param name="param_bookId"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Books">
<xsl:copy>
<xsl:apply-templates select="Book[descendant::BookId = $param_bookId] |
PendingBook[descendant::BookId = $param_bookId]"/>
<xsl:apply-templates select="OtherInfo"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
<kbd>Online Demo</kbd>
Java (no rebuild of trees)
// ... same code as reading XML input ...
// ... same code as creating bookIdNodes ...
String curr_bookId = null;
String outputXML = null;
String xslFile = "C:/Path/To/XSL/Style.xsl";
Source xslt = new StreamSource(new File(xslFile));
// ITERATE THROUGH EACH BOOK ID
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
curr_bookId = bookId.getTextContent();
// CONFIGURE TRANSFORMER
TransformerFactory prettyPrint = TransformerFactory.newInstance();
Transformer transformer = prettyPrint.newTransformer(xslt);
transformer.setParameter("param_bookId", curr_bookId); // PASS PARAM
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
// TRANSFORM AND OUTPUT FILE TO DISK
outputXML = "C:/Path/To/XML/BookId_" + curr_bookId + ".xml";
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(outputXML));
transformer.transform(source, result);
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论