Reading a complex XML file in Java

Question

I can read many types of XML files in Java, but today I received an XML file whose details I cannot read.

<ENVELOPE>
<BILLFIXED>
<BILLDATE>1-Jul-2017</BILLDATE>
<BILLREF>1</BILLREF>
<BILLPARTY>Party1</BILLPARTY>
</BILLFIXED>
<BILLCL>-10800.00</BILLCL>
<BILLPDC/>
<BILLFINAL>-10800.00</BILLFINAL>
<BILLDUE>1-Jul-2017</BILLDUE>
<BILLOVERDUE>30</BILLOVERDUE>
<BILLFIXED>
<BILLDATE>1-Jul-2017</BILLDATE>
<BILLREF>2</BILLREF>
<BILLPARTY>Party2</BILLPARTY>
</BILLFIXED>
<BILLCL>-2000.00</BILLCL>
<BILLPDC/>
<BILLFINAL>-2000.00</BILLFINAL>
<BILLDUE>1-Jul-2017</BILLDUE>
<BILLOVERDUE>30</BILLOVERDUE>
<BILLFIXED>
<BILLDATE>1-Jul-2017</BILLDATE>
<BILLREF>3</BILLREF>
<BILLPARTY>Party3</BILLPARTY>
</BILLFIXED>
<BILLCL>-1416.00</BILLCL>
<BILLPDC/>
<BILLFINAL>-1416.00</BILLFINAL>
<BILLDUE>31-Jul-2017</BILLDUE>
<BILLOVERDUE>0</BILLOVERDUE>
</ENVELOPE>

I am using the code below to read the XML file. I can read the data inside the <BILLFIXED> tag, but not the data outside it, such as <BILLFINAL> and <BILLDUE>.

try {
    File fXmlFile = new File("filepath");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();
    NodeList billNodeList = doc.getElementsByTagName("ENVELOPE");
    for (int i = 0; i < billNodeList.getLength(); i++) {
        Node voucherNode = billNodeList.item(i);
        Element voucherElement = (Element) voucherNode;
        NodeList nList = voucherElement.getElementsByTagName("BILLFIXED");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node insideNode = nList.item(temp);
            Element voucherElements = (Element) insideNode;
            System.out.println(voucherElements.getElementsByTagName("BILLDATE").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLREF").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLPARTY").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLFINAL").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLOVERDUE").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

I have tried every approach I know, but so far I have not found a solution.
If anyone has a solution, please share it with me.

Answer 1

Score: 1

One way to do it is to "fix" the XML so that it is better structured, e.g. like this:

// Fix the XML
Element envelopeElem = doc.getDocumentElement();
// Snapshot the children first, because the loop below moves nodes around
List<Node> children = new ArrayList<>();
for (Node child = envelopeElem.getFirstChild(); child != null; child = child.getNextSibling()) {
    children.add(child);
}
Element billElem = null;
for (Node child : children) {
    if (child.getNodeType() == Node.ELEMENT_NODE && "BILLFIXED".equals(child.getNodeName())) {
        billElem = doc.createElement("BILL");
        envelopeElem.insertBefore(billElem, child);
    }
    if (billElem != null) {
        billElem.appendChild(child);
    }
}

The code creates a new <BILL> element as a child of <ENVELOPE> whenever it encounters a <BILLFIXED> element, then moves that element and every node that follows it (up to the next <BILLFIXED>) into the new <BILL> element.

The result is that the XML in the DOM tree looks like this [1], which should be easier for you to process:

<ENVELOPE>
  <BILL>
    <BILLFIXED>
      <BILLDATE>1-Jul-2017</BILLDATE>
      <BILLREF>1</BILLREF>
      <BILLPARTY>Party1</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-10800.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-10800.00</BILLFINAL>
    <BILLDUE>1-Jul-2017</BILLDUE>
    <BILLOVERDUE>30</BILLOVERDUE>
  </BILL>
  <BILL>
    <BILLFIXED>
      <BILLDATE>1-Jul-2017</BILLDATE>
      <BILLREF>2</BILLREF>
      <BILLPARTY>Party2</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-2000.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-2000.00</BILLFINAL>
    <BILLDUE>1-Jul-2017</BILLDUE>
    <BILLOVERDUE>30</BILLOVERDUE>
  </BILL>
  <BILL>
    <BILLFIXED>
      <BILLDATE>1-Jul-2017</BILLDATE>
      <BILLREF>3</BILLREF>
      <BILLPARTY>Party3</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-1416.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-1416.00</BILLFINAL>
    <BILLDUE>31-Jul-2017</BILLDUE>
    <BILLOVERDUE>0</BILLOVERDUE>
  </BILL>
</ENVELOPE>

[1] The XML has been reformatted for human readability, i.e. it has been re-indented.
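
To show how the regrouped tree might then be read, here is a minimal, self-contained sketch. It assumes the XML is passed in as a string, and the class name BillRegroupDemo and the helpers regroup, text and readBills are hypothetical names chosen for illustration; only the regrouping loop comes from the answer itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BillRegroupDemo {

    // Wraps each BILLFIXED and the siblings that follow it in a new BILL element.
    public static void regroup(Document doc) {
        Element envelope = doc.getDocumentElement();
        // Snapshot the children, because the second loop moves nodes.
        List<Node> children = new ArrayList<>();
        for (Node c = envelope.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        Element bill = null;
        for (Node c : children) {
            if (c.getNodeType() == Node.ELEMENT_NODE && "BILLFIXED".equals(c.getNodeName())) {
                bill = doc.createElement("BILL");
                envelope.insertBefore(bill, c);
            }
            if (bill != null) {
                bill.appendChild(c); // appendChild moves the node out of ENVELOPE
            }
        }
    }

    // Returns the text of the first descendant with the given tag, or "" if there is none.
    public static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    // Parses the XML, regroups it, and returns one summary line per BILL.
    public static List<String> readBills(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            regroup(doc);
            List<String> lines = new ArrayList<>();
            NodeList bills = doc.getElementsByTagName("BILL");
            for (int i = 0; i < bills.getLength(); i++) {
                Element bill = (Element) bills.item(i);
                lines.add(text(bill, "BILLREF") + " " + text(bill, "BILLPARTY") + " "
                        + text(bill, "BILLFINAL") + " " + text(bill, "BILLDUE"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ENVELOPE>"
                + "<BILLFIXED><BILLDATE>1-Jul-2017</BILLDATE><BILLREF>1</BILLREF>"
                + "<BILLPARTY>Party1</BILLPARTY></BILLFIXED>"
                + "<BILLCL>-10800.00</BILLCL><BILLPDC/><BILLFINAL>-10800.00</BILLFINAL>"
                + "<BILLDUE>1-Jul-2017</BILLDUE><BILLOVERDUE>30</BILLOVERDUE>"
                + "</ENVELOPE>";
        readBills(xml).forEach(System.out::println);
    }
}
```

Because every field of a bill is now a descendant of its own BILL element, getElementsByTagName on that element finds both the fields inside BILLFIXED and the ones that used to be its siblings.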

Answer 2

Score: 0

It isn't well-structured XML. Inside your <ENVELOPE> tags there is nothing to indicate the start of each set of six elements that constitute a 'bill'. You'd normally expect each set to be wrapped in <BILL> and </BILL> tags. The file still parses, but the flat layout is going to confuse any code that tries to group the fields by bill...
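
If the file cannot be changed, the missing grouping can be rebuilt while reading. The sketch below is one possible way to do that, not something proposed in this answer: it walks the direct children of <ENVELOPE> and starts a new record at every <BILLFIXED>. The class name BillSiblingWalk and the method readBills are hypothetical, and the code assumes every element other than <BILLFIXED> holds only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BillSiblingWalk {

    // Returns one map (tag name -> text) per bill, in document order.
    public static List<Map<String, String>> readBills(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Map<String, String>> bills = new ArrayList<>();
            Map<String, String> current = null;
            Node first = doc.getDocumentElement().getFirstChild();
            for (Node n = first; n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between elements
                }
                if ("BILLFIXED".equals(n.getNodeName())) {
                    // A BILLFIXED element marks the start of a new bill.
                    current = new LinkedHashMap<>();
                    bills.add(current);
                    for (Node f = n.getFirstChild(); f != null; f = f.getNextSibling()) {
                        if (f.getNodeType() == Node.ELEMENT_NODE) {
                            current.put(f.getNodeName(), f.getTextContent());
                        }
                    }
                } else if (current != null) {
                    // Siblings after BILLFIXED belong to the bill it opened.
                    current.put(n.getNodeName(), n.getTextContent());
                }
            }
            return bills;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ENVELOPE>"
                + "<BILLFIXED><BILLDATE>1-Jul-2017</BILLDATE><BILLREF>1</BILLREF>"
                + "<BILLPARTY>Party1</BILLPARTY></BILLFIXED>"
                + "<BILLCL>-10800.00</BILLCL><BILLPDC/><BILLFINAL>-10800.00</BILLFINAL>"
                + "<BILLDUE>1-Jul-2017</BILLDUE><BILLOVERDUE>30</BILLOVERDUE>"
                + "</ENVELOPE>";
        for (Map<String, String> bill : readBills(xml)) {
            System.out.println(bill);
        }
    }
}
```

This leaves the DOM untouched, at the cost of hard-coding the rule that <BILLFIXED> opens a record.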

Answer 3

Score: 0

As per the sample XML, it holds data for 3 records, but nothing separates one record from the next. It looks like each field was simply written into its own XML tag and appended to the file.

There are 2 possible options I would suggest:

  1. Java based: As Andreas suggested, read the file content and add a wrapper tag around each record, which gives a well-defined XML structure that is easier to handle. There may be a performance impact when the input file is large.
  2. Transformation based: Try an STX transformation, which can convert the structure to the required format, either XML or even a flat file. Processing then becomes simpler.
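
Regarding the large-file concern in option 1, a streaming parser avoids loading the whole document into memory. The following is a rough sketch using StAX (javax.xml.stream) from the JDK. Note that this is a different technique from the STX transformation in option 2, and it is not part of the original answer. The class name BillStreamReader and the method readBills are hypothetical, and the code assumes every element other than <ENVELOPE> and <BILLFIXED> contains only text.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class BillStreamReader {

    // Reads the bills one event at a time; a new record starts at each BILLFIXED.
    public static List<Map<String, String>> readBills(Reader source) {
        List<Map<String, String>> bills = new ArrayList<>();
        Map<String, String> current = null;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // ignore end tags, whitespace, comments
                }
                String name = reader.getLocalName();
                if ("ENVELOPE".equals(name)) {
                    continue; // root element, nothing to store
                }
                if ("BILLFIXED".equals(name)) {
                    current = new LinkedHashMap<>();
                    bills.add(current);
                } else if (current != null) {
                    // getElementText() requires a text-only element (assumption above).
                    current.put(name, reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
        return bills;
    }

    public static void main(String[] args) {
        String xml = "<ENVELOPE>"
                + "<BILLFIXED><BILLDATE>1-Jul-2017</BILLDATE><BILLREF>1</BILLREF>"
                + "<BILLPARTY>Party1</BILLPARTY></BILLFIXED>"
                + "<BILLCL>-10800.00</BILLCL><BILLPDC/><BILLFINAL>-10800.00</BILLFINAL>"
                + "<BILLDUE>1-Jul-2017</BILLDUE><BILLOVERDUE>30</BILLOVERDUE>"
                + "</ENVELOPE>";
        for (Map<String, String> bill : readBills(new StringReader(xml))) {
            System.out.println(bill);
        }
    }
}
```

For a real file, a FileReader or an InputStream-based reader could be passed in instead of the StringReader. The sketch still collects all records in a list, so for very large inputs each record would need to be processed as soon as the next <BILLFIXED> begins.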

Posted by huangapple on 9 September 2020, 19:02:56
When reposting, please keep the link to this article: https://go.coder-hub.com/63810257.html