R - How to rename an XML parent node based on a child element (or associate child elements)?

Question

I'm trying to extract a couple of elements from XML files into an R data frame, but the parent nodes all have the same name, so I don't know how to associate the child elements with each other. I'm very new to XML (about 3 hours), so apologies if I use the wrong terminology. I did not find any R-based solutions.

This is the general structure of the xml files:

<Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <MicronLength>5.7578076896</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>

There are many "Annotation" nodes. They also contain several other child nodes, but those don't matter, as I'm just trying to extract MicronLength and ObjIndex into a data frame. So I need to either:

  1. Associate and get both elements from within each "Annotation" node

OR

  2. Rename each "Annotation" based on the ObjIndex within it (e.g. "Annotation 1", "Annotation 2", etc.) and then get the parent name and child element into the data frame.

I also have several XML files, so I want to iterate over each one to eventually create a data frame like the example below.

| filename           | ObjIndex | MicronLength  |
| ------------------ | -------- | ------------- |
| examplefile1(.xml) | 1        | 14.1593438418 |
| examplefile1       | 2        | 5.7578076896  |
| examplefile2       | 1        | 12.6345661343 |

The filenames (with or without extension) will then be split into some more columns with str_split, but I can do that myself.

Much appreciated!

Answer 1

Score: 1

I have previously used xml_find_all() for this kind of simple conversion.
This works as long as each Annotation node always has exactly one ObjIndex
and exactly one MicronLength child node, because the two columns are matched
up purely by position:

library(xml2)

xml <- read_xml("
<Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <MicronLength>5.7578076896</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>
")

data.frame(
  ObjIndex = xml_integer(xml_find_all(xml, "Annotation/ObjIndex")),
  MicronLength = xml_double(xml_find_all(xml, "Annotation/MicronLength"))
)
#>   ObjIndex MicronLength
#> 1        1    14.159344
#> 2        2     5.757808
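
The question also asks about several files and a filename column, which the snippet above does not cover. One possible way to extend it is sketched here, not taken from the original answer. The helper name read_annotations, the temporary demo directory, and the second Annotation in examplefile2 (which deliberately lacks MicronLength) are all assumptions made for illustration. It selects the Annotation nodes first and then calls xml_find_first() on that node set, which returns one result per Annotation, so a missing child should come out as NA instead of shifting the rows:

```r
library(xml2)

# Hypothetical helper (an assumption, not part of the original answer):
# read one XML file and return one row per <Annotation> node.
read_annotations <- function(path) {
  doc <- read_xml(path)
  nodes <- xml_find_all(doc, ".//Annotation")
  data.frame(
    filename = rep(tools::file_path_sans_ext(basename(path)), length(nodes)),
    # xml_find_first() on a node set yields one result per node, so each
    # value stays attached to the Annotation it came from.
    ObjIndex = as.integer(xml_text(xml_find_first(nodes, "./ObjIndex"))),
    MicronLength = as.numeric(xml_text(xml_find_first(nodes, "./MicronLength"))),
    stringsAsFactors = FALSE
  )
}

# Demo data written to a temporary directory (made up for this sketch).
demo_dir <- tempfile("annotations_demo")
dir.create(demo_dir)
writeLines(
  "<Annotations>
     <Version>1.0.0.0</Version>
     <Annotation><MicronLength>14.1593438418</MicronLength><ObjIndex>1</ObjIndex></Annotation>
     <Annotation><MicronLength>5.7578076896</MicronLength><ObjIndex>2</ObjIndex></Annotation>
   </Annotations>",
  file.path(demo_dir, "examplefile1.xml")
)
writeLines(
  "<Annotations>
     <Version>1.0.0.0</Version>
     <Annotation><MicronLength>12.6345661343</MicronLength><ObjIndex>1</ObjIndex></Annotation>
     <Annotation><ObjIndex>2</ObjIndex></Annotation>
   </Annotations>",
  file.path(demo_dir, "examplefile2.xml")
)

# Apply the helper to every .xml file and stack the per-file data frames.
files <- list.files(demo_dir, pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_annotations))
print(result)
```

For real data, pointing list.files() at the folder that holds the XML files should be the only change needed. The filename column here has the extension stripped; using basename(path) alone would keep it.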

huangapple
  • Posted on June 1, 2023, 12:31:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76378680.html