R - How to rename xml parent node based on child element (or associate child elements)?
Question
I'm trying to extract a couple of elements from XML files into an R dataframe but the parent nodes are all named the same, so I don't know how to associate child elements. I'm very new to xml (about 3 hours) so apologies if I use the wrong terminology. I did not find any R-based solutions.
This is the general structure of the xml files:
<Annotations>
  <Version>1.0.0.0</Version>
  <Annotation>
    <MicronLength>14.1593438418</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>1</ObjIndex>
  </Annotation>
  <Annotation>
    <MicronLength>5.7578076896</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>2</ObjIndex>
  </Annotation>
</Annotations>
There are many "Annotation" nodes. Each one also contains several other child nodes, but they don't matter as I'm just trying to extract MicronLength and ObjIndex into a dataframe. So I need to either:
- Associate and get both elements from within each "Annotation" node
OR
- Rename each "Annotation" based on the ObjIndex within (e.g. "Annotation 1", "Annotation 2", etc.) and then get parent name and child element into the df.
I also have several xml files so I want to iterate over each one to eventually create a DF like the example below.
| filename | ObjIndex | MicronLength |
| ------------------ | -------- | ------------- |
| examplefile1(.xml) | 1 | 14.1593438418 |
| examplefile1 | 2 | 5.7578076896 |
| examplefile2 | 1 | 12.6345661343 |
The filenames (with or without extension) will then be str_split into some more columns but I can do that myself.
Much appreciated!
Answer 1
Score: 1
I have previously used xml_find_all() for this kind of simple conversion. This works as long as each Annotation node always has exactly one ObjIndex and one MicronLength child node:
library(xml2)
xml <- read_xml("
<Annotations>
  <Version>1.0.0.0</Version>
  <Annotation>
    <MicronLength>14.1593438418</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>1</ObjIndex>
  </Annotation>
  <Annotation>
    <MicronLength>5.7578076896</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>2</ObjIndex>
  </Annotation>
</Annotations>
")
data.frame(
  ObjIndex = xml_integer(xml_find_all(xml, "Annotation/ObjIndex")),
  MicronLength = xml_double(xml_find_all(xml, "Annotation/MicronLength"))
)
#>   ObjIndex MicronLength
#> 1        1    14.159344
#> 2        2     5.757808
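The question also asks about iterating over several files and keeping the filename as a column. A minimal sketch of one way to do that follows. The helper name read_annotations, the temporary directory, and the two demo files are assumptions made for illustration and are not part of the original answer. Instead of two independent xml_find_all() calls, it selects the Annotation nodes first and then calls xml_find_first() on that node set, which returns exactly one result per Annotation, so ObjIndex and MicronLength stay paired even if a child is missing (the value should then come back as NA).

```r
library(xml2)

# Hypothetical helper (not from the original answer):
# returns one row per <Annotation> node in a single file.
read_annotations <- function(path) {
  doc <- read_xml(path)
  nodes <- xml_find_all(doc, ".//Annotation")
  data.frame(
    # filename without directory or extension, repeated once per node
    filename = rep(tools::file_path_sans_ext(basename(path)), length(nodes)),
    # xml_find_first() on a node set yields one result per input node,
    # so both columns stay aligned with their parent Annotation
    ObjIndex = xml_integer(xml_find_first(nodes, "./ObjIndex")),
    MicronLength = xml_double(xml_find_first(nodes, "./MicronLength")),
    stringsAsFactors = FALSE
  )
}

# Demo setup (assumed): two small files in a temporary directory,
# using the values from the question.
xml_dir <- tempfile("annotations_")
dir.create(xml_dir)
writeLines(
  "<Annotations>
     <Version>1.0.0.0</Version>
     <Annotation>
       <MicronLength>14.1593438418</MicronLength>
       <ObjIndex>1</ObjIndex>
     </Annotation>
     <Annotation>
       <MicronLength>5.7578076896</MicronLength>
       <ObjIndex>2</ObjIndex>
     </Annotation>
   </Annotations>",
  file.path(xml_dir, "examplefile1.xml")
)
writeLines(
  "<Annotations>
     <Version>1.0.0.0</Version>
     <Annotation>
       <MicronLength>12.6345661343</MicronLength>
       <ObjIndex>1</ObjIndex>
     </Annotation>
   </Annotations>",
  file.path(xml_dir, "examplefile2.xml")
)

# Read every .xml file in the directory and stack the per-file data frames
files <- list.files(xml_dir, pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_annotations))
print(result)
```

In real use, xml_dir would point at the folder holding the actual XML files. The resulting filename column can then be split further with str_split as described in the question.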