June 1, 2023 12:31:33


R - How to rename xml parent node based on child element (or associate child elements)?

Question

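I'm trying to extract a couple of elements from XML files into an R data frame, but the parent nodes all have the same name, so I don't know how to associate the child elements. I'm very new to XML (about 3 hours), so apologies if I use the wrong terminology. I did not find any R-based solutions.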

This is the general structure of the XML files:

```xml
<Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <MicronLength>5.7578076896</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>
```

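There are many "Annotation" nodes. There are also several other child node names in there, but they don't matter, as I'm only trying to extract MicronLength and ObjIndex into a data frame. So I need to either: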

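Associate and get both elements from within each "Annotation" node,

or

Rename each "Annotation" based on the ObjIndex within (e.g. "Annotation 1", "Annotation 2", etc.) and then put the parent name and child element into the data frame.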
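The first option does not require renaming anything: select the Annotation nodes first, then look up each child relative to its own parent, so values from the same node always land in the same row. A minimal sketch, assuming the xml2 package; the input is made up, and its second Annotation deliberately lacks a MicronLength to show that the gap becomes NA instead of shifting values between rows:

```r
library(xml2)

xml <- read_xml("
<Annotations>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>
")

# One entry per <Annotation> element
annotations <- xml_find_all(xml, "//Annotation")

# "./..." is evaluated relative to each Annotation node; xml_find_first()
# returns one result per input node (a missing node when the child is absent),
# which xml_integer()/xml_double() turn into NA
df <- data.frame(
  ObjIndex     = xml_integer(xml_find_first(annotations, "./ObjIndex")),
  MicronLength = xml_double(xml_find_first(annotations, "./MicronLength"))
)
df
```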

I also have several XML files, so I want to iterate over each one to eventually create a data frame like the example below.

| filename           | ObjIndex | MicronLength  |
| ------------------ | -------- | ------------- |
| examplefile1(.xml) | 1        | 14.1593438418 |
| examplefile1       | 2        | 5.7578076896  |
| examplefile2       | 1        | 12.6345661343 |

The filenames (with or without extension) will then be split into more columns with str_split, but I can handle that myself.
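A loop over several files that yields the table above could look like the following sketch. It assumes the xml2 package; `read_annotations()` is a hypothetical helper, and the two temporary files stand in for real data. `tools::file_path_sans_ext()` drops the extension; use `basename(path)` alone to keep it.

```r
library(xml2)

# Hypothetical helper: read one annotation file into a data frame and tag
# every row with the file name (extension removed)
read_annotations <- function(path) {
  annotations <- xml_find_all(read_xml(path), "//Annotation")
  data.frame(
    filename     = rep(tools::file_path_sans_ext(basename(path)), length(annotations)),
    ObjIndex     = xml_integer(xml_find_first(annotations, "./ObjIndex")),
    MicronLength = xml_double(xml_find_first(annotations, "./MicronLength")),
    stringsAsFactors = FALSE
  )
}

# Write two small example files to a temporary directory (stand-ins for real data)
dir <- tempfile("annotations")
dir.create(dir)
writeLines(c("<Annotations>",
  "<Annotation><MicronLength>14.1593438418</MicronLength><ObjIndex>1</ObjIndex></Annotation>",
  "<Annotation><MicronLength>5.7578076896</MicronLength><ObjIndex>2</ObjIndex></Annotation>",
  "</Annotations>"), file.path(dir, "examplefile1.xml"))
writeLines(c("<Annotations>",
  "<Annotation><MicronLength>12.6345661343</MicronLength><ObjIndex>1</ObjIndex></Annotation>",
  "</Annotations>"), file.path(dir, "examplefile2.xml"))

# Read every .xml file in the directory and stack the per-file data frames
files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_annotations))
result
```

Building each file's rows with `rep()` rather than a scalar keeps the helper working for a file with no Annotation nodes, which then simply contributes zero rows.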

Much appreciated!

Answer 1

Score: 1

I have previously used xml_find_all() for this kind of simple conversion. This works as long as each Annotation node always has exactly one ObjIndex and one MicronLength child node:

```r
library(xml2)

# Parse the example document from a string (read_xml() also accepts a file path)
xml <- read_xml("
<Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <MicronLength>5.7578076896</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>
")

# Collect each child type across all Annotation nodes; the i-th ObjIndex is
# paired with the i-th MicronLength, which is why every Annotation must have
# exactly one of each
data.frame(
  ObjIndex = xml_integer(xml_find_all(xml, "Annotation/ObjIndex")),
  MicronLength = xml_double(xml_find_all(xml, "Annotation/MicronLength"))
)
#>   ObjIndex MicronLength
#> 1        1    14.159344
#> 2        2     5.757808
```
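Because the two columns are collected independently, one cheap way to confirm the "exactly one child each" assumption before trusting the pairing is to count the Annotation nodes that break it with a single XPath expression. A sketch, assuming xml2's `xml_find_num()`; the input is made up and contains one offending node:

```r
library(xml2)

xml <- read_xml("
<Annotations>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>
")

# Count Annotation nodes that do not have exactly one ObjIndex
# and exactly one MicronLength child
n_bad <- xml_find_num(
  xml,
  "count(//Annotation[count(ObjIndex) != 1 or count(MicronLength) != 1])"
)

# Warn when the column-wise xml_find_all() pairing would be unreliable
if (n_bad > 0) {
  message(n_bad, " Annotation node(s) break the one-child-each assumption")
}
```

When the count is non-zero, querying each Annotation node individually is safer than collecting the columns separately.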
