From R xml2 library, I don't understand how xml_find_all and xml_find_first work
Question
I am trying to mimic a simple example to retrieve named nodes with xml_find_first() and xml_find_all() functions. The simple example works very well:
library(xml2)
x <- read_xml("<foo><bar><baz/></bar><baz/></foo>")
xml_find_all(x, ".//baz")
xml_find_all(x, ".//bar")
xml_find_first(x, ".//bar")
As expected, the output for the three cases is:
{xml_nodeset (2)}
[1] <baz/>
[2] <baz/>
{xml_nodeset (1)}
[1] <bar>\n <baz/>\n</bar>
{xml_node}
<bar>
[1] <baz/>
Now, with a more complex production example, the two functions seem to behave differently:
library(xml2)
yy <- read_xml(
'<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<fileVersion appName="xl" lastEdited="3" lowestEdited="5" rupBuild="9302"/>
<workbookPr/>
<workbookProtection/>
<bookViews>
<workbookView windowWidth="27090" windowHeight="8700" tabRatio="500" activeTab="1"/>
</bookViews>
<sheets>
<sheet name="PARTICIPANTES" sheetId="1" r:id="rId1"/>
<sheet name="ORDENADOS" sheetId="2" r:id="rId2"/>
</sheets>
<calcPr calcId="144525"/>
</workbook>'
)
xml_find_first(yy, ".//sheets")
xml_find_first(yy, "//sheets")
xml_find_all(yy, "//sheets")
In all cases, the result is a missing node or an empty node set:
{xml_missing}
<NA>
{xml_missing}
<NA>
{xml_nodeset (0)}
Is there something I am missing about these functions?
Answer 1
Score: 1
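Consider using xml_ns_rename() to rename the default namespace, which is declared with xmlns="..." (as opposed to a prefixed namespace such as xmlns:r="..."). XPath 1.0 does not match elements in a default namespace by their bare names, which is why "//sheets" finds nothing here. After renaming, you can use the temporary prefix in any XPath expression.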
ns <- xml_ns_rename(xml_ns(yy), d1 = "doc")
xml_find_first(yy, ".//doc:sheets", ns)
xml_find_first(yy, "//doc:sheets", ns)
xml_find_all(yy, "//doc:sheets", ns)
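For comparison, here is a minimal sketch of a few other ways to reach the same nodes. It uses a trimmed-down copy of the workbook from the question, and the variable names (sheets_d1, sheets_local, sheets_plain) are made up for illustration. It relies on xml2 assigning the prefix d1 to the default namespace, which is the prefix the rename above starts from.

```r
library(xml2)

# Trimmed-down copy of the workbook from the question (illustrative only).
yy <- read_xml(
'<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheets>
<sheet name="PARTICIPANTES" sheetId="1" r:id="rId1"/>
<sheet name="ORDENADOS" sheetId="2" r:id="rId2"/>
</sheets>
</workbook>'
)

# Inspect the namespaces; the default one is listed under the prefix d1.
xml_ns(yy)

# Option 1: use the auto-generated prefix directly.
# The ns argument defaults to xml_ns(yy), so it need not be passed.
sheets_d1 <- xml_find_all(yy, "//d1:sheets/d1:sheet")

# Option 2: match on the local element name, ignoring namespaces.
sheets_local <- xml_find_all(yy, "//*[local-name() = 'sheet']")

# Option 3: strip the default namespace, then use bare names.
# Note that xml_ns_strip() modifies the document in place.
xml_ns_strip(yy)
sheets_plain <- xml_find_all(yy, "//sheet")

# Read the sheet names from the matched nodes.
xml_attr(sheets_plain, "name")
```

Options 1 and 2 leave the document untouched, so they are probably the safer choice if the XML is written back out later. Option 3 is convenient for read-only scraping.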