How to use AND operator in Google Sheets IMPORTXML XPath?

Question

I'm trying to write an XPath query for Google Sheets to gather links from a specific type of page.
I thought I could use an "AND" operator (the pipe character), but I can't quite figure out how to do it.
Here's what I've got so far, but it doesn't work.

=IMPORTXML(B2,"//a[not(starts-with(@href, '/'))]/@href | //a[not(contains(@href, 'example.com'))]/@href")

The idea is that I want to gather all links except for ones that contain example.com and ones that begin with a forward slash.

Surprisingly, it still extracts every link on the page, completely ignoring my conditions.

Any help would be greatly appreciated.

Answer 1

Score: 2

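You are mistaken. The | operator does not mean "and" in XPath; it computes the union of two node-sets. So you were merging the results of the first expression with the results of the second: a link is returned if it passes either test, and only links that both start with a slash and contain example.com are dropped.
To get what you want, put both conditions inside a single predicate and negate them together (equivalently, not(...) and not(...)):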

=IMPORTXML(B2,"//a[not(starts-with(@href, '/') or contains(@href, 'example.com'))]/@href")
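
The difference between the two formulas can be checked outside Sheets. The following is a minimal Python sketch, not an XPath engine: it mimics the two predicates with plain string checks, and the sample `hrefs` list and helper names are hypothetical, chosen only for illustration.

```python
# Hypothetical sample links; not taken from any real page.
hrefs = [
    "/about",
    "https://example.com/pricing",
    "https://other.org/docs",
    "/go?to=example.com",
]


def starts_with_slash(href: str) -> bool:
    # Mirrors starts-with(@href, '/')
    return href.startswith("/")


def mentions_example(href: str) -> bool:
    # Mirrors contains(@href, 'example.com')
    return "example.com" in href


# Original formula: two separately filtered lists merged with "|" (union).
no_slash = [h for h in hrefs if not starts_with_slash(h)]
no_example = [h for h in hrefs if not mentions_example(h)]
union = [h for h in hrefs if h in no_slash or h in no_example]

# Corrected formula: a single predicate, not(A or B).
combined = [
    h for h in hrefs
    if not (starts_with_slash(h) or mentions_example(h))
]

print(union)     # drops only the link that fails both tests
print(combined)  # keeps only the link that passes both tests
```

With this sample data the union keeps three of the four links, while the single combined predicate keeps only `https://other.org/docs`, which is the behaviour the question asks for.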

huangapple
  • Posted on May 31, 2023 at 23:08:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76374946.html