英文:
XPATH inside IMPORTXML: how to find text in all languages?
问题
我试图在某个网页上找到对Google Sheets的单元格值的所有提及。单元格值可以用任何语言编写。网页也可以是任何语言。
现在我有这个公式:
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),'ABCDEFGHJIKLMNOPQRSTUVWXYZАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ', 'abcdefghjiklmnopqrstuvwxyzабвгдеёжзиклмнопрстуфхцчшщэюя'),'"&LOWER(B2)&"')]")))
但它工作不正确。它无法找到拉丁字母和西里尔字母中的所有单词,也没有考虑葡萄牙语、德语和其他语言的字符。如何使这个公式通用?或者如何编写适用于Google Sheets的适当脚本?
我的公式工作不正确。我想要使用Google Sheets的公式或脚本在网页上找到任何语言的任何文本。
英文:
I'm trying to find all mentions of a Google Sheets' cell value on some website page. The cell value can be written in any language. The website page can also be in any language.
Now I have this formula:
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),'ABCDEFGHJIKLMNOPQRSTUVWXYZАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ', 'abcdefghjiklmnopqrstuvwxyzабвгдеёжзиклмнопрстуфхцчшщэюя'),'"&LOWER(B2)&"')]")))
But it works incorrectly. It's not possible to find all words in Latin and Cyrillic, characters from Portuguese, German and other languages are not taken into account. How to make the formula universal? Or how can I write the appropriate script for Google Sheets?
My formula works incorrectly. I want to find any text in any language on website page with Google Sheets' formula or script.
答案1
得分: 0
我们可以更加优雅地解决这个问题,如果Google Docs支持XPath 2.0,并且我们可以使用fn:lower-case()
。在XPath 1.0中,我们被限制使用translate()
,需要自己提供大写和小写的转换。然而,这种方式很难涵盖所有可能的(Unicode)字符和所有语言。<sup>1</sup> 我已经将拉丁字母的重音符号添加到您的公式中,这应该涵盖大多数西方语言。
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),
'ABCDEFGHJIKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞŸŽŠŒАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ',
'abcdefghjiklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿžšœабвгдеёжзиклмнопрстуфхцчшщэюя'))&LOWER(B2)&"')]"))
<hr>
<sup>1</sup>: 此类转换对于所有语言都不完美,例如,德语的 'ß'(小写)在大写时可能写作 'SS',这使得查找所有匹配变得困难。
英文:
We could solve this more elegantly if Google Docs would support Xpath 2.0 and we could use the fn:lower-case()
. In Xpath 1.0 we are stuck with translate()
and need to provide the upper-case and lower-case translation ourselves. However, it is difficult to cover all possible (Unicode) characters in all languages this way.<sup>1</sup> I have added the diacritics from the Latin alphabet to your formula, with should cover most Western languages.
=COUNTA(IFERROR(IMPORTXML(A2;"//*[contains(translate(text(),
'ABCDEFGHJIKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞŸŽŠŒАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЭЮЯ',
'abcdefghjiklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿžšœабвгдеёжзиклмнопрстуфхцчшщэюя'),'"&LOWER(B2)&"')]")))
<hr>
<sup>1</sup>: Furthermore, such a transformation does not work perfectly for all languages, e.g., the German 'ß' (lowercase) may be written as 'SS' when in uppercase which makes it difficult to find all matches.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论