April 7, 2023 00:04:00 · go

Beautiful Soup - ignore `<span>` while providing `string` to `find()` method

Question

I am parsing some text in Python, using BeautifulSoup4.

The address block starts with a cell like this:

    <td><strong>Address</strong></td>

I find the above cell using `soup.find("td", string="Address")`.

But now some addresses also have a highlight character, like this:

    <td><strong><span>*</span>Address</strong></td>

This has broken my matching. Is there still a way to find this TR?
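
Why the original lookup stops working can be checked directly. This is a minimal sketch assuming the `html.parser` backend; the one-row tables `PLAIN` and `STARRED` and the helper `cell_string` are hypothetical names built from the two cells above. It relies on documented Beautiful Soup behaviour: `.string` descends through a tag that has exactly one child, but is `None` when a tag has several children, and a `string=` filter on `find("td", ...)` is compared against that `.string`.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row tables built from the two cells in the question
PLAIN = "<table><tr><td><strong>Address</strong></td></tr></table>"
STARRED = "<table><tr><td><strong><span>*</span>Address</strong></td></tr></table>"


def cell_string(html):
    """Return (td.string, whether find("td", string="Address") matched)."""
    soup = BeautifulSoup(html, "html.parser")
    td = soup.td
    # .string follows a chain of single-child tags down to one text node;
    # a <strong> with two children (<span> plus text) has no single .string
    found = soup.find("td", string="Address") is not None
    return td.string, found


for name, html in (("plain", PLAIN), ("starred", STARRED)):
    print(name, cell_string(html))
```

For the starred cell, `td.string` is `None`, so the `string="Address"` filter has nothing to compare against and the lookup returns `None`.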

Answer 1

Score: 1

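You can try using either a CSS selector or `re`, as follows:

    # :-soup-contains() is the current name of the deprecated :contains()
    soup.select('td:has(strong:-soup-contains("Address"))')

or

    import re
    # Match the "Address" text node itself, then climb to its enclosing <td>
    soup.find(string=re.compile("Address")).find_parent("td")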
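
A minimal side-by-side check of the two approaches against both cell variants, assuming `html.parser` and soupsieve 2.1 or later (which provides `:-soup-contains()`, the non-deprecated spelling of `:contains()`). The function names `by_css`, `by_text_node` and `by_cell_string` are hypothetical. The third function applies the regex to the `<td>` itself via `string=` (the newer name of the `text=` argument); because that regex is tested against the cell's `.string`, it finds the plain cell but not the starred one, which is why matching the text node and walking up with `find_parent()` is the more robust form of the `re` approach.

```python
import re

from bs4 import BeautifulSoup

PLAIN = "<table><tr><td><strong>Address</strong></td></tr></table>"
STARRED = "<table><tr><td><strong><span>*</span>Address</strong></td></tr></table>"


def by_css(soup):
    # :-soup-contains() matches on the element's full text, descendants included
    return soup.select_one('td:has(strong:-soup-contains("Address"))')


def by_text_node(soup):
    # Match the "Address" text node itself, then climb to the enclosing cell
    node = soup.find(string=re.compile("Address"))
    return node.find_parent("td") if node is not None else None


def by_cell_string(soup):
    # Regex is tested against td.string, which is None when <strong> has 2 children
    return soup.find("td", string=re.compile("Address"))


for html in (PLAIN, STARRED):
    soup = BeautifulSoup(html, "html.parser")
    print(by_css(soup) is not None,
          by_text_node(soup) is not None,
          by_cell_string(soup) is not None)
```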

Answer 2

Score: 0

I ended up with a solution like this:

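    from bs4 import BeautifulSoup

    # soup = BeautifulSoup(html, "html.parser")  # the parsed page
    strong_blocks = soup.find_all("strong")
    def common_block(tag):
        # Search only the <strong>'s direct children for an "Address" text
        # node; descendants such as the nested <span> are not searched
        return tag.find(string="Address", recursive=False)
    address_texts = list(filter(common_block, strong_blocks))
    if len(address_texts) == 1:
        address_text = address_texts[0]     # the matching <strong>
        address_cell = address_text.parent  # its enclosing <td>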

The trick was that once I had a list of `<strong>` elements, I could use `recursive=False` to keep the `<span>` from being inspected.
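
The filter from this answer can be packaged as a reusable helper. This is a sketch, not the author's code: `find_label_cell` and the sample table are hypothetical, and the predicate is passed straight to `find_all()` (which accepts a function that takes a tag and returns a boolean) rather than going through `filter()`. As in the answer, `recursive=False` restricts the string search to the `<strong>`'s direct children, and the helper returns `None` unless exactly one label matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one starred label, one plain label
HTML = (
    "<table>"
    "<tr><td><strong><span>*</span>Address</strong></td><td>1 Main St</td></tr>"
    "<tr><td><strong>Phone</strong></td><td>555-0100</td></tr>"
    "</table>"
)


def find_label_cell(soup, label):
    """Return the <td> whose <strong> has `label` as a direct text child, else None."""

    def has_direct_label(tag):
        # recursive=False: only the <strong>'s direct children are searched,
        # so text inside the nested <span> (the "*" marker) is never examined
        return tag.name == "strong" and tag.find(string=label, recursive=False) is not None

    matches = soup.find_all(has_direct_label)
    # Like the original answer, only accept an unambiguous single match
    return matches[0].parent if len(matches) == 1 else None


soup = BeautifulSoup(HTML, "html.parser")
address_cell = find_label_cell(soup, "Address")
print(address_cell.find_next_sibling("td").get_text())
```

Note that `string=label` is an exact match, so a label with surrounding whitespace would need a regex or a stripped comparison instead.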
