Beautiful Soup - ignore `<span>` while providing `string` to `find()` method

Question

I am parsing some text in Python, using BeautifulSoup4.

The address block starts with a cell like this:

    <td><strong>Address</strong></td>

I find the above cell using `soup.find("td", string="Address")`.

But now some addresses have a highlight character too, like this:

    <td><strong><span>*</span>Address</strong></td>

This has broken my matching. Is there still a way to find this cell?
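For context, here is a minimal sketch of why the match breaks, assuming the lookup is `soup.find("td", string="Address")` and the built-in `html.parser` is used. When `string` is combined with a tag name, Beautiful Soup compares the value against the tag's `.string`, and `.string` is `None` as soon as `<strong>` has more than one child. The two sample tables are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows, modelled on the two cells in the question.
PLAIN = "<table><tr><td><strong>Address</strong></td></tr></table>"
STARRED = "<table><tr><td><strong><span>*</span>Address</strong></td></tr></table>"

plain_soup = BeautifulSoup(PLAIN, "html.parser")
starred_soup = BeautifulSoup(STARRED, "html.parser")

# With a single chain of only children, td.string resolves to the inner text.
plain_string = plain_soup.td.string

# <strong> now has two children (<span> and a text node), so .string is None.
starred_string = starred_soup.td.string

# Tag name + string= matches against .string, so only the plain cell is found.
plain_match = plain_soup.find("td", string="Address")
starred_match = starred_soup.find("td", string="Address")

print(repr(plain_string), repr(starred_string))
print(plain_match is not None, starred_match is not None)
```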

Answer 1

Score: 1

You can try using either a CSS selector or `re`, as follows:

    soup.select('td:has(strong:contains("Address"))')

or

    import re
    soup.find("td", string=re.compile("Address"))
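A hedged sketch for checking both suggestions against the `<span>` variant. It assumes a reasonably recent Soup Sieve, where `:-soup-contains` is the newer spelling of `:contains`, and uses made-up sample HTML. One caveat worth testing: a regex passed together with a tag name is still compared against the tag's `.string`, so it may not match the cell once the `<span>` is present. The `is_address_cell` function is an illustrative alternative that is not part of the answer above; it compares against all text below the `<td>`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample row: a starred label cell followed by a value cell.
HTML = (
    "<table><tr>"
    "<td><strong><span>*</span>Address</strong></td>"
    "<td>1 Example Street</td>"
    "</tr></table>"
)
soup = BeautifulSoup(HTML, "html.parser")

# CSS route: select every <td> that has a <strong> descendant whose text contains "Address".
css_cells = soup.select('td:has(strong:-soup-contains("Address"))')

# Regex route with a tag name: compared against td.string, which is None here.
regex_cell = soup.find("td", string=re.compile("Address"))

# Function matcher (illustrative alternative): look at all text below the <td>.
def is_address_cell(tag):
    return tag.name == "td" and "Address" in tag.get_text()

function_cell = soup.find(is_address_cell)

print(css_cells)
print(regex_cell)
print(function_cell)
```

The CSS selector and the function matcher both look through nested elements, which is why they are not affected by the extra `<span>`.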

Answer 2

Score: 0

I ended up with a solution like this:

    strong_blocks = soup.find_all("strong")

    def common_block(tag):
        # Search only the direct children of <strong>, so the text inside <span> is skipped.
        return tag.find(string="Address", recursive=False)

    # Keep the <strong> elements that have "Address" as a direct text child.
    address_strongs = list(filter(common_block, strong_blocks))
    if len(address_strongs) == 1:
        address_strong = address_strongs[0]
        address_cell = address_strong.parent

The trick was that once I had a list of `<strong>` elements, I was able to use `recursive=False` to prevent the `<span>` being inspected.
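To try this approach end to end, here is a self-contained sketch under a few assumptions: the table content is invented, `find_label_cell` is a hypothetical helper name, and reading the value from the next sibling `<td>` is a guess about the page layout rather than something stated in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one plain label and one starred label.
HTML = """
<table>
  <tr><td><strong>Name</strong></td><td>Jane Doe</td></tr>
  <tr><td><strong><span>*</span>Address</strong></td><td>1 Example Street</td></tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

def find_label_cell(soup, label):
    """Return the parent of the only <strong> with `label` as a direct text child, else None."""
    matches = [
        strong
        for strong in soup.find_all("strong")
        # recursive=False limits the search to direct children of <strong>.
        if strong.find(string=label, recursive=False) is not None
    ]
    return matches[0].parent if len(matches) == 1 else None

address_cell = find_label_cell(soup, "Address")

# Assumption: the value sits in the next <td> of the same row.
value_cell = address_cell.find_next_sibling("td")

print(address_cell)
print(value_cell.get_text())
```

Returning `None` when there is not exactly one match keeps the same guard as the `len(...) == 1` check in the answer.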

huangapple
  • Published on April 7, 2023, 00:04:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75951558.html