Using BeautifulSoup to get the text after a strong tag

Question

I am trying to scrape a Basketball Reference page to pull out the referees assigned to certain games so that I can export them later. To test this on a single game I tried the code below (and some variations of it), but received an error.

    import requests
    from bs4 import BeautifulSoup

    data = requests.get(f"https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
    soup = BeautifulSoup(data.text)
    refs = soup.find(string = "Officials: ").next_sibling
    print(refs)

This produced the following traceback:

    AttributeError                            Traceback (most recent call last)
    Cell In[30], line 3
          1 #data = requests.get(f"https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
          2 soup = BeautifulSoup(data.text)
    ----> 3 refs = soup.find(string = "Officials: ").next_sibling

    AttributeError: 'NoneType' object has no attribute 'next_sibling'



# Answer 1
**Score**: 0

I had a look at the HTML on the page you referenced, and the relevant piece looks like this:

    <div><strong>Officials:&nbsp;</strong>Daryl Humphrey, Don Hudson, Michael Price</div>

The `&nbsp;` in there is a non-breaking space, which BeautifulSoup parses as the character `\xa0`. Your search string `"Officials: "` ends in an ordinary space, so `find` matched nothing and returned `None`, which is why `.next_sibling` raised the `AttributeError`. Your *find* expression needs to be:

    soup.find(string="Officials:\xa0")

However, that returns the text node inside the `<strong>` tag, whereas the names sit after the tag. What you want is the parent of that text, i.e. the `<strong>` tag, and then the *next_sibling* of that parent:

    soup.find(string="Officials:\xa0").parent.next_sibling
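To make the two steps concrete, here is a minimal offline sketch that parses only the HTML fragment quoted in the answer rather than fetching the live page, so it assumes the page still has that structure. The variable names (`label`, `names_text`, `officials`) and the final split into a list are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the answer; the live page may differ.
html = (
    "<div><strong>Officials:&nbsp;</strong>"
    "Daryl Humphrey, Don Hudson, Michael Price</div>"
)

soup = BeautifulSoup(html, "html.parser")

# Searching with an ordinary trailing space matches nothing,
# which is what led to the AttributeError in the question.
no_match = soup.find(string="Officials: ")

# &nbsp; is parsed as U+00A0, so match on that character instead.
label = soup.find(string="Officials:\xa0")

# The label's parent is the <strong> tag; the names are the text
# node that immediately follows that tag inside the <div>.
names_text = label.parent.next_sibling

# Illustrative post-processing: turn the comma-separated text into a list.
officials = [name.strip() for name in names_text.split(",")]
print(officials)
```

If the label might be missing on some box-score pages, checking `label is not None` before touching `.parent` would avoid the same `AttributeError`.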



huangapple
  • Posted on April 17, 2023 at 22:18:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76036140.html