Using BeautifulSoup to get the text after a strong tag
Question
I am trying to scrape a Basketball Reference box score page to pull out the referees assigned to certain games, and export them later on. To test a single game, I tried the code below (and some other variations), but received an error:
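```python
import requests
from bs4 import BeautifulSoup

data = requests.get("https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
soup = BeautifulSoup(data.text, "html.parser")
# Look for the "Officials: " label and take the text that follows it
refs = soup.find(string="Officials: ").next_sibling
print(refs)
```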
```
AttributeError                            Traceback (most recent call last)
Cell In[30], line 3
      1 #data = requests.get(f"https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
      2 soup = BeautifulSoup(data.text)
----> 3 refs = soup.find(string = "Officials: ").next_sibling

AttributeError: 'NoneType' object has no attribute 'next_sibling'
```
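When `find()` returns `None`, printing the `repr()` of the nearby strings usually shows why. The sketch below assumes the page's markup is the `<div>` quoted in the answer and inlines it, so no request is made; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Inline copy of the page's markup (as quoted in the answer) so this runs offline.
html = "<div><strong>Officials:&nbsp;</strong>Daryl Humphrey, Don Hudson, Michael Price</div>"
soup = BeautifulSoup(html, "html.parser")

# string= requires an exact match, so the lookup from the question finds nothing.
label = soup.find(string="Officials: ")
print(label)  # None

# repr() exposes the invisible character that breaks the match.
for strong in soup.find_all("strong"):
    print(repr(strong.get_text()))  # 'Officials:\xa0'
```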
# Answer 1
**Score**: 0
I had a look at the HTML on the page you referenced, and the relevant piece looks like this:

```html
<div><strong>Officials:&nbsp;</strong>Daryl Humphrey, Don Hudson, Michael Price</div>
```

The `&nbsp;` in there is a non-breaking space (`\xa0` in Python), not a regular space, so your *find* expression needs to be:

```python
soup.find(string="Officials:\xa0")
```

However, that finds the text node itself, whereas what you want is its parent, i.e. the `<strong>` tag, and then the *next_sibling* of that parent tag, for example:

```python
refs = soup.find(string="Officials:\xa0").parent.next_sibling
print(refs)
```
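To check this without hitting the site, here is a minimal sketch that parses an inline copy of the `<div>` quoted above (assuming the live page still serves that markup). Splitting the names into a list and matching the label with a regular expression are illustrative additions, not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Inline copy of the markup quoted above; the parser turns "&nbsp;" into "\xa0".
html = (
    "<div><strong>Officials:&nbsp;</strong>"
    "Daryl Humphrey, Don Hudson, Michael Price</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Match the label with the non-breaking space, then step up to <strong>
# and across to the text node that follows it.
refs = soup.find(string="Officials:\xa0").parent.next_sibling
print(refs)  # Daryl Humphrey, Don Hudson, Michael Price

# Illustrative extra step: split the comma-separated names into a list.
names = [name.strip() for name in refs.split(",")]
print(names)  # ['Daryl Humphrey', 'Don Hudson', 'Michael Price']

# Alternative (not from the answer): a regex tolerates any whitespace
# after the colon, including "\xa0", because \s matches Unicode spaces.
label = soup.find(string=re.compile(r"^Officials:\s*$"))
print(label.parent.next_sibling)  # same text as refs
```

The regex variant is only worth it if the spacing after the label is inconsistent across pages; for a single known page, the exact `"Officials:\xa0"` match is simpler.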