Question

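I am trying to scrape a Basketball Reference page to pull out the referees assigned to certain games and export them later on. To test a single game I tried the code below (and some other variations), but I received an error.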

    data = requests.get("https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
    soup = BeautifulSoup(data.text)
    refs = soup.find(string="Officials: ").next_sibling
    print(refs)

    AttributeError                            Traceback (most recent call last)
    Cell In[30], line 3
          1 data = requests.get("https://www.basketball-reference.com/wnba/boxscores/201506050CON.html")
          2 soup = BeautifulSoup(data.text)
    ----> 3 refs = soup.find(string="Officials: ").next_sibling

    AttributeError: 'NoneType' object has no attribute 'next_sibling'


</details>


# Answer 1
**Score**: 0


I had a look at the HTML on the page you referenced, and your piece looks like this:

    <div><strong>Officials:&nbsp;</strong>Daryl Humphrey, Don Hudson, Michael Price</div>

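The `&nbsp;` in there is a non-breaking space (U+00A0, written `\xa0` in a Python string), not an ordinary space, so your *find* expression needs to be: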

    soup.find(string="Officials:\xa0")
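To see why the literal space in the original search string never matches, a quick standard-library check shows what `&nbsp;` decodes to. This is a minimal sketch that only uses `html.unescape` on a hand-typed label; it does not fetch or parse the page.

```python
import html

# Decode the label exactly as it is written in the page source.
label = html.unescape("Officials:&nbsp;")

print(repr(label))                    # 'Officials:\xa0'
print(label == "Officials: ")         # False: U+00A0 is not an ordinary space
print(label == "Officials:\xa0")      # True
print(label.strip() == "Officials:")  # True: str.strip() treats U+00A0 as whitespace
```

The last line suggests one lenient option: compare stripped text instead of the raw string when you are unsure which kind of space the page uses.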

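However, that finds the text node itself. The names you want are not a sibling of that text node but of its parent, the `<strong>` tag, so take the *next_sibling* of that parent, for example: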

    soup.find(string="Officials:\xa0").parent.next_sibling
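A minimal offline sketch of the whole lookup, assuming the box score page contains markup of the form `<div><strong>Officials:&nbsp;</strong>...</div>` (the `SAMPLE` string is a hand-copied stand-in, not fetched; `beautifulsoup4` must be installed). It first shows why the text node itself has no next sibling, then wraps the lookup in a hypothetical helper, `get_officials`. Unlike the exact-string search, the helper uses a regular expression so that either a regular or a non-breaking space after the colon matches, and it returns an empty list instead of raising `AttributeError` when the label is missing.

```python
import re

from bs4 import BeautifulSoup

# Hand-copied stand-in for the relevant markup (assumed to match the live page).
SAMPLE = (
    "<div><strong>Officials:&nbsp;</strong>"
    "Daryl Humphrey, Don Hudson, Michael Price</div>"
)

soup = BeautifulSoup(SAMPLE, "html.parser")
label = soup.find(string="Officials:\xa0")  # the text node inside <strong>
print(label.next_sibling)         # None: the label is the only child of <strong>
print(label.parent.name)          # strong
print(label.parent.next_sibling)  # Daryl Humphrey, Don Hudson, Michael Price


def get_officials(markup):
    """Hypothetical helper: return the names after the 'Officials:' label, or []."""
    doc = BeautifulSoup(markup, "html.parser")
    # \s matches both a regular space and the non-breaking space U+00A0.
    found = doc.find(string=re.compile(r"^Officials:\s*$"))
    if found is None:
        return []
    names = found.parent.next_sibling  # the text that follows </strong>
    if names is None:
        return []
    # Split the comma-separated list and drop surrounding whitespace.
    return [name.strip() for name in str(names).split(",") if name.strip()]


print(get_officials(SAMPLE))  # ['Daryl Humphrey', 'Don Hudson', 'Michael Price']
```

Returning a plain list of names per game keeps the later export step simple, for example one row per game with the standard `csv` module.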

