Missing text in output when scraping with Beautiful Soup - how do I extract it?

Question

I am currently working on a personal project and am quite new to web scraping and the Beautiful Soup library, so any help would be much appreciated!
I am trying to extract the R1, R2, etc. text from the following HTML snippet:

[Image: screenshot of the HTML snippet]

The code I've written for this is below:

import requests
from bs4 import BeautifulSoup

URL1 = "https://www.sportsbet.com.au/racing-schedule/horse/today"
racing = requests.get(URL1)
soup2 = BeautifulSoup(racing.content, "lxml")

race_index = soup2.findAll('div', {"class":"tableHeaderCell_fh883o"})
for race in race_index:
    print(race)

However, there is clearly some text within the div tags, but the output I am getting is:

<div class="tableHeaderCell_fh883o"></div>
<div class="tableHeaderCell_fh883o"></div>
<div class="tableHeaderCell_fh883o"></div>

I am wondering why the text within the div tags is missing, and how I can extract it.

Answer 1

Score: 0

You can't get this text because the data is loaded dynamically rather than being part of the static HTML, so fetching the page and parsing it with BeautifulSoup won't load it.
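To illustrate, here is a minimal sketch of what BeautifulSoup sees in that situation. The `STATIC_HTML` string is a hypothetical stand-in for the server's response, modelled on the output shown in the question, and `cell_texts` is a helper name made up for this example. It uses the built-in `html.parser` backend so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends before any script runs.
STATIC_HTML = """
<div class="tableHeaderCell_fh883o"></div>
<div class="tableHeaderCell_fh883o"></div>
<div class="tableHeaderCell_fh883o"></div>
"""


def cell_texts(html):
    """Return the stripped text of every header cell found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("div", class_="tableHeaderCell_fh883o")
    return [cell.get_text(strip=True) for cell in cells]


texts = cell_texts(STATIC_HTML)
print(texts)  # three empty strings: the divs exist, but nothing has filled them yet
```

The divs are found, but their text is empty, because in the browser the labels are only filled in after the page's scripts have run.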

Instead, open the page in your browser, open DevTools, switch to the Network tab, and refresh the page. You will find this request being made.

Long story short, just head to that link and you will find the data you want returned there as JSON.
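As a rough sketch of that approach, the helpers below download a JSON endpoint and pull values out of it. Everything specific here is an assumption: the real endpoint URL must be copied from the Network tab, and the `sample` payload and the `raceNumber` key are invented for illustration, since the actual response structure is not shown in this thread. The standard library is used to keep the sketch dependency-free; with requests, `requests.get(url).json()` does the same job as `fetch_json`.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10.0):
    """Download a URL and decode its body as JSON."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_values(data, key):
    """Recursively collect every value stored under `key` in nested JSON data."""
    found = []
    if isinstance(data, dict):
        for name, value in data.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_values(item, key))
    return found


# Hypothetical payload, made up for illustration only; the real one will differ.
sample = json.loads('{"meetings": [{"races": [{"raceNumber": 1}, {"raceNumber": 2}]}]}')
labels = [f"R{number}" for number in find_values(sample, "raceNumber")]
print(labels)  # ['R1', 'R2']
```

In practice you would call `fetch_json` with the URL found in DevTools, inspect the returned structure, and then replace `raceNumber` with whichever key actually holds the race labels.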

Please don't forget to mark this solution as an answer if it resolves your problem.

huangapple
  • Posted on May 26, 2023 at 14:58:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76338329.html