April 4, 2023, 04:37:10

How do I parse a site's HTML for a "span" element?

Question

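I'm new to web parsing and have run into a problem.
I want to extract the view count from a page on this site.

I wrote the following code for that purpose: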

url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup=BeautifulSoup(req, 'html')
abcd = soup.find("div", class_="doc_sharing__body js-social").find("span", class_="sharing")
print(abcd)

But the result of this query is "None".

What's the problem? Please help!

I have tried many variations of my code, for example:

url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup=BeautifulSoup(req, 'html')
abcd = soup.find_all("span", attr={'class':'sharing'})
print(abcd)

But I get the same result!

Answer 1

Score: 1

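I believe a script like the following can help you retrieve the number of views (I am assuming the number is the one inside the span's title attribute):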

import requests
from bs4 import BeautifulSoup

url = "https://www.kommersant.ru/doc/4638344"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Locate the sharing container, then the span whose title holds the count.
sharing_div = soup.find("div", class_="doc_sharing__body")
title = sharing_div.find("span", class_="sharing")["title"]

# Assumes the title looks like "<label>: <number>"; keep the part after ": ".
watching_number = title.split(": ")[1]
print(watching_number)

Note: You may need to use .find_all() instead of .find(). In that case, inspect the returned results to see which of them carries the desired title attribute.
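
The .find_all() approach from the note could look roughly like the sketch below. It runs against a small hypothetical HTML snippet (SAMPLE_HTML) rather than the live page, because the real markup may differ. The helper name find_view_count and the "Views: 14541" title text are assumptions made for illustration. The sketch also guards against spans without a title instead of indexing blindly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="doc_sharing__body js-social">
  <span class="sharing">Share</span>
  <span class="sharing" title="Views: 14541">14541</span>
</div>
"""


def find_view_count(html):
    """Return the text after ': ' in the first span.sharing title, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns every matching span, so each one can be inspected.
    for span in soup.find_all("span", class_="sharing"):
        title = span.get("title")  # .get() yields None when the attribute is absent
        if title and ": " in title:
            return title.split(": ", 1)[1]
    return None


views = find_view_count(SAMPLE_HTML)
print(views)
```

When filtering with a dictionary instead of class_, the keyword is attrs, as in soup.find_all("span", attrs={"class": "sharing"}).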

Answer 2

Score: 0

The view count is stored in the data-article-views attribute of the article element:

import requests
from bs4 import BeautifulSoup

url = 'https://www.kommersant.ru/doc/4638344'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for article in soup.select('article[data-article-views]'):
    print(article['data-article-title'], article['data-article-views'])

Prints:

«Газпром» начал поставлять газ в Сербию по «Турецкому потоку» 14541
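
If the count is needed as a number rather than a string, the same selector can feed a small conversion step. The following is only a sketch on a hypothetical snippet (SAMPLE_HTML). The helper name article_views and the sample titles are assumptions, and the live page may contain other attributes or several article elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the two data-* attribute names come from the answer above.
SAMPLE_HTML = """
<article data-article-title="Example headline" data-article-views="14541"></article>
<article data-article-title="No counter here"></article>
"""


def article_views(html):
    """Map each article title to its view count as an int."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # The attribute selector skips articles that lack data-article-views.
    for article in soup.select("article[data-article-views]"):
        title = article.get("data-article-title", "")
        raw = article["data-article-views"]
        if raw.isdigit():  # ignore empty or non-numeric values
            result[title] = int(raw)
    return result


counts = article_views(SAMPLE_HTML)
print(counts)
```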
