How do I parse site html with "span"?
Question
I'm new to web parsing and I ran into a problem.
I want to extract the view count from this page.
I wrote this code for that purpose:
url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup=BeautifulSoup(req, 'html')
abcd = soup.find("div", class_="doc_sharing__body js-social").find("span", class_="sharing")
print(abcd)
But the result of this query is None.
What's the problem? Please help!
I have tried many variations of my code, for example:
url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup=BeautifulSoup(req, 'html')
abcd = soup.find_all("span", attr = {'class':'sharing'})
print(abcd)
But I get the same result!
Answer 1
Score: 1
I believe a script like this can help you retrieve the number of views (I am assuming the number is the one inside the span's title attribute):
import requests
from bs4 import BeautifulSoup
url = "https://www.kommersant.ru/doc/4638344"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
sharing_div = soup.find("div", class_="doc_sharing__body")
title = sharing_div.find("span", class_="sharing")["title"]
watching_number = title.split(": ")[1]
print(watching_number)
Note: You may have to use .find_all() instead of .find(). In that case, inspect the returned results to see which of them carries the desired title attribute.
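The inspection step from the note could look roughly like the sketch below. It runs against a small hard-coded HTML fragment rather than the live page. The fragment, its title text, and the helper name extract_view_count are assumptions made for illustration, not the site's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="doc_sharing__body js-social">
  <span class="sharing" title="Share"></span>
  <span class="sharing" title="Views: 14541"></span>
</div>
"""


def extract_view_count(html):
    """Return the number after ': ' in the first matching title, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns every match, so each candidate can be checked in turn.
    for span in soup.find_all("span", class_="sharing"):
        title = span.get("title", "")  # .get() avoids a KeyError if title is absent
        _, sep, value = title.partition(": ")
        if sep and value.strip().isdigit():
            return int(value)
    return None


print(extract_view_count(SAMPLE_HTML))                # 14541
print(extract_view_count("<p>no sharing block</p>"))  # None
```

Returning None when nothing matches keeps the caller from hitting an AttributeError or KeyError if the page layout is different from what the script expects.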
Answer 2
Score: 0
The view count is stored in the data-article-views attribute of the article element:
import requests
from bs4 import BeautifulSoup
url = 'https://www.kommersant.ru/doc/4638344'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
for article in soup.select('article[data-article-views]'):
    print(article['data-article-title'], article['data-article-views'])
Prints:
«Газпром» начал поставлять газ в Сербию по «Турецкому потоку» 14541
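The attribute value comes back as a string, so it may be worth converting it to an integer before comparing or summing counts. The sketch below shows one way to do that on a hard-coded fragment. The fragment, its headline text, and the helper name article_views are hypothetical and only mimic the attributes used above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the attributes used above; not the real page source.
SAMPLE_HTML = """
<article data-article-title="Example headline" data-article-views="14541">
  <p>Body text</p>
</article>
<article data-article-title="No counter here"></article>
"""


def article_views(html):
    """Return a list of (title, views) pairs, with views as an int."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # The [data-article-views] selector skips <article> tags without that attribute.
    for article in soup.select("article[data-article-views]"):
        title = article.get("data-article-title", "")
        views = int(article["data-article-views"])
        results.append((title, views))
    return results


print(article_views(SAMPLE_HTML))  # [('Example headline', 14541)]
```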