How do I parse a site's HTML for a "span" element?

Question


I'm new to web scraping and I ran into a problem.
I want to extract the view count from the page.

I wrote this code for that purpose:

import requests
from bs4 import BeautifulSoup

url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup = BeautifulSoup(req, 'html')
abcd = soup.find("div", class_="doc_sharing__body js-social").find("span", class_="sharing")
print(abcd)

But the result of this query is None.

What's the problem? Please help!

I tried many variations of my code, for example:

url = 'https://www.kommersant.ru/doc/4638344'
request = requests.get(url)
req = request.text
soup=BeautifulSoup(req, 'html')
abcd = soup.find_all("span", attr={'class': 'sharing'})
print(abcd)

But I get the same result!

Answer 1

Score: 1


I believe a script like this can help you retrieve the view count (I am assuming the number is the one inside the span's title attribute):

import requests
from bs4 import BeautifulSoup

url = "https://www.kommersant.ru/doc/4638344"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Match on a single class name so the order of the other classes does not matter
sharing_div = soup.find("div", class_="doc_sharing__body")

# Read the span's title attribute
title = sharing_div.find("span", class_="sharing")["title"]

# Assumes the title has the form "<label>: <number>"
watching_number = title.split(": ")[1]

print(watching_number)

Note: You may need to use .find_all() instead of .find(). In that case, inspect the returned results to check which of them carries the desired title attribute.
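
To make the difference between the lookups in the question easier to see, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup, the reversed class order and the "Views: 14541" title format are all invented for illustration, so the real page may well differ. It compares the exact multi-class class_ string, a CSS selector, the attrs= dictionary and the attr= keyword used in the question's second attempt.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="js-social doc_sharing__body">
  <span class="sharing" title="Views: 14541"></span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string is compared against the exact attribute value,
# so a different class order in the markup yields no match.
exact = soup.find("div", class_="doc_sharing__body js-social")

# A CSS selector matches the classes in any order.
by_selector = soup.select_one("div.doc_sharing__body.js-social")

# attrs= (with an "s") is the dictionary parameter for attribute filters.
right = soup.find_all("span", attrs={"class": "sharing"})

# attr= is not a parameter name, so it is read as a filter on an HTML
# attribute literally called "attr", which no span here has.
wrong = soup.find_all("span", attr={"class": "sharing"})

# Guard against None before chaining another .find() call.
span = None
if by_selector is not None:
    span = by_selector.find("span", class_="sharing")

views = None
if span is not None and span.has_attr("title"):
    views = span["title"].split(": ")[1]

print(exact, len(right), len(wrong), views)
```

If the guarded lookup still returns None on the real page, the element is probably missing from the HTML that requests received, and printing part of response.text is a quick way to check.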

Answer 2

Score: 0

The view count is stored in the data-article-views attribute of the article element:

import requests
from bs4 import BeautifulSoup

url = 'https://www.kommersant.ru/doc/4638344'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for article in soup.select('article[data-article-views]'):
    print(article['data-article-title'], article['data-article-views'])

Prints:

«Газпром» начал поставлять газ в Сербию по «Турецкому потоку» 14541
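
If the count is needed as a number instead of a string, the attribute value can be converted after a presence check. The sketch below is only an illustration under assumptions: the helper name extract_view_count and the inline sample markup are hypothetical, and only the two attribute names are taken from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the attribute names come from the answer above.
SAMPLE_HTML = """
<article data-article-title="Example headline" data-article-views="14541">
  <p>Body text</p>
</article>
"""


def extract_view_count(html):
    """Return the view count as an int, or None if it is absent or not numeric."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns None when no article carries the attribute
    article = soup.select_one("article[data-article-views]")
    if article is None:
        return None
    raw = article["data-article-views"].strip()
    return int(raw) if raw.isdigit() else None


print(extract_view_count(SAMPLE_HTML))
print(extract_view_count("<article></article>"))
```

Returning None for a missing or non-numeric value keeps the caller from hitting a KeyError or ValueError if the site changes its markup.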
