Is there a way to web scrape height data that is not listed in the text?
Question
I'm web scraping the heights of athletes listed on a roster page. My code retrieves the heights, but after inspecting the element I noticed that the visible text gives the height in feet and inches, while the "data-sort" attribute gives the same height in inches. Both are in the td tag with class "height". However, when I use "get_text()" or .text to strip the HTML, it prints only the height in feet and drops the hidden height in inches. Is there a way to get the height in inches? That would make the math easier.
Here is an example of what I'm scraping. I want to discard everything else and keep only the heights in inches, which would be [79, 85, 74, ...] in this case.
<td class="height" data-sort="79">6-7</td>
<td class="height" data-sort="85">7-1</td>
<td class="height" data-sort="74">6-2</td>
# This is my code
from bs4 import BeautifulSoup
import requests

urls = ['https://goduke.com/sports/mens-basketball/roster']
ListData = []
for x in range(len(urls)):
    page = requests.get(urls[x]).text
    pagesoup = BeautifulSoup(page, 'html.parser')
    h = pagesoup.find_all('td', class_="height")
    ListData.append(h)
NewList = []
for b in range(len(ListData)):
    new = []
    for x in ListData[b]:
        print(x.text)
Answer 1
Score: 0
If you use a CSS selector, you can simply pass the first class name.
from scrapy.selector import Selector
Answer 2
Score: 0
from bs4 import BeautifulSoup
import requests

urls = ['https://goduke.com/sports/mens-basketball/roster']
ListData = []
for url in urls:
    page = requests.get(url).text
    pagesoup = BeautifulSoup(page, 'html.parser')
    # Select only the height cells that actually carry a data-sort attribute
    tds = pagesoup.select('td.height[data-sort]')
    for td in tds:
        ListData.append(td.attrs['data-sort'])
print(ListData)

Output:
['79', '85', '74', '74', '77', '77', '78', '77', '82', '85', '80', '84', '77', '84', '68']
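
Note that the values read from data-sort are strings, so they likely need converting before any math. Below is a minimal sketch of that step, assuming the three sample rows from the question as inline HTML instead of a live request. The helper feet_inches_to_inches is a hypothetical name added here only to cross-check the attribute against the visible text.

```python
from bs4 import BeautifulSoup

# Sample rows copied from the question; a live page would be fetched with requests.
html = """
<table>
  <tr><td class="height" data-sort="79">6-7</td></tr>
  <tr><td class="height" data-sort="85">7-1</td></tr>
  <tr><td class="height" data-sort="74">6-2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", class_="height")

# Tag.get returns None when the attribute is missing, so skip those cells
# and convert the remaining string values to integers.
heights = [int(td["data-sort"]) for td in cells if td.get("data-sort") is not None]


def feet_inches_to_inches(text):
    """Hypothetical helper: convert a 'feet-inches' string such as '6-7' to inches."""
    feet, inches = text.split("-")
    return int(feet) * 12 + int(inches)


# Cross-check: the visible text should agree with the data-sort attribute.
from_text = [feet_inches_to_inches(td.get_text(strip=True)) for td in cells]

print(heights)                      # [79, 85, 74]
print(from_text == heights)         # True
print(sum(heights) / len(heights))  # average height in inches
```

Indexing with td["data-sort"] raises KeyError when the attribute is absent, which is why the sketch filters with td.get first; the CSS selector td.height[data-sort] used above achieves the same filtering at selection time.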