Not able to get a link while web scraping
Question
I want to scrape a page with Python while the "T20I" tab is selected. For that, I need a specific URL to pass to requests and BeautifulSoup.
Whenever I open https://www.espncricinfo.com/cricketers/team/india-6, I get the page with "INTL" selected.
(Image: the page with "INTL" selected)
But when I select "T20I", the page content changes while the URL stays the same: https://www.espncricinfo.com/cricketers/team/india-6
(Image: the page with "T20I" selected)
What should I do in this situation? How can I retrieve the data shown when "T20I" is selected?
Answer 1
Score: 1
I recommend using Selenium. Here is an example:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
# In Selenium 4 the driver path is passed through a Service object
browser = webdriver.Chrome(service=Service(r"YOUR chromedriver.exe"), options=chrome_options)
browser.get("https://www.espncricinfo.com/cricketers/team/india-6")
# Locate the "T20I" element by XPath; this absolute path depends on the page layout
xpath = '//*[@id="main-container"]/div[5]/div[1]/div[2]/div[2]/div/div/div[2]/div[2]/div[1]/div/div[5]/a/span/span'
# Wait up to 10 seconds for the element to become clickable
button = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# Click the "T20I" button
button.click()
Enjoy!
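Once the click has gone through, the rendered HTML is available as `browser.page_source` and can be handed to any parser. The following is a minimal sketch of that step, not part of the original answer. It uses the standard-library `html.parser` instead of BeautifulSoup to stay dependency-free. The `LinkCollector` class, the `collect_links` helper, and the `"/cricketers/"` prefix are hypothetical names and values chosen for illustration; the real markup of the page may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for <a> tags whose href starts with a prefix."""

    def __init__(self, href_prefix):
        super().__init__()
        self.href_prefix = href_prefix
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.href_prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html, href_prefix):
    """Return every (href, text) pair in `html` whose href starts with `href_prefix`."""
    parser = LinkCollector(href_prefix)
    parser.feed(html)
    parser.close()
    return parser.links
```

With the Selenium session above, a call might look like `collect_links(browser.page_source, "/cricketers/")`, assuming the player links on the page share that prefix.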
Answer 2
Score: 0
The data is rendered by JavaScript, so it is not in the HTML that requests receives. The page pulls it from an API, which you can call directly. Don't use Selenium when an API is available.
import requests
import pandas as pd
url = 'https://hs-consumer-api.espncricinfo.com/v1/pages/player/search'
# Query parameters for the player search endpoint
payload = {
    'mode': 'BOTH',
    'page': '1',
    'records': '40',
    'filterActive': 'true',
    'filterTeamId': '6',    # matches the "india-6" suffix in the page URL
    'filterClassId': '3',   # the value used here for the "T20I" selection
    'filterFormatLevel': 'ALL',
    'sort': 'ALPHA_ASC',
}
jsonData = requests.get(url, params=payload).json()
# Each entry in 'results' becomes one row
df = pd.DataFrame(jsonData['results'])
print(df.head().to_string())
Output (first 5 of 31 rows):
id objectId name longName mobileName indexName battingName fieldingName slug imageUrl dateOfBirth dateOfDeath gender battingStyles bowlingStyles longBattingStyles longBowlingStyles image countryTeamId playerRoleTypeIds playingRoles headshotImage
0 101430 1125976 Arshdeep Singh Arshdeep Singh Arshdeep Singh Arshdeep Singh Arshdeep Singh arshdeep-singh /db/PICTURES/CMS/356700/356795.1.png {'year': 1999, 'month': 2, 'date': 5} None M [lhb] [lmf] [left-hand bat] [left-arm medium-fast] {'id': 356795, 'objectId': 1365005, 'slug': 'arshdeep-singh-player-portrait', 'url': '/db/PICTURES/CMS/356700/356795.1.png', 'width': 160, 'height': 213, 'caption': 'Arshdeep Singh player portrait', 'longCaption': 'Arshdeep Singh player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 322178, 'objectId': 1264653, 'slug': 'arshdeep-singh-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322100/322178.png', 'width': 600, 'height': 436, 'caption': 'Arshdeep Singh player page headshot cutout, 2021', 'longCaption': 'Arshdeep Singh player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322100/322178.square.png'}}
1 12894 26421 R Ashwin Ravichandran Ashwin Ashwin Ashwin, R R Ashwin Ashwin ravichandran-ashwin /db/PICTURES/CMS/302300/302395.jpg {'year': 1986, 'month': 9, 'date': 17} None M [rhb] [ob] [right-hand bat] [right-arm offbreak] {'id': 302395, 'objectId': 1220592, 'slug': 'r-ashwin-portrait', 'url': '/db/PICTURES/CMS/302300/302395.jpg', 'width': 160, 'height': 200, 'caption': 'R Ashwin portrait', 'longCaption': 'R Ashwin portrait, April 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [11] [bowling allrounder] {'id': 316521, 'objectId': 1251150, 'slug': 'r-ashwin-headshot', 'url': '/db/PICTURES/CMS/316500/316521.png', 'width': 600, 'height': 436, 'caption': 'R Ashwin headshot', 'longCaption': 'R Ashwin headshot', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': None}}
2 73507 694211 Avesh Khan Avesh Khan Avesh Khan Avesh Khan Avesh Khan Avesh Khan avesh-khan /db/PICTURES/CMS/200000/200065.1.jpg {'year': 1996, 'month': 12, 'date': 13} None M [rhb] [rfm] [right-hand bat] [right-arm fast-medium] {'id': 200065, 'objectId': 807641, 'slug': 'avesh-khan-portrait', 'url': '/db/PICTURES/CMS/200000/200065.1.jpg', 'width': 160, 'height': 200, 'caption': 'Avesh Khan portrait', 'longCaption': 'Avesh Khan portrait, November 2014', 'credit': 'MPCA', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 322244, 'objectId': 1264747, 'slug': 'avesh-khan-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322200/322244.png', 'width': 600, 'height': 436, 'caption': 'Avesh Khan player page headshot cutout, 2021', 'longCaption': 'Avesh Khan player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322200/322244.square.png'}}
3 70640 625383 JJ Bumrah Jasprit Bumrah Bumrah Bumrah, JJ JJ Bumrah Bumrah jasprit-bumrah /db/PICTURES/CMS/356800/356849.1.png {'year': 1993, 'month': 12, 'date': 6} None M [rhb] [rf] [right-hand bat] [right-arm fast] {'id': 356849, 'objectId': 1365132, 'slug': 'bumrah-player-portrait', 'url': '/db/PICTURES/CMS/356800/356849.1.png', 'width': 160, 'height': 206, 'caption': 'Bumrah player portrait', 'longCaption': 'Bumrah player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 319940, 'objectId': 1260219, 'slug': 'jasprit-bumrah-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319940.png', 'width': 600, 'height': 436, 'caption': 'Jasprit Bumrah player page headshot cutout, 2021', 'longCaption': 'Jasprit Bumrah player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319940.square.png'}}
4 61325 430246 YS Chahal Yuzvendra Chahal Chahal Chahal, YS YS Chahal Chahal yuzvendra-chahal /db/PICTURES/CMS/312100/312155.png {'year': 1990, 'month': 7, 'date': 23} None M [rhb] [lbg] [right-hand bat] [legbreak googly] {'id': 312155, 'objectId': 1239214, 'slug': 'yuzvendra-chahal-portrait', 'url': '/db/PICTURES/CMS/312100/312155.png', 'width': 160, 'height': 200, 'caption': 'Yuzvendra Chahal portrait', 'longCaption': 'Yuzvendra Chahal portrait, November 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 319955, 'objectId': 1260243, 'slug': 'yuzvendra-chahal-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319955.png', 'width': 600, 'height': 436, 'caption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'longCaption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319955.square.png'}}
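Several columns in the output above hold nested values: `dateOfBirth` is a dict, and the style and role columns are lists. The sketch below shows one way to flatten a result row into plain values before building a DataFrame. It is not part of the original answer. The `flatten_player` helper is a hypothetical name, and the field shapes are inferred from the printed output, so treat them as assumptions about the API response.

```python
from datetime import date


def flatten_player(record):
    """Return a flat dict with a few readable fields from one API result row."""
    dob = record.get("dateOfBirth")
    return {
        "id": record.get("id"),
        "name": record.get("longName"),
        # dateOfBirth is assumed to be {'year': ..., 'month': ..., 'date': ...} or None
        "born": date(dob["year"], dob["month"], dob["date"]) if dob else None,
        # Style and role fields are assumed to be lists; join them into strings
        "batting": ", ".join(record.get("longBattingStyles") or []),
        "bowling": ", ".join(record.get("longBowlingStyles") or []),
        "roles": ", ".join(record.get("playingRoles") or []),
    }


# Sample row trimmed from the printed output (R Ashwin)
sample = {
    "id": 12894,
    "longName": "Ravichandran Ashwin",
    "dateOfBirth": {"year": 1986, "month": 9, "date": 17},
    "longBattingStyles": ["right-hand bat"],
    "longBowlingStyles": ["right-arm offbreak"],
    "playingRoles": ["bowling allrounder"],
}
print(flatten_player(sample))
```

Applied to the full response, this could be used as `pd.DataFrame([flatten_player(r) for r in jsonData['results']])`, which gives a narrower table than the raw output.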