Not able to get a link while web scraping

Question

I want to scrape a page with Python while "T20I" is selected. To do that, I need a specific link to pass to requests and BeautifulSoup.

Whenever I open https://www.espncricinfo.com/cricketers/team/india-6, I get a page with "INTL" selected.

[Image: page with "INTL" selected]

But when I select "T20I", I get a different page, yet the link looks the same: https://www.espncricinfo.com/cricketers/team/india-6

[Image: page with "T20I" selected]

What should I do in this situation? How can I retrieve the data that is shown when "T20I" is selected?

Answer 1

Score: 1

I recommend using Selenium. Here is an example that opens the page and clicks the "T20I" button:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(r"YOUR chromedriver.exe"), options=chrome_options)

browser.get("https://www.espncricinfo.com/cricketers/team/india-6")
# Absolute XPath of the "T20I" button; it may break if the page layout changes
xpath = '//*[@id="main-container"]/div[5]/div[1]/div[2]/div[2]/div/div/div[2]/div[2]/div[1]/div/div[5]/a/span/span'
# Wait up to 10 seconds for the button to become clickable
button = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# Click the "T20I" button
button.click()

Enjoy it!

Answer 2

Score: 0

The data is rendered through JavaScript, and there is an API that supplies it. Don't use Selenium when there is an API.

import requests
import pandas as pd

url = 'https://hs-consumer-api.espncricinfo.com/v1/pages/player/search'
# Query parameters for the player search endpoint.
# filterTeamId '6' matches the "india-6" team in the page URL;
# filterClassId '3' is presumably the T20I filter (the answer does not say so explicitly).
payload = {
    'mode': 'BOTH',
    'page': '1',
    'records': '40',
    'filterActive': 'true',
    'filterTeamId': '6',
    'filterClassId': '3',
    'filterFormatLevel': 'ALL',
    'sort': 'ALPHA_ASC',
}

jsonData = requests.get(url, params=payload).json()

df = pd.DataFrame(jsonData['results'])
print(df.head().to_string())

Output (first 5 rows of 31):

id  objectId            name             longName  mobileName       indexName     battingName    fieldingName                 slug                              imageUrl                              dateOfBirth dateOfDeath gender battingStyles bowlingStyles longBattingStyles        longBowlingStyles                                                                                                                                                                                                                                                                                                                      image  countryTeamId playerRoleTypeIds          playingRoles                                                                                                                                                                                                                                                                                                                                                                                                                                            headshotImage
0  101430   1125976  Arshdeep Singh       Arshdeep Singh              Arshdeep Singh  Arshdeep Singh  Arshdeep Singh       arshdeep-singh  /db/PICTURES/CMS/356700/356795.1.png    {'year': 1999, 'month': 2, 'date': 5}        None      M         [lhb]         [lmf]   [left-hand bat]   [left-arm medium-fast]  {'id': 356795, 'objectId': 1365005, 'slug': 'arshdeep-singh-player-portrait', 'url': '/db/PICTURES/CMS/356700/356795.1.png', 'width': 160, 'height': 213, 'caption': 'Arshdeep Singh player portrait', 'longCaption': 'Arshdeep Singh player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None}              6               [4]              [bowler]        {'id': 322178, 'objectId': 1264653, 'slug': 'arshdeep-singh-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322100/322178.png', 'width': 600, 'height': 436, 'caption': 'Arshdeep Singh player page headshot cutout, 2021', 'longCaption': 'Arshdeep Singh player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322100/322178.square.png'}}
1   12894     26421        R Ashwin  Ravichandran Ashwin      Ashwin       Ashwin, R        R Ashwin          Ashwin  ravichandran-ashwin    /db/PICTURES/CMS/302300/302395.jpg   {'year': 1986, 'month': 9, 'date': 17}        None      M         [rhb]          [ob]  [right-hand bat]     [right-arm offbreak]                               {'id': 302395, 'objectId': 1220592, 'slug': 'r-ashwin-portrait', 'url': '/db/PICTURES/CMS/302300/302395.jpg', 'width': 160, 'height': 200, 'caption': 'R Ashwin portrait', 'longCaption': 'R Ashwin portrait, April 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None}              6              [11]  [bowling allrounder]                                                                                                                                           {'id': 316521, 'objectId': 1251150, 'slug': 'r-ashwin-headshot', 'url': '/db/PICTURES/CMS/316500/316521.png', 'width': 600, 'height': 436, 'caption': 'R Ashwin headshot', 'longCaption': 'R Ashwin headshot', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': None}}
2   73507    694211      Avesh Khan           Avesh Khan  Avesh Khan      Avesh Khan      Avesh Khan      Avesh Khan           avesh-khan  /db/PICTURES/CMS/200000/200065.1.jpg  {'year': 1996, 'month': 12, 'date': 13}        None      M         [rhb]         [rfm]  [right-hand bat]  [right-arm fast-medium]                             {'id': 200065, 'objectId': 807641, 'slug': 'avesh-khan-portrait', 'url': '/db/PICTURES/CMS/200000/200065.1.jpg', 'width': 160, 'height': 200, 'caption': 'Avesh Khan portrait', 'longCaption': 'Avesh Khan portrait, November 2014', 'credit': 'MPCA', 'photographer': None, 'peerUrls': None}              6               [4]              [bowler]                    {'id': 322244, 'objectId': 1264747, 'slug': 'avesh-khan-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322200/322244.png', 'width': 600, 'height': 436, 'caption': 'Avesh Khan player page headshot cutout, 2021', 'longCaption': 'Avesh Khan player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322200/322244.square.png'}}
3   70640    625383       JJ Bumrah       Jasprit Bumrah      Bumrah      Bumrah, JJ       JJ Bumrah          Bumrah       jasprit-bumrah  /db/PICTURES/CMS/356800/356849.1.png   {'year': 1993, 'month': 12, 'date': 6}        None      M         [rhb]          [rf]  [right-hand bat]         [right-arm fast]                          {'id': 356849, 'objectId': 1365132, 'slug': 'bumrah-player-portrait', 'url': '/db/PICTURES/CMS/356800/356849.1.png', 'width': 160, 'height': 206, 'caption': 'Bumrah player portrait', 'longCaption': 'Bumrah player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None}              6               [4]              [bowler]        {'id': 319940, 'objectId': 1260219, 'slug': 'jasprit-bumrah-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319940.png', 'width': 600, 'height': 436, 'caption': 'Jasprit Bumrah player page headshot cutout, 2021', 'longCaption': 'Jasprit Bumrah player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319940.square.png'}}
4   61325    430246       YS Chahal     Yuzvendra Chahal      Chahal      Chahal, YS       YS Chahal          Chahal     yuzvendra-chahal    /db/PICTURES/CMS/312100/312155.png   {'year': 1990, 'month': 7, 'date': 23}        None      M         [rhb]         [lbg]  [right-hand bat]        [legbreak googly]    {'id': 312155, 'objectId': 1239214, 'slug': 'yuzvendra-chahal-portrait', 'url': '/db/PICTURES/CMS/312100/312155.png', 'width': 160, 'height': 200, 'caption': 'Yuzvendra Chahal portrait', 'longCaption': 'Yuzvendra Chahal portrait, November 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None}              6               [4]              [bowler]  {'id': 319955, 'objectId': 1260243, 'slug': 'yuzvendra-chahal-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319955.png', 'width': 600, 'height': 436, 'caption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'longCaption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319955.square.png'}}
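
Several columns in the output above hold nested values (a dict for dateOfBirth, lists for the style and role columns), which makes the table hard to read. The following is a minimal sketch of one way to flatten them, not part of the original answer. The helper name `flatten_player` is hypothetical, the two sample records are trimmed copies of rows printed above, and it assumes the list columns contain plain strings, as the printed output suggests.

```python
from datetime import date

# Two records trimmed from the output shown above; only a few fields are kept.
sample_results = [
    {
        "id": 101430,
        "longName": "Arshdeep Singh",
        "dateOfBirth": {"year": 1999, "month": 2, "date": 5},
        "longBattingStyles": ["left-hand bat"],
        "longBowlingStyles": ["left-arm medium-fast"],
        "playingRoles": ["bowler"],
    },
    {
        "id": 12894,
        "longName": "Ravichandran Ashwin",
        "dateOfBirth": {"year": 1986, "month": 9, "date": 17},
        "longBattingStyles": ["right-hand bat"],
        "longBowlingStyles": ["right-arm offbreak"],
        "playingRoles": ["bowling allrounder"],
    },
]


def flatten_player(record):
    """Hypothetical helper: reduce one API record to flat, printable fields."""
    dob = record.get("dateOfBirth")
    # In the printed output, the key 'date' holds the day of the month.
    birth = date(dob["year"], dob["month"], dob["date"]) if dob else None
    return {
        "id": record["id"],
        "name": record["longName"],
        "born": birth.isoformat() if birth else None,
        "batting": ", ".join(record.get("longBattingStyles") or []),
        "bowling": ", ".join(record.get("longBowlingStyles") or []),
        "roles": ", ".join(record.get("playingRoles") or []),
    }


players = [flatten_player(r) for r in sample_results]
for p in players:
    print(p)
```

With live data, the same helper could be applied to `jsonData['results']` before building the DataFrame, e.g. `pd.DataFrame([flatten_player(r) for r in jsonData['results']])`. Records with a partially missing date of birth would need extra handling.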

huangapple
  • Posted on July 6, 2023, 13:56:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76625891.html