Scrape a website through its API using requests

Question

I want to scrape live match data from this website: https://egamersworld.com/matches

I tried using this API: https://api.egamersworld.com/matches?lang=en
but it returns no data:

    import requests

    session = requests.Session()
    url = "https://api.egamersworld.com/matches?lang=en"
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0', "referer": "https://egamersworld.com/"}

    r = session.get(url, timeout=30, headers=headers)
    print(r.status_code)  # 200
    print(r.json())  # {"list":[]}

r.status_code returns 200, but r.json() returns only an empty list: {"list":[]}.
How can I get the data using this API?

Answer 1

Score: 2

<details>
<summary>English:</summary>

I could not see that API endpoint being accessed from that page. However, the page does call a couple of different endpoints, and here is one way of getting the data from one of them:

    import requests
    import pandas as pd

    # Request headers used in this answer, mimicking a browser request
    headers = {
        'x-customheader': '435b8c763ab097da9947a385cfe127b0',
        'origin': 'https://egamersworld.com',
        'referer': 'https://egamersworld.com/',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
    }

    url = 'https://api.egamersworld.com/live_events?lang=en'

    r = requests.get(url, headers=headers)
    # Flatten the JSON response into a DataFrame, one row per record
    df = pd.json_normalize(r.json())
    print(df)

Result in terminal:

     	_id 	tags 	title 	game 	slug 	prize 	start_date 	finish_date 	prize_num 	image 	is_live
    0 	Nkqg6l-SR 	[{'title': 'LPL', 'slug': 'lpl'}] 	LPL Summer 2023 	lol 	lpl-2023-summer-Nkqg6l-SR 	$597,366 	2023-05-28T23:00:00.000Z 	2023-08-07T23:00:00.000Z 	597366 	/uploads/tournaments/lpl-summer-20231685005813... 	True
    1 	4kxt3zcXVa 	[{'title': 'ESL', 'slug': 'esl'}, {'title': 'I... 	IEM Dallas 2023 	counterstrike 	iem-dallas-2023-4kxt3zcXVa 	$250,000 	2023-05-29T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	250000 	/uploads/tournaments/intel-extreme-masters-dal... 	True
    2 	NyVSQFNp3 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: CN Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-china-division-i... 	$205,000 	2023-05-16T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    3 	V1u4WtNT2 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: NA Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-na-division-i-up... 	$205,000 	2023-05-15T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    4 	VJvpbYN6h 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: SA Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-sa-division-i-up... 	$205,000 	2023-05-15T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    5 	4ymHMF4Tn 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: WEU Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-weu-division-i-u... 	$205,000 	2023-05-15T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    6 	N1hQNtNp2 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: SEA Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-sea-division-i-u... 	$205,000 	2023-05-15T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    7 	4JtpGFET2 	[{'title': 'DPC', 'slug': 'dpc'}] 	DPC 2023 Tour 3: EEU Division I (Upper) 	dota2 	dpc-2022-2023-tour-3-season-3-eeu-division-i-u... 	$205,000 	2023-05-14T00:00:00.000Z 	2023-06-04T00:00:00.000Z 	205000 	/uploads/tournaments/dpc-2022-2023-summer-tour... 	True
    8 	41twsuwVC 	[{'title': 'Champion of Champions Tour', 'slug... 	CCT 2023 Online Finals 1 	counterstrike 	cct-2023-online-finals-1-41twsuwVC 	$200,000 	2023-05-23T00:00:00.000Z 	2023-05-30T00:00:00.000Z 	200000 	/uploads/tournaments/cct-2023-online-finals-11... 	True
    9 	N174ae0_0 	[] 	Europe League 2023 - Stage 2 	rainbowsix 	europe-league-2023-stage-2-N174ae0_0 	$191,049 	2023-05-01T23:00:00.000Z 	2023-05-30T23:00:00.000Z 	191049 	/uploads/tournaments/europe-league-2023-stage-... 	True
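
If only a subset of these events is needed, the raw JSON can be filtered before (or instead of) building a DataFrame. The sketch below assumes the endpoint returns a plain list of records with the `game` and `prize_num` fields shown in the output; that is inferred from the printed columns, not from any API documentation. The helper name `events_for_game` is made up for illustration, and the sample records are copied from three rows of the output with only some fields kept.

```python
def events_for_game(events, game):
    """Return the events for one game, largest prize pool first."""
    matching = [e for e in events if e.get("game") == game]
    return sorted(matching, key=lambda e: e.get("prize_num", 0), reverse=True)


# Three records copied from the terminal output (a subset of their fields).
sample = [
    {"_id": "Nkqg6l-SR", "title": "LPL Summer 2023", "game": "lol",
     "prize_num": 597366, "is_live": True},
    {"_id": "41twsuwVC", "title": "CCT 2023 Online Finals 1", "game": "counterstrike",
     "prize_num": 200000, "is_live": True},
    {"_id": "4kxt3zcXVa", "title": "IEM Dallas 2023", "game": "counterstrike",
     "prize_num": 250000, "is_live": True},
]

for event in events_for_game(sample, "counterstrike"):
    print(event["title"], event["prize_num"])
# IEM Dallas 2023 250000
# CCT 2023 Online Finals 1 200000
```

With a live response, `events_for_game(r.json(), "counterstrike")` would be the equivalent call, provided the response really is a list of such records.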

The page is also hydrated with data from a couple of WebSocket connections. Here is one way of getting that streaming information:

    import websocket  # provided by the websocket-client package

    def on_message(ws, message):
        # Print every raw frame received from the server
        print(message)

    def on_close(ws, close_status_code, close_msg):
        print("closed connection")

    socket = 'wss://ws.egamersworld.com/socket.io/?EIO=3&transport=websocket'
    ws = websocket.WebSocketApp(socket, on_message=on_message, on_close=on_close)

    # Block and keep reading frames until the connection closes
    ws.run_forever()

Response in terminal:

    0{"sid":"fVEBxmYMAFO-RAPoCQtq","upgrades":[],"pingInterval":25000,"pingTimeout":20000}
    40
    42["cs",{"NkKs8f2rA":{"mapStat":{"homeTeam":[{"steamId":"1:1:73849600","dbId":19996,"name":"Jelo","score":17,"deaths":9,"assists":0,"alive":false,"money":56500,"damagePrRound":802,"hp":0,"kevlar":false,"helmet":false,"nick":"Jelo","hasDefusekit":false,"advancedStats":{"kast":1,"entryKills":0,"entryDeaths":0,"multiKillRounds":1,"oneOnXWins":0,"flashAssists":0}},{"steamId":"1:1:571082935","dbId":22703,"name":"moonwalk","score":8,"deaths":10,"assists":0,"alive":false,"money":43200,"damagePrRound":369,"hp":0,"kevlar":false,"helmet":false,"nick":"moonwalk","hasDefusekit":false,"advancedStats":{"kast":1,"entryKills":0,"entryDeaths":0,"multiKillRounds":1,"oneOnXWins":0,"flashAssists":.....
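
The numeric prefixes in those frames (`0`, `40`, `42`) follow Socket.IO framing over Engine.IO protocol v3, which matches the `EIO=3` parameter in the URL. The first digit is the Engine.IO packet type, and for `message` packets the second digit is the Socket.IO packet type. Here is a minimal sketch of splitting a frame into its parts. The function name `parse_frame` is hypothetical, and the parser deliberately ignores namespaces, ack ids and binary packets, so treat it as a starting point rather than a complete client.

```python
import json

# Engine.IO v3 packet types (first character of every text frame).
ENGINE_TYPES = {"0": "open", "1": "close", "2": "ping", "3": "pong", "4": "message"}
# Socket.IO packet types carried inside an Engine.IO "message" packet.
SOCKET_TYPES = {"0": "connect", "1": "disconnect", "2": "event", "3": "ack", "4": "error"}


def _decode(text):
    """Decode a JSON payload; fall back to the raw text (e.g. 'probe')."""
    if not text:
        return None
    try:
        return json.loads(text)
    except ValueError:
        return text


def parse_frame(raw):
    """Split a raw text frame into (engine_type, socket_type, payload)."""
    engine_type = ENGINE_TYPES.get(raw[:1], "unknown")
    rest = raw[1:]
    if engine_type != "message":
        return engine_type, None, _decode(rest)
    socket_type = SOCKET_TYPES.get(rest[:1], "unknown")
    return engine_type, socket_type, _decode(rest[1:])


print(parse_frame("40"))
# ('message', 'connect', None)

# Shortened, made-up event frame in the same shape as the "cs" frame above.
_, kind, payload = parse_frame('42["cs",{"match1":{"score":17}}]')
print(kind, payload[0], payload[1]["match1"]["score"])
# event cs 17
```

Inside `on_message`, calling `parse_frame(message)` and keeping only frames whose Socket.IO type is `event` would give the event name and its data as ordinary Python objects. Note that with Engine.IO v3 the client is expected to send a ping (`2`) periodically, per the `pingInterval` reported in the open packet, or the server may close the connection.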

</details>



huangapple
  • Posted on May 29, 2023, 19:26:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76356924.html