February 13, 2023, 23:05:54

Website not returning data that I want using beautifulsoup, but it shows up fine in my browser

Question

I'm trying to scrape some data from this website, but I'm getting a 403 error. When I open the URL in my browser, it doesn't give me the error. Help would be appreciated. This is my first time trying any web scraping. I think I need something different in my headers, but I'm not sure. Thanks.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
    'Connection': 'keep-alive',
    'Accept': 'application/json; charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'Access-Control-Allow-Credentials': 'true',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://app.prizepicks.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9'
}
url = 'https://api.prizepicks.com/projections'
r = requests.get(url, headers=headers)
print(r)
df = pd.json_normalize(r.json()['data'])
print(df)

I get a 403 error, and it's not returning the data I want.

Answer 1

Score: 0

The following code should work. It passes pp_props_url, which includes the query parameters, to requests.get instead of the bare url used in the question:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
    'Connection': 'keep-alive',
    'Accept': 'application/json; charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'Access-Control-Allow-Credentials': 'true',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://app.prizepicks.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9'
}
# Request the full URL, query string included.
r = requests.get(pp_props_url, headers=headers)
print(r)  # prints the response object, including its status code
df = pd.json_normalize(r.json()['data'])
print(df)
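
As a rough sketch of the difference between the two snippets: the question requests the bare url, while the answer requests pp_props_url with its query string. The block below rebuilds that query string from a dictionary and adds a status check before the JSON is parsed. BASE_URL, PARAMS, build_url, and fetch_json are hypothetical names introduced here for illustration, and the 10-second timeout is an assumption. Whether the server still answers with 403 depends on the site, so treat this as a way to see what was requested and what came back, not as a guaranteed fix.

```python
from urllib.parse import urlencode

# Hypothetical names for illustration; the values come from pp_props_url.
BASE_URL = "https://api.prizepicks.com/projections"
PARAMS = {"league_id": 7, "per_page": 250, "single_stat": "true"}


def build_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, headers):
    """GET url and return the decoded JSON body.

    raise_for_status() raises requests.HTTPError on a 4xx or 5xx reply,
    so a 403 surfaces as an explicit error before r.json() is attempted.
    """
    # Third-party dependency, imported here so build_url works with the
    # standard library alone.
    import requests

    response = requests.get(url, headers=headers, timeout=10)  # timeout is an assumption
    response.raise_for_status()
    return response.json()


full_url = build_url(BASE_URL, PARAMS)
print(full_url)
```

With the headers dictionary from the answer, fetch_json(full_url, headers)['data'] could then be passed to pd.json_normalize as before. requests.get also accepts a params argument that builds the same query string, if you prefer not to assemble the URL yourself.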
