English:
How to properly replicate a web site's GET request to an API?
Question
<details>
<summary>English:</summary>
I am trying to scrape data from this [page](https://www.jewelosco.com/shop/search-results.html?q=rice), specifically all the information about the products.
Using my browser's Inspect tool, I found that all of the products' data come from a JSON file; it is a response to a GET request sent to an API at this [URL](https://www.jewelosco.com/abs/pub/xapi/pgmsearch/v1/search/products?request-id=1771677643767994529&url=https%3A%2F%2Fwww.jewelosco.com&pageurl=https%3A%2F%2Fwww.jewelosco.com&pagename=search&rows=30&start=0&search-type=keyword&storeid=1118&featured=true&search-uid=&q=rice&sort=&featuredsessionid=&screenwidth=1533&dvid=web-4.1search&channel=instore&banner=jewelosco). Looking at the request headers I found the Subscription Key (i.e. `Ocp-Apim-Subscription-Key`) and its value (i.e. `5e790236c84e46338f4290aa1050cdd4`).
I tried to get this JSON file by sending the GET request myself with the Python `requests` module, but the response was a JSON document containing an error message: `"appMsg": "Search encountered a problem. Please try again OSSR0033-R"`.
So it seems like I am able to connect to the API but the program on the other side is failing to find the product-data JSON file. I'm assuming the failure is due to a mistake in my GET request. **If this assumption is even valid, how can I properly replicate the request so that I can receive the expected output?**
#### Here is My Code:
    import requests
    import json

    # query url
    def request_from_api(url, url_params, req_headers, cookies):
        response = requests.get(url, params=url_params, headers=req_headers, cookies=cookies)
        return response

    def format_cookies(cookie_pairs):
        '''
        Takes a "list" of name-value pairs e.g. "cook1=value1; cook2=val2"
        '''
        pairs = [pair.split('=') for pair in cookie_pairs.split('; ')]
        formatted_pairs = {cookie_val[0]:cookie_val[1] for cookie_val in pairs}
        return formatted_pairs

    if __name__ == '__main__':
        # url that API is located at
        api_url = "https://www.jewelosco.com/abs/pub/xapi/pgmsearch/v1/search/products?"
        # url parameters for api_url
        url_params = {
            "request-id": "1771677643767994529",
            "url": "https://www.jewelosco.com",
            "pageurl": "https://www.jewelosco.com",
            "pagename": "search",
            "rows": "30",
            "start": "0",
            "search-type": "keyword",
            "storeid": "1118",
            "featured": "true",
            "search-uid": "",
            "q": "rice",
            "sort": "",
            "featuredsessionid": "",
            "screenwidth": "1533",
            "dvid": "web-4.1search",
            "channel": "instore",
            "banner": "jewelosco"
        }
        # API sub key
        headers = {
            "Accept": "application/json, text/plain, */*",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.5",
            "Connection": "keep-alive",
            "DNT": "1",
            "Host": "www.jewelosco.com",
            "Ocp-Apim-Subscription-Key": "5e790236c84e46338f4290aa1050cdd4",
            "Referer": "https://www.jewelosco.com/shop/search-results.html?q=rice",
            "Sec-Fetch-Dest": "empty",
            "Sec-Fetch-Mode": "cors",
            "Sec-Fetch-Site": "same-origin",
            "TE": "trailers",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"
        }
        # List of cookie-value pairs copied from my browser's Inspect tool
        raw_form_cookies = (
            "visid_incap_1990338=s+w9h0GrTSqb/iWdgj5yGT7p/2MAAAAAQUIPAAAAAAD+pkwygiCfx/ikABjRUg/L; nlbi_1990338=mHC1ApVnlTLFJURPzoaznQAAAACG3swCSzQedoLPtuqqPhlT; incap_ses_8080_1990338=FeLoM/tDE2aUu2sos+0hcD7p/2MAAAAAyClJy0AvAh6rRWqmCVVCcw==; ECommBanner=jewelosco; abs_gsession=%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22Selection%22%3A%22user%22%2C%22preference%22%3A%22J4U%22%2C%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%7D%2C%22J4U%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%2C%22SHOP%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%7D%7D; SWY_SHARED_SESSION_INFO=%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%2C%22preference%22%3A%22J4U%22%2C%22Selection%22%3A%22user%22%2C%22userData%22%3A%7B%7D%7D%2C%22J4U%22%3A%7B%22storeId%22%3A%221118%22%2C%22zipcode%22%3A%2252732%22%2C%22userData%22%3A%7B%7D%7D%2C%22SHOP%22%3A%7B%22storeId%22%3A%221118%22%2C%22zipcode%22%3A%2252732%22%2C%22userData%22%3A%7B%7D%7D%7D%7D; abs_previouslogin=%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22Selection%22%3A%22user%22%2C%22preference%22%3A%22J4U%22%2C%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%7D%2C%22J4U%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%2C%22SHOP%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%7D%7D; SWY_SYND_USER_INFO=%7B%22storeAddress%22%3A%22%22%2C%22storeZip%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%2C%22preference%22%3A%22J4U%22%7D; ECommSignInCount=0; SAFEWAY_MODAL_LINK=; OptanonConsent=isGpcEnabled=0&datestamp=Wed+Mar+01+2023+18%3A10%3A25+GMT-0600+(Central+Standard+Time)&version=202212.1.0&isIABGlobal=false&hosts=&consentId=2481ceef-8878-4f3b-924b-3b28079d9b13&interactionCount=1&landingPath=NotLandingPage&groups=C0001%3A1%2CC0002%3A0%2CC0004%3A0%2CC0003%3A1&AwaitingReconsent=false; "
            "nlbi_1990338_2147483392=jYckLK1heGBAHrRyzoaznQAAAACZvsW6rrz3C1oXWBs6UFc8; reese84=3:Gl8qjGMtFKfV15EgMleAnA==:OIn+iQ/52nnNf5lyREodaDDwUAjg8dDGS98wIlrt5otpbU+Cf8LVvyWEszAKcXR472IFIvx0GqApqQXL+AwRenGrptfNzKJtsu+zlyayIVp5q9BJEyz9T9tIFT2YmnQ+D1rZkBlw2lcnRZqxvVX5dSG6pFJH9nebThXLpHGzKF+j2O1jRKRTanLc72sHU5aqkDgp6aKgzvMI3IQTg9JPnSYW1I0779+gNrb/WfVOID4YT3FLG3OBiMxXsnGGrGQD+3QUsGWzJGXqKkLgErxusDcDI+J82YxLg8Lg7u+qbLFLdUPB4dUsPJJLlHJx8kMBuoRh/47QtMYdykoXYmcZ4PYYLnop7lpDFahVOwcqGmwGCCBjkAnxGuVejNESYc4Yiu5iHFluuEHSDyLxXUmlQWRfDl6axKS+0m6Zm7IqPmvetfC4BsZKbDRk5p/jbFDCIYD/iHbRi8OE/mkzTD03r+un1iC5GFK4BhIQrtBDybXmZYJU1VBwXl+raL8wR0Db3d3I/Mbh4/CK1uT/7CJDRIDznlCZC0/C3gFwXQpfLiA=:XtGGSfw6IB+W6dYIh0iO+xPVdddBfiRA1zwKMhu0OmE=; mbox=session#2686aefa9dea422db9f92c9b39a01830#1677717696; at_check=true; ADRUM_BT=R:57|i:5124367|g:a106a4d3-bbb8-4619-8262-9d3f98852991652436|e:104|n:safeway-loyalty_d99a98d0-07cc-4871-98b7-0beac77d0580"
        )
        formatted_cookies = format_cookies(raw_form_cookies)
        # combine api_url and url_params and make GET request with headers
        product_data = request_from_api(api_url, url_params, headers, formatted_cookies).json()
        # pretty print json file
        print(json.dumps(product_data, indent=3))
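One way to reduce transcription mistakes when building `url_params` is to parse the query string of the URL copied from the browser rather than retyping each parameter. The sketch below uses only the standard library; the helper name `params_from_url` is hypothetical, and the shortened URL is only an illustration of the full one.

```python
from urllib.parse import parse_qsl, urlsplit


def params_from_url(full_url):
    """Split a copied request URL into (base_url, params_dict)."""
    parts = urlsplit(full_url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True preserves empty parameters such as "sort="
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, params


# Shortened version of the browser URL, for illustration only
copied = (
    "https://www.jewelosco.com/abs/pub/xapi/pgmsearch/v1/search/products"
    "?rows=30&start=0&q=rice&sort=&url=https%3A%2F%2Fwww.jewelosco.com"
)
base_url, params = params_from_url(copied)
print(base_url)
print(params)
```

The resulting `params` dictionary can be passed to `requests.get(base_url, params=params, ...)`, which re-encodes the values when it builds the request.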
#### Actual Output:
    {
       "appMsg": "[PS: Success.]",
       "primaryProducts": {
          "appCode": "400",
          "appMsg": "Search encountered a problem. Please try again OSSR0033-R",
          "pgmName": "search-products",
          "order": "1"
       },
       "appCode": "[PS: 200]"
    }
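Note that the outer `appCode` reports success while the nested `primaryProducts.appCode` is `"400"`, so checking only the HTTP status or the top-level message would miss the failure. A minimal sketch of a check for this, assuming the error always appears under `primaryProducts` as in the output above (the helper name `search_error` is hypothetical):

```python
def search_error(payload):
    """Return the nested error message, or None if no error is visible."""
    primary = payload.get("primaryProducts", {})
    # A successful response carries a "response" object instead of an error code
    if "response" in primary:
        return None
    if primary.get("appCode") not in (None, "200"):
        return primary.get("appMsg", "unknown error")
    return None


# The error payload shown above
actual_output = {
    "appMsg": "[PS: Success.]",
    "primaryProducts": {
        "appCode": "400",
        "appMsg": "Search encountered a problem. Please try again OSSR0033-R",
        "pgmName": "search-products",
        "order": "1",
    },
    "appCode": "[PS: 200]",
}
print(search_error(actual_output))
```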
#### Expected Output:
It's a large JSON file that contains all of the products' information (e.g. name, price, quantity...). Here is a snippet of it:
    {
       "appMsg":"[PS: Success.]",
       "primaryProducts":{
          "response":{
             "numFound":725,
             "start":0,
             "isExactMatch":true,
             "docs":[
                {
                   "name":"Signature SELECT Rice Enriched Long Grain - 5 Lb",
                   "pid":"126150030",
                   "upc":"0002113050205",
                   "id":"126150030",
                   "featured":false,
                   "inventoryAvailable":"1",
                   "pastPurchased":false,
                   "restrictedValue":"0",
                   "salesRank":99999,
                   "price":4.99,
                   "basePrice":4.99,
                   "pricePer":1.0,
                   "displayType":"-1",
                   "aisleId":"1_6_9_9",
                   "aisleName":"Rice|1_6_9",
                   "departmentName":"Grains, Pasta & Sides",
                   "shelfName":"White Rice",
                   "unitOfMeasure":"LB",
                   "sellByWeight":"I",
                   "averageWeight":[
                      "0.00"
                   ],
                   "unitQuantity":"LB",
                   "displayUnitQuantityText":"ea",
                   "previousPurchaseQty":0,
                   "maxPurchaseQty":0,
                   "prop65WarningIconRequired":false,
                   "isArProduct":true,
                   "isMtoProduct":false,
                   "customizable":false,
                   "inStoreShoppingElig":false,
                   "preparationTime":"0",
                   "isMarketplaceItem":"N",
                   "triggerQuantity":0,
                   "channelEligibility":{
                      "pickUp":true,
                      "delivery":true,
                      "inStore":true,
                      "shipping":false
                   },
                   "channelInventory":{
                      "delivery":"1",
                      "pickup":"1",
                      "instore":"1",
                      "shipping":"0"
                   },
                   "productReview":{
                      "avgRating":"4.8",
                      "reviewCount":"64",
                      "isReviewWriteEligible":"true",
                      "isReviewDisplayEligible":"true",
                      "isForOnetimeReview":"true",
                      "reviewTemplateType":"default"
                   }
                }
             ]
          }
       },
       "appCode":"[PS: 200]"
    }
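Once a successful response arrives, the product fields can be pulled out of `primaryProducts.response.docs`. A minimal sketch, assuming the structure of the snippet above; the helper name `extract_products` and the choice of fields are illustrative only:

```python
def extract_products(payload, fields=("name", "price", "upc")):
    """Collect the selected fields from every product document."""
    docs = (
        payload.get("primaryProducts", {})
        .get("response", {})
        .get("docs", [])
    )
    # Missing fields become None instead of raising KeyError
    return [{field: doc.get(field) for field in fields} for doc in docs]


# Trimmed-down copy of the expected output, for illustration
sample = {
    "primaryProducts": {
        "response": {
            "numFound": 725,
            "docs": [
                {
                    "name": "Signature SELECT Rice Enriched Long Grain - 5 Lb",
                    "pid": "126150030",
                    "upc": "0002113050205",
                    "price": 4.99,
                }
            ],
        }
    }
}
for product in extract_products(sample):
    print(product)
```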
# Update:
Despite adding all of the request headers, the response is the same.
Here are all of the headers and cookies I added:
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.5",
        "Connection": "keep-alive",
        "DNT": "1",
        "Host": "www.jewelosco.com",
        "Ocp-Apim-Subscription-Key": "5e790236c84e46338f4290aa1050cdd4",
        "Referer": "https://www.jewelosco.com/shop/search-results.html?q=rice",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "TE": "trailers",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"
    }
    all_cookies = {
        "visid_incap_1990338":"s+w9h0GrTSqb/iWdgj5yGT7p/2MAAAAAQUIPAAAAAAD+pkwygiCfx/ikABjRUg/L",
        "nlbi_1990338": "mHC1ApVnlTLFJURPzoaznQAAAACG3swCSzQedoLPtuqqPhlT",
        "incap_ses_8080_1990338":"FeLoM/tDE2aUu2sos+0hcD7p/2MAAAAAyClJy0AvAh6rRWqmCVVCcw",
        "ECommBanner": "jewelosco",
        "abs_gsession":"%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22Selection%22%3A%22user%22%2C%22preference%22%3A%22J4U%22%2C%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%7D%2C%22J4U%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%2C%22SHOP%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%7D%7D",
        "SWY_SHARED_SESSION_INFO":"%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%2C%22preference%22%3A%22J4U%22%2C%22Selection%22%3A%22user%22%2C%22userData%22%3A%7B%7D%7D%2C%22J4U%22%3A%7B%22storeId%22%3A%221118%22%2C%22zipcode%22%3A%2252732%22%2C%22userData%22%3A%7B%7D%7D%2C%22SHOP%22%3A%7B%22storeId%22%3A%221118%22%2C%22zipcode%22%3A%2252732%22%2C%22userData%22%3A%7B%7D%7D%7D%7D",
        "abs_previouslogin":"%7B%22info%22%3A%7B%22COMMON%22%3A%7B%22Selection%22%3A%22user%22%2C%22preference%22%3A%22J4U%22%2C%22userType%22%3A%22G%22%2C%22zipcode%22%3A%2252732%22%2C%22banner%22%3A%22jewelosco%22%7D%2C%22J4U%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%2C%22SHOP%22%3A%7B%22zipcode%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%7D%7D%7D",
        "SWY_SYND_USER_INFO":"%7B%22storeAddress%22%3A%22%22%2C%22storeZip%22%3A%2252732%22%2C%22storeId%22%3A%221118%22%2C%22preference%22%3A%22J4U%22%7D",
        "ECommSignInCount": "0",
        "SAFEWAY_MODAL_LINK": "",
        "OptanonConsent": "isGpcEnabled",
        "nlbi_1990338_2147483392":"jYckLK1heGBAHrRyzoaznQAAAACZvsW6rrz3C1oXWBs6UFc8",
        "reese84": "3:Gl8qjGMtFKfV15EgMleAnA",
        "mbox": "session#2686aefa9dea422db9f92c9b39a01830#1677717696",
        "at_check": "true",
        "ADRUM_BT": "R:57|i:5124367|g:a106a4d3-bbb8-4619-8262-9d3f98852991652436|e:104|n:safeway-loyalty_d99a98d0-07cc-4871-98b7-0beac77d0580"
    }
Here is a function I made that converts a string of cookie name-value pairs (e.g. `Cookie: "c1=v1; c2=v2; c3=v3"`) into a dictionary whose keys are the cookie names and whose values are the cookie values. This format is needed to work with `requests.get()`:
    def format_cookies(cookie_pairs):
        '''
        Takes a "list" of name-value pairs e.g. "cook1=value1; cook2=val2"
        '''
        pairs = [pair.split('=') for pair in cookie_pairs.split('; ')]
        formatted_pairs = {cookie_val[0]:cookie_val[1] for cookie_val in pairs}
        return formatted_pairs
I have altered the original script to reflect these changes.
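One caveat with `format_cookies`: `pair.split('=')` splits on every `=`, so a value that itself contains `=` is cut at its first `=`. That matches the shortened `OptanonConsent`, `reese84`, and `incap_ses_8080_1990338` values in the dictionary above. A hedged sketch of a variant that splits only on the first `=` (the name `parse_cookie_header` and the `token` cookie are hypothetical):

```python
def parse_cookie_header(cookie_pairs):
    """Turn "c1=v1; c2=v2" into a dict, keeping '=' inside values."""
    cookies = {}
    for pair in cookie_pairs.split("; "):
        if not pair:
            continue  # skip empty fragments, e.g. after a trailing "; "
        # partition() splits on the first '=' only
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value
    return cookies


# Shortened cookie string; "token" is a made-up cookie with base64-style padding
raw = (
    "ECommSignInCount=0; SAFEWAY_MODAL_LINK=; "
    "OptanonConsent=isGpcEnabled=0&version=202212.1.0; token=abc=="
)
parsed = parse_cookie_header(raw)
print(parsed)
```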
</details>
# Answer 1
**Score**: 1
For this website specifically, you just need a fresh `request-id` in the GET request's `url_params`. I've found that the cookies don't matter at all.
The `request-id` is composed of 19 digits. The first 3 are random, the middle 7 are a server time code that changes periodically, and the remaining 9 are random. This was my function to generate a fresh request ID. The time code appears to come from another stable request where cookies don't matter.
```python
import random

import requests


def get_request_id():
    # Payload for the request that returns the server time
    data = {
        "checksum": "1640249083",
        "_rand": "lrvnwar",
        "rid": "r310947",
        "d": '{"INQ":{"siteID":10006484,"custID":"-6153494727971827858","scheduleTZs":{}}}',
    }
    response = requests.post("https://albertsons.inq.com/tagserver/init/initFramework", data=data)
    # The first 7 characters of the server time form the middle chunk
    r = response.json()["INQ"]["serverTime"][0:7]
    first_chunk = random.randint(100, 999)  # 3 random digits
    second_chunk = r
    third_chunk = random.randint(100000000, 999999999)  # 9 random digits
    request_id = int(str(first_chunk) + str(second_chunk) + str(third_chunk))
    return request_id
```
If you figure out how to scrape the `storeid` from a zip code, let me know, as that is where I am currently stuck.
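The digit layout can be checked offline by separating the assembly step from the network call. The following is a minimal sketch, assuming the layout used in `get_request_id` (3 random digits, the first 7 characters of the server time, 9 random digits); `build_request_id` and the sample server time are hypothetical, and whether the site accepts such IDs is not verified here.

```python
import random


def build_request_id(server_time, rng=random):
    """Assemble a 19-digit request id from a server time string."""
    first_chunk = rng.randint(100, 999)              # 3 random digits
    second_chunk = str(server_time)[0:7]             # 7-character time code
    third_chunk = rng.randint(100000000, 999999999)  # 9 random digits
    return f"{first_chunk}{second_chunk}{third_chunk}"


# Made-up server time value, used only to show the layout
request_id = build_request_id("1677717696123")
print(request_id, len(request_id))

# The fresh id would replace the stale value in the question's url_params
url_params = {"request-id": request_id, "q": "rice", "storeid": "1118"}
```

Keeping the random and time-based parts in a pure function makes it easy to confirm the length and position of each chunk before sending any request.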