Why does my request return "Must provide query string." when scraping?

Question


I am trying to scrape https://www.sayurbox.com/category/vegetables-1-a0d03d59?selectedCategoryType=ops&touch_point=screen_CATEGORY_sembako-1-e6a33b51&section_source=shop_list_slider_navigation_category_vegetables-1-a0d03d59

Here's my current code:

    import requests

    DCId = "RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl"
    slugcat = "vegetables-1-a0d03d59"
    url = "https://www.sayurbox.com/graphql/v1?deduplicate=1"

    # Two operations copied from the browser request: getCartItemCount and getProducts.
    payload = {
        "operationName": "getCartItemCount",
        "variables": {"deliveryConfigId": DCId},
        "query": "query getCartItemCount($deliveryConfigId: ID!) {\n cart(deliveryConfigId: $deliveryConfigId) {\n id\n count\n __typename\n }\n}"
    }, {
        "operationName": "getProducts",
        "variables": {
            "deliveryConfigId": DCId,
            "sortBy": "related_product",
            "isInstantDelivery": False,
            "slug": slugcat,
            "first": 12,
            "abTestFeatures": []
        },
        "query": "query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) {\n productsByCategoryOrSubcategoryAndDeliveryConfig(\n deliveryConfigId: $deliveryConfigId\n sortBy: $sortBy\n slug: $slug\n after: $after\n first: $first\n isInstantDelivery: $isInstantDelivery\n abTestFeatures: $abTestFeatures\n ) {\n edges {\n node {\n ...ProductInfoFragment\n __typename\n }\n __typename\n }\n pageInfo {\n hasNextPage\n endCursor\n __typename\n }\n productBuilder\n __typename\n }\n}\n\nfragment ProductInfoFragment on Product {\n id\n uuid\n deliveryConfigId\n displayName\n priceRanges\n priceMin\n priceMax\n actualPriceMin\n actualPriceMax\n slug\n label\n isInstant\n isInstantOnly\n nextDayAvailability\n heroImage\n promo\n discount\n isDiscount\n variantType\n imageIds\n isStockAvailable\n defaultVariantSkuCode\n quantitySoldFormatted\n promotion {\n quota\n isShown\n campaignId\n __typename\n }\n productVariants {\n productVariant {\n id\n skuCode\n variantName\n maxQty\n isDiscount\n stockAvailable\n promotion {\n quota\n campaignId\n isShown\n __typename\n }\n __typename\n }\n pageInfo {\n hasPreviousPage\n hasNextPage\n __typename\n }\n __typename\n }\n __typename\n}"
    }

    # headers holds the request headers copied from the browser (not shown here).
    response = requests.get(url, headers=headers, json=payload)
    response.json()

The response returns

    [{'errors': [{'message': 'Must provide query string.',
                  'extensions': {'timestamp': 1675842901472}}]},
     {'errors': [{'message': 'Must provide query string.',
                  'extensions': {'timestamp': 1675842901472}}]}]

I am not sure where I went wrong, as I've copied the payload and headers exactly. Can someone help?
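One detail in the code above may explain the shape of the response. Two dict literals separated by a comma form a Python tuple, and a tuple is serialized to a JSON array. The sketch below uses shortened, illustrative operation bodies to show this; that the server treats such an array as a batch of two operations is an assumption, though it would be consistent with the response being a list of two error objects.

```python
import json

# Two dict literals separated by a comma build a tuple, not a single dict.
# The operation bodies are shortened here for illustration only.
payload = {"operationName": "getCartItemCount"}, {"operationName": "getProducts"}

# Serializing a tuple produces a JSON array with one element per operation.
body = json.dumps(payload)
decoded = json.loads(body)

print(type(payload).__name__)  # tuple
print(body)
print(len(decoded))  # 2
```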

Answer 1

Score: 1

GET requests generally shouldn't have a payload. I think these are just query parameters you're trying to supply. Try changing the json argument to params. https://www.w3schools.com/python/ref_requests_get.asp
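As a rough illustration of what supplying the fields as query parameters amounts to, the sketch below builds the URL by hand with the standard library. Passing the same dict as `params=` to `requests.get` performs an equivalent encoding. Two things here are assumptions, not confirmed by the question: that `variables` should be sent as a JSON string (a common convention for GraphQL over GET), and that this endpoint accepts queries over GET at all. The query is shortened and `PLACEHOLDER_ID` is a stand-in value.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://www.sayurbox.com/graphql/v1"

# Shortened query for illustration; the full one appears in the question.
query = (
    "query getCartItemCount($deliveryConfigId: ID!) "
    "{ cart(deliveryConfigId: $deliveryConfigId) { id count } }"
)

# Assumption: nested values such as "variables" are sent as a JSON string.
params = {
    "deduplicate": "1",
    "operationName": "getCartItemCount",
    "query": query,
    "variables": json.dumps({"deliveryConfigId": "PLACEHOLDER_ID"}),
}

# Every field ends up in the URL itself; there is no request body.
full_url = url + "?" + urlencode(params)

# Decode the URL again to confirm the fields survive the round trip.
decoded = parse_qs(urlsplit(full_url).query)
print(decoded["operationName"][0])
print(json.loads(decoded["variables"][0]))
```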

Answer 2

Score: 1
First, the request should be a POST and not a GET. Second, I think you don't want to operate on "getCartItemCount" but probably on "getProducts".

    import requests

    DCId = 'RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl'
    slugcat = 'vegetables-1-a0d03d59'
    url = 'https://www.sayurbox.com/graphql/v1?deduplicate=1'

    # A single getProducts operation, sent as one JSON object.
    payload = {
        'operationName': 'getProducts',
        'variables': {
            'deliveryConfigId': DCId,
            'sortBy': 'related_product',
            'isInstantDelivery': False,
            'slug': slugcat,
            'first': 12,
            'abTestFeatures': ['category-page-subcategory-section-v5#####control']
        },
        'query': 'query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) { productsByCategoryOrSubcategoryAndDeliveryConfig( deliveryConfigId: $deliveryConfigId sortBy: $sortBy slug: $slug after: $after first: $first isInstantDelivery: $isInstantDelivery abTestFeatures: $abTestFeatures ) { edges { node { ...ProductInfoFragment __typename } __typename } pageInfo { hasNextPage endCursor __typename } productBuilder __typename }}fragment ProductInfoFragment on Product { id uuid deliveryConfigId displayName priceRanges priceMin priceMax actualPriceMin actualPriceMax slug label isInstant isInstantOnly nextDayAvailability heroImage promo discount isDiscount variantType imageIds isStockAvailable defaultVariantSkuCode quantitySoldFormatted promotion { quota isShown campaignId __typename } productVariants { productVariant { id skuCode variantName maxQty isDiscount stockAvailable promotion { quota campaignId isShown __typename } __typename } pageInfo { hasPreviousPage hasNextPage __typename } __typename } __typename}'
    }

    # headers holds the request headers copied from the browser (not shown here).
    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

Output (with Pandas):

    import pandas as pd

    # Flatten each product node into one row.
    df = pd.json_normalize([node['node'] for node in data['data']['productsByCategoryOrSubcategoryAndDeliveryConfig']['edges']])

    >>> df
                                                       id                                  uuid  ...  productVariants.pageInfo.__typename  productVariants.__typename
    0   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  479c7805-3b26-4bb9-93b9-5689a2d3bb9d  ...  PageInfo  productVariantsConnection
    1   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ba7154a1-e784-451d-88e0-10ede13d55b3  ...  PageInfo  productVariantsConnection
    2   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  5e023650-50fa-4adc-800d-be14cac7f1eb  ...  PageInfo  productVariantsConnection
    3   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  eec5c6fa-70b9-45d8-a316-6820d1ed68c3  ...  PageInfo  productVariantsConnection
    4   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ee1a0910-f021-48e4-a8d0-ab54f4358bde  ...  PageInfo  productVariantsConnection
    5   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  17dccf7a-0763-4c34-a537-7b746bdba683  ...  PageInfo  productVariantsConnection
    6   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  90bbee6d-184e-4d8b-8702-77b660883a00  ...  PageInfo  productVariantsConnection
    7   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  f7e51319-0dd3-4c21-9bba-bc8e3f71db94  ...  PageInfo  productVariantsConnection
    8   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  9f889a62-9302-48db-a972-cff035440ee4  ...  PageInfo  productVariantsConnection
    9   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  dd58f053-238f-45f6-b937-687c1e1db3b0  ...  PageInfo  productVariantsConnection
    10  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  05c37b4e-cf0f-4cf5-a9a8-20ea00029063  ...  PageInfo  productVariantsConnection
    11  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  e559850a-2344-4bb4-be70-932214aace91  ...  PageInfo  productVariantsConnection

    [12 rows x 30 columns]

    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 12 entries, 0 to 11
    Data columns (total 30 columns):
     #   Column                                    Non-Null Count  Dtype
    ---  ------                                    --------------  -----
     0   id                                        12 non-null     object
     1   uuid                                      12 non-null     object
     2   deliveryConfigId                          12 non-null     object
     3   displayName                               12 non-null     object
     4   priceRanges                               12 non-null     object
     5   priceMin                                  12 non-null     int64
     6   priceMax                                  12 non-null     int64
     7   actualPriceMin                            12 non-null     int64
     8   actualPriceMax                            12 non-null     int64
     9   slug                                      12 non-null     object
     10  label                                     0 non-null      object
     11  isInstant                                 12 non-null     bool
     12  isInstantOnly                             12 non-null     bool
     13  nextDayAvailability                       12 non-null     bool
     14  heroImage                                 12 non-null     object
     15  promo                                     12 non-null     object
     16  discount                                  12 non-null     object
     17  isDiscount                                12 non-null     bool
     18  variantType                               12 non-null     object
     19  imageIds                                  12 non-null     object
     20  isStockAvailable                          12 non-null     bool
     21  defaultVariantSkuCode                     12 non-null     object
     22  quantitySoldFormatted                     12 non-null     object
     23  promotion                                 0 non-null      object
     24  __typename                                12 non-null     object
     25  productVariants.productVariant            12 non-null     object
     26  productVariants.pageInfo.hasPreviousPage  12 non-null     bool
     27  productVariants.pageInfo.hasNextPage      12 non-null     bool
     28  productVariants.pageInfo.__typename       12 non-null     object
     29  productVariants.__typename                12 non-null     object
    dtypes: bool(7), int64(4), object(19)
    memory usage: 2.4+ KB
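The request above returns only the first 12 products. The query also declares an `$after` variable and asks for `pageInfo { hasNextPage endCursor }`, so further pages can probably be fetched by feeding `endCursor` back in as `after`. The sketch below assumes the endpoint follows that usual cursor convention. `iter_products`, `fetch_page`, and `fake_fetch` are hypothetical names introduced for illustration; in real use `fetch_page(after)` would send the POST request with `variables['after'] = after` and return `response.json()`.

```python
from typing import Callable, Dict, Iterator, Optional

ROOT = "productsByCategoryOrSubcategoryAndDeliveryConfig"


def iter_products(
    fetch_page: Callable[[Optional[str]], Dict], max_pages: int = 50
) -> Iterator[Dict]:
    """Yield product nodes page by page until hasNextPage is False.

    fetch_page is a hypothetical helper: it takes the cursor (or None for
    the first page) and returns the decoded JSON response.
    """
    after = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        connection = fetch_page(after)["data"][ROOT]
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        after = page_info["endCursor"]


def fake_fetch(after: Optional[str]) -> Dict:
    """Stand-in for the network call, serving two made-up pages."""
    pages = {
        None: ([{"id": "a"}, {"id": "b"}], True, "cursor-1"),
        "cursor-1": ([{"id": "c"}], False, None),
    }
    nodes, has_next, cursor = pages[after]
    return {
        "data": {
            ROOT: {
                "edges": [{"node": node} for node in nodes],
                "pageInfo": {"hasNextPage": has_next, "endCursor": cursor},
            }
        }
    }


products = list(iter_products(fake_fetch))
print([p["id"] for p in products])  # ['a', 'b', 'c']
```

The collected list of nodes can be passed to `pd.json_normalize` in the same way as the single page above.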

Posted by huangapple on 2023-02-08 16:00:10