Why does the request return "Must provide query string." when scraping?

Question

I am trying to scrape https://www.sayurbox.com/category/vegetables-1-a0d03d59?selectedCategoryType=ops&touch_point=screen_CATEGORY_sembako-1-e6a33b51&section_source=shop_list_slider_navigation_category_vegetables-1-a0d03d59

Here's my current code:

import requests

DCId = "RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl"
slugcat = "vegetables-1-a0d03d59"
url = "https://www.sayurbox.com/graphql/v1?deduplicate=1"

# Two GraphQL operations sent together as one batch.
payload = [
    {
        "operationName": "getCartItemCount",
        "variables": {"deliveryConfigId": DCId},
        "query": "query getCartItemCount($deliveryConfigId: ID!) {\n  cart(deliveryConfigId: $deliveryConfigId) {\n    id\n    count\n    __typename\n  }\n}",
    },
    {
        "operationName": "getProducts",
        "variables": {"deliveryConfigId": DCId, "sortBy": "related_product", "isInstantDelivery": False, "slug": slugcat, "first": 12, "abTestFeatures": []},
        "query": "query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) {\n  productsByCategoryOrSubcategoryAndDeliveryConfig(\n    deliveryConfigId: $deliveryConfigId\n    sortBy: $sortBy\n    slug: $slug\n    after: $after\n    first: $first\n    isInstantDelivery: $isInstantDelivery\n    abTestFeatures: $abTestFeatures\n  ) {\n    edges {\n      node {\n        ...ProductInfoFragment\n        __typename\n      }\n      __typename\n    }\n    pageInfo {\n      hasNextPage\n      endCursor\n      __typename\n    }\n    productBuilder\n    __typename\n  }\n}\n\nfragment ProductInfoFragment on Product {\n  id\n  uuid\n  deliveryConfigId\n  displayName\n  priceRanges\n  priceMin\n  priceMax\n  actualPriceMin\n  actualPriceMax\n  slug\n  label\n  isInstant\n  isInstantOnly\n  nextDayAvailability\n  heroImage\n  promo\n  discount\n  isDiscount\n  variantType\n  imageIds\n  isStockAvailable\n  defaultVariantSkuCode\n  quantitySoldFormatted\n  promotion {\n    quota\n    isShown\n    campaignId\n    __typename\n  }\n  productVariants {\n    productVariant {\n      id\n      skuCode\n      variantName\n      maxQty\n      isDiscount\n      stockAvailable\n      promotion {\n        quota\n        campaignId\n        isShown\n        __typename\n      }\n      __typename\n    }\n    pageInfo {\n      hasPreviousPage\n      hasNextPage\n      __typename\n    }\n    __typename\n  }\n  __typename\n}",
    },
]

# `headers` is the dict of request headers copied from the browser (not shown).
response = requests.get(url, headers=headers, json=payload)
response.json()

The response returns:

[{'errors': [{'message': 'Must provide query string.', 'extensions': {'timestamp': 1675842901472}}]},
 {'errors': [{'message': 'Must provide query string.', 'extensions': {'timestamp': 1675842901472}}]}]

I am not sure where I went wrong, as I've copied the payload and headers exactly. Can someone help?

Answer 1

Score: 1

GET requests generally shouldn't have a payload. I think these are just query parameters you're trying to supply. Try changing the json argument to params. https://www.w3schools.com/python/ref_requests_get.asp
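
The difference between sending a GraphQL payload as query parameters and as a JSON body can be sketched without touching the network. Everything below is illustrative: the endpoint `https://example.com/graphql`, the shortened query, and the variable values are placeholders, and the standard library stands in for `requests`, whose `params=` argument likewise puts values in the URL while `json=` puts them in the body. Whether the real endpoint accepts GraphQL over GET at all is not something this thread confirms.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint and a tiny stand-in query (assumptions, not the real values).
URL = "https://example.com/graphql"
payload = {
    "operationName": "getProducts",
    "variables": {"slug": "vegetables-1-a0d03d59", "first": 12},
    "query": "query getProducts($slug: String!, $first: Int) { __typename }",
}

# GET style: every field travels in the URL, so the nested
# "variables" dict has to be JSON-encoded into a string first.
query_string = urlencode({
    "operationName": payload["operationName"],
    "variables": json.dumps(payload["variables"]),
    "query": payload["query"],
})
get_request = Request(f"{URL}?{query_string}", method="GET")

# POST style: the whole payload travels in the request body as JSON.
post_request = Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Decode the GET URL again to see which fields ended up in the query string.
sent_in_url = parse_qs(urlsplit(get_request.full_url).query)
print(sorted(sent_in_url))                       # ['operationName', 'query', 'variables']
print(get_request.data)                          # None: the GET request has no body
print(json.loads(post_request.data) == payload)  # True: the POST body carries everything
```

Neither request object is actually sent here. They only show where the `query` field ends up in each style, which is the part a server has to find before it can run the operation.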

Answer 2

Score: 1

First, the request should be a POST, not a GET. Second, I think you probably don't want to operate on "getCartItemCount" but on "getProducts".

import requests

DCId = 'RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl'
slugcat = 'vegetables-1-a0d03d59'
url = 'https://www.sayurbox.com/graphql/v1?deduplicate=1'

payload = {
    'operationName': 'getProducts',
    'variables': {
        'deliveryConfigId': DCId,
        'sortBy': 'related_product',
        'isInstantDelivery': False,
        'slug': slugcat,
        'first': 12,
        'abTestFeatures': ['category-page-subcategory-section-v5#####control']
    },
    'query': 'query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) { productsByCategoryOrSubcategoryAndDeliveryConfig( deliveryConfigId: $deliveryConfigId sortBy: $sortBy slug: $slug after: $after first: $first isInstantDelivery: $isInstantDelivery abTestFeatures: $abTestFeatures ) { edges { node { ...ProductInfoFragment __typename } __typename } pageInfo { hasNextPage endCursor __typename } productBuilder __typename }}fragment ProductInfoFragment on Product { id uuid deliveryConfigId displayName priceRanges priceMin priceMax actualPriceMin actualPriceMax slug label isInstant isInstantOnly nextDayAvailability heroImage promo discount isDiscount variantType imageIds isStockAvailable defaultVariantSkuCode quantitySoldFormatted promotion { quota isShown campaignId __typename } productVariants { productVariant { id skuCode variantName maxQty isDiscount stockAvailable promotion { quota campaignId isShown __typename } __typename } pageInfo { hasPreviousPage hasNextPage __typename } __typename } __typename}'
}

# `headers` is the dict of request headers from the question (not shown).
response = requests.post(url, headers=headers, json=payload)
data = response.json()

Output (with Pandas):

import pandas as pd

df = pd.json_normalize([node['node'] for node in data['data']['productsByCategoryOrSubcategoryAndDeliveryConfig']['edges']])

>>> df
                                                   id                                  uuid  ... productVariants.pageInfo.__typename productVariants.__typename
0   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  479c7805-3b26-4bb9-93b9-5689a2d3bb9d  ...                            PageInfo  productVariantsConnection
1   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ba7154a1-e784-451d-88e0-10ede13d55b3  ...                            PageInfo  productVariantsConnection
2   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  5e023650-50fa-4adc-800d-be14cac7f1eb  ...                            PageInfo  productVariantsConnection
3   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  eec5c6fa-70b9-45d8-a316-6820d1ed68c3  ...                            PageInfo  productVariantsConnection
4   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ee1a0910-f021-48e4-a8d0-ab54f4358bde  ...                            PageInfo  productVariantsConnection
5   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  17dccf7a-0763-4c34-a537-7b746bdba683  ...                            PageInfo  productVariantsConnection
6   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  90bbee6d-184e-4d8b-8702-77b660883a00  ...                            PageInfo  productVariantsConnection
7   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  f7e51319-0dd3-4c21-9bba-bc8e3f71db94  ...                            PageInfo  productVariantsConnection
8   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  9f889a62-9302-48db-a972-cff035440ee4  ...                            PageInfo  productVariantsConnection
9   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  dd58f053-238f-45f6-b937-687c1e1db3b0  ...                            PageInfo  productVariantsConnection
10  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  05c37b4e-cf0f-4cf5-a9a8-20ea00029063  ...                            PageInfo  productVariantsConnection
11  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  e559850a-2344-4bb4-be70-932214aace91  ...                            PageInfo  productVariantsConnection

[12 rows x 30 columns]

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   id                                        12 non-null     object
 1   uuid                                      12 non-null     object
 2   deliveryConfigId                          12 non-null     object
 3   displayName                               12 non-null     object
 4   priceRanges                               12 non-null     object
 5   priceMin                                  12 non-null     int64
 6   priceMax                                  12 non-null     int64
 7   actualPriceMin                            12 non-null     int64
 8   actualPriceMax                            12 non-null     int64
 9   slug                                      12 non-null     object
 10  label                                     0 non-null      object
 11  isInstant                                 12 non-null     bool
 12  isInstantOnly                             12 non-null     bool
 13  nextDayAvailability                       12 non-null     bool
 14  heroImage                                 12 non-null     object
 15  promo                                     12 non-null     object
 16  discount                                  12 non-null     object
 17  isDiscount                                12 non-null     bool
 18  variantType                               12 non-null     object
 19  imageIds                                  12 non-null     object
 20  isStockAvailable                          12 non-null     bool
 21  defaultVariantSkuCode                     12 non-null     object
 22  quantitySoldFormatted                     12 non-null     object
 23  promotion                                 0 non-null      object
 24  __typename                                12 non-null     object
 25  productVariants.productVariant            12 non-null     object
 26  productVariants.pageInfo.hasPreviousPage  12 non-null     bool
 27  productVariants.pageInfo.hasNextPage      12 non-null     bool
 28  productVariants.pageInfo.__typename       12 non-null     object
 29  productVariants.__typename                12 non-null     object
dtypes: bool(7), int64(4), object(19)
memory usage: 2.4+ KB
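
As the response in the question shows, a GraphQL failure can arrive as an `errors` list inside the JSON body, so it may help to check for that before indexing into `data`. The helper below is a minimal sketch of such a check: `extract_product_nodes` is a hypothetical name, and the success sample uses made-up values that only mimic the shape of the `getProducts` result.

```python
def extract_product_nodes(data):
    """Return the product nodes from a getProducts response, raising on GraphQL errors."""
    # A batched request (a list of operations) yields a list of results;
    # a single operation yields one dict. Handle both the same way.
    results = data if isinstance(data, list) else [data]
    nodes = []
    for result in results:
        if result.get("errors"):
            messages = "; ".join(
                err.get("message", "unknown error") for err in result["errors"]
            )
            raise RuntimeError(f"GraphQL error: {messages}")
        connection = (result.get("data") or {}).get(
            "productsByCategoryOrSubcategoryAndDeliveryConfig"
        )
        if connection:
            nodes.extend(edge["node"] for edge in connection["edges"])
    return nodes


# The error response reported in the question.
failed = [
    {"errors": [{"message": "Must provide query string.", "extensions": {"timestamp": 1675842901472}}]},
    {"errors": [{"message": "Must provide query string.", "extensions": {"timestamp": 1675842901472}}]},
]

# A hand-made success response with hypothetical values.
ok = {
    "data": {
        "productsByCategoryOrSubcategoryAndDeliveryConfig": {
            "edges": [{"node": {"id": "a", "displayName": "Example"}}]
        }
    }
}

try:
    extract_product_nodes(failed)
except RuntimeError as exc:
    print(exc)  # GraphQL error: Must provide query string.

print(extract_product_nodes(ok))  # [{'id': 'a', 'displayName': 'Example'}]
```

The list returned by the helper can be passed straight to `pd.json_normalize`, in place of the list comprehension over `data['data'][...]['edges']`.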

huangapple
  • Published on February 8, 2023, 16:00:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382794.html