英文:
Unable to get data from Scrapy API
问题
I'm facing some unexpected issues while trying to get data from an API request. I found that the request returns a "500" error along with the error message below. I'm trying to scrape this URL "https://www.machinerytrader.com/listings/for-sale/excavators/1031" but I have no idea what I'm actually missing here.
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)
This is what I tried so far,
import scrapy
import json
class ListingSpider(scrapy.Spider):
    """Spider that queries machinerytrader.com's AJAX listing-search endpoint.

    NOTE(review): this is the asker's failing attempt — the endpoint answers
    with HTTP 500, so ``json.loads`` in ``parse`` raises JSONDecodeError on
    the non-JSON error page.
    """

    name = 'listing'
    allowed_domains = ['www.machinerytrader.com']
    # start_urls = ['https://www.machinerytrader.com/listings/for-sale/excavators/1031']

    def start_requests(self):
        # Search parameters. They already appear in the request URL's query
        # string below, so sending them again as a JSON body is redundant.
        payload = {
            "Category": "1031",
            "sort": "1",
            "page": "2"
        }
        # Headers copied verbatim from a browser session.
        # NOTE(review): the hard-coded "cookie" and "x-xsrf-token" values are
        # session-bound and will be stale on replay — presumably the cause of
        # the 500 response; confirm by fetching a fresh token first.
        headers= {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
            "authority": "www.machinerytrader.com",
            "method": "GET",
            "path": "/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            "scheme": "https",
            "accept": "application/json, text/plain, */*",
            "accept-encoding": "gzip, deflate, br",
            "cache-control": "no-cache",
            "content-type": "application/json",
            "cookie": "ASP.NET_SessionId=uircx3p1up0gs3we43zfy3xp; Tracking=SessionStarted=1&UserReferrer=&GUID=75541984850425423381; __RequestVerificationToken=CqqhcuoUxcCh_VvGb2QkZPTMG1sygAxcDjWmGutWxYGvIScO7I1rCwBZabShMlyTl9syCA2; BIGipServerwww.machinery_tradesites_http_pool=545368256.20480.0000; ln_or=eyI8254892IjoiZCJ9; AMP_TOKEN=%24NOT_FOUND; _gid=GA1.2.104780578.1678372555; _fbp=fb.1.1678372557791.1218325782; _hjFirstSeen=1; _hjIncludedInSessionSample_1143836=1; _hjSession_1143836=eyJpZCI6IjU1ZGYyOGJf7b4b:T=1678372561:S=ALNI_MarB5bgIDdhzpQPECKDmX-70INeJg; _hjIncludedInPageviewSample=1; _hjAbsoluteSessionInProgress=1; __gads=ID=2a38a4e969861bb1:T=1678372561:S=ALNI_MapbcTx6haLt65wewjezZyeMFVtCw; __gpi=UID=00000bd5f25a7b4b:T=1678372561:RT=1678372561:S=ALNI_MapbcTx6haLt65wewjezZyeMFVtCw; _hjSessionUser_1143836=eyJpZCI6IjI3ODM6bL6VUYdJDEhmyR8OBJsHuH4BwOdjnbr7pFG+o4AZKqDHliWKhUnDxGAHIhKwzhhq5TFjeJbqRwSLrMXH54WxZZHcuvRtwr734U2F3Pmf8NqW+zavYdB/aYrk+HpA9LfQQQFBGd/1FNRAM0e8fxZpj5U/DxTKPMdvwK5qBnfzQaTzycDwe80G7QRYX9kf4=:gQlue37nFKz2zVkiSWGW9vURldmXHHEIxHz2yiUrtF8=; _uetsid=b29154b0be8711edabac07f5e20bba65; _uetvid=b2917380be8711edbb2587449262974a; _ga=GA1.2.1777003444.1678372555; UserID=ID=n2K08gLg7XRct%2fxxJeWEnGDwbYpUh6vQ%2fiE1eBN%2f25lkMV4lKXFpeoTT54DrUsz9CriJRnchYL4PfEPzqxaCRA%3d%3d&LV=sHEWMHROf%2fDobQFZDWX1nWtv%2bf2Uk9i6YA9N5Sk0lGE%2fWiDudiekp7MPDIUnH0jGKMx9VZbhLzD4VuT7pKbqepCdPLaN274I; UserSettingsCookie=screenSize=1246|947; _ga_27QWK2FVDW=GS1.1.1678372557.1.1.1678373273.60.0.0",
            "pragma": "no-cache",
            "referer": "https://www.machinerytrader.com/listings/for-sale/excavators/1031",
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": "Windows",
            "sec-fetch-dest": "empty",
            "sec-fetch-mode": "cors",
            "sec-fetch-site": "same-origin",
            "x-xsrf-token": "lKwor8adm67mnDJjariTC1-_x2sWvmjxDtVZerZ6p03OwqvVc10YVZUQMmD4-pTv7E2cTSN-8rsTW6ISckmZVgBek66eHw3iFUngI3jYt6h_rwqQ3pI_QxPjYH1us7eHyW27lxFL_-wSS3QC0",
            "sec-ch-ua": '"Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"',
        }
        # NOTE(review): a GET request normally carries no body; the server may
        # reject a GET with a JSON payload — verify whether the body is needed.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            method="GET",
            headers=headers,
            body=json.dumps(payload),
            callback=self.parse
        )

    def parse(self, response):
        # Raises JSONDecodeError here, because the 500 error page is not JSON.
        json_resp = json.loads(response.body)
        products = json_resp['Listings']
        yield {
            'DealerLocation': products['DealerLocation'],
            # NOTE(review): the listing is truncated at this point in the
            # source text — the closing braces are missing from this copy
            # (see the complete duplicate of this snippet further below).
英文:
I'm facing some unexpected issues while trying to get data from an API request. I found that the request returns a "500" error along with the error message below. I'm trying to scrape this URL "https://www.machinerytrader.com/listings/for-sale/excavators/1031" but I have no idea what I'm actually missing here.
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)
This is what I tried so far,
import scrapy
import json
class ListingSpider(scrapy.Spider):
    """Spider that queries machinerytrader.com's AJAX listing-search endpoint.

    NOTE(review): this is the asker's failing attempt — the endpoint answers
    with HTTP 500, so ``json.loads`` in ``parse`` raises JSONDecodeError on
    the non-JSON error page.
    """

    name = 'listing'
    allowed_domains = ['www.machinerytrader.com']
    # start_urls = ['https://www.machinerytrader.com/listings/for-sale/excavators/1031']

    def start_requests(self):
        # Search parameters. They already appear in the request URL's query
        # string below, so sending them again as a JSON body is redundant.
        payload = {
            "Category":"1031",
            "sort": "1",
            "page":"2"
        }
        # Headers copied verbatim from a browser session.
        # NOTE(review): the hard-coded "cookie" and "x-xsrf-token" values are
        # session-bound and will be stale on replay — presumably the cause of
        # the 500 response; confirm by fetching a fresh token first.
        headers= {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
            "authority": "www.machinerytrader.com",
            "method": "GET",
            "path": "/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            "scheme": "https",
            "accept": "application/json, text/plain, */*",
            "accept-encoding": "gzip, deflate, br",
            "cache-control": "no-cache",
            "content-type": "application/json",
            "cookie": "ASP.NET_SessionId=uircx3p1up0gs3we43zfy3xp; Tracking=SessionStarted=1&UserReferrer=&GUID=75541984850425423381; __RequestVerificationToken=CqqhcuoUxcCh_VvGb2QkZPTMG1sygAxcDjWmGutWxYGvIScO7I1rCwBZabShMlyTl9syCA2; BIGipServerwww.machinery_tradesites_http_pool=545368256.20480.0000; ln_or=eyI0MjU0ODkyIjoiZCJ9; AMP_TOKEN=%24NOT_FOUND; _gid=GA1.2.104780578.1678372555; _fbp=fb.1.1678372557791.1218325782; _hjFirstSeen=1; _hjIncludedInSessionSample_1143836=1; _hjSession_1143836=eyJpZCI6IjU1ZGYyOGJmLWQ4YjktNGU2Mi04NjU2LWUwYmJkYzdiNGMxMSIsImNyZWF0ZWQiOjE2NzgzNzI1NTgzNDcsImluU2FtcGxlIjp0cnVlfQ==; _hjIncludedInPageviewSample=1; _hjAbsoluteSessionInProgress=1; __gads=ID=2a38a4e969861bb1:T=1678372561:S=ALNI_MarB5bgIDdhzpQPECKDmX-70INeJg; __gpi=UID=00000bd5f25a7b4b:T=1678372561:RT=1678372561:S=ALNI_MapbcTx6haLt65wewjezZyeMFVtCw; _hjSessionUser_1143836=eyJpZCI6IjI3ODM2ZDdhLTc0YzUtNTIwMi05YjdhLWYxMmM5YTk4ZGNmNiIsImNyZWF0ZWQiOjE2NzgzNzI1NTgzMzksImV4aXN0aW5nIjp0cnVlfQ==; __atuvc=2%7C10; __atuvs=6409eecd9390ad6d001; Top_PopUp=true; reese84=3:MhdsyFtuLMcDbPfjHYfnUQ==:Xnyj2+4WPTbNOTnv4Aj99+6mLrSjYnrQVoSGqCJEwqmN/gdPfQuCPFYN1/1sInEQHaUvLNdN2VbgdxeC96k6tr1MUSbHd2GxI4AKb1CxnkZfLm63/CXWNqJ/vlS66hOTSsEn+gxPb2l3g2TD3RGi0H4PjyhskjDIE10USkPi3mm83aG/xkAYL4khuWtRDaYzyHjzQ76f9yRr0tNnEEbUPbxZTW7BPXcEF606e6mzq6v5/YEy17JScccw/CCkXb4Uv1tzeNYhkMuFj5V5upY0a2tC/MiJeCACNCYnX9obZhGsfPbL6VUYdJDEhmyR8OBJsHuH4BwOdjnbr7pFG+o4AZKqDHliWKhUnDxGAHIhKwzhhq5TFjeJbqRwSLrMXH54WxZZHcuvRtwr734U2F3Pmf8NqW+zavYdB/aYrk+HpA9LfQQQFBGd/1FNRAM0e8fxZpj5U/DxTKPMdvwK5qBnfzQaTzycDwe80G7QRYX9kf4=:gQlue37nFKz2zVkiSWGW9vURldmXHHEIxHz2yiUrtF8=; _uetsid=b29154b0be8711edabac07f5e20bba65; _uetvid=b2917380be8711edbb2587449262974a; _ga=GA1.2.1777003444.1678372555; UserID=ID=n2K08gLg7XRct%2fxxJeWEnGDwbYpUh6vQ%2fiE1eBN%2f25lkMV4lKXFpeoTT54DrUsz9CriJRnchYL4PfEPzqxaCRA%3d%3d&LV=sHEWMHROf%2fDobQFZDWX1nWtv%2bf2Uk9i6YA9N5Sk0lGE%2fWiDudiekp7MPDIUnH0jGKMx9VZbhLzD4VuT7pKbqepCdPLaN274I; UserSettingsCookie=screenSize=1246|947; _ga_27QWK2FVDW=GS1.1.1678372557.1.1.1678373273.60.0.0",
            "pragma": "no-cache",
            "referer": "https://www.machinerytrader.com/listings/for-sale/excavators/1031",
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": "Windows",
            "sec-fetch-dest": "empty",
            "sec-fetch-mode": "cors",
            "sec-fetch-site": "same-origin",
            "x-xsrf-token": "lKwor8adm67mnDJjariTC1-_x2sWvmjxDtVZerZ6p03OwqvVc10YVZUQMmD4-pTv7E2cTSN-8rsTW6ISckmZVgBek66eHw3iFUngI3jYt6h_rwqQ3pI_QxPjYH1us7eHyW27lxFL_-wSS3QC0",
            "sec-ch-ua": '"Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"',
        }
        # NOTE(review): a GET request normally carries no body; the server may
        # reject a GET with a JSON payload — verify whether the body is needed.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            method="GET",
            headers=headers,
            body=json.dumps(payload),
            callback=self.parse
        )

    def parse(self, response):
        # Raises JSONDecodeError here, because the 500 error page is not JSON.
        json_resp = json.loads(response.body)
        products = json_resp['Listings']
        yield {
            'DealerLocation': products['DealerLocation'],
        }
答案1
得分: 1
你需要一个 xsrf-token 来进行这些请求。在这种情况下，你可以首先向主页（https://www.machinerytrader.com/）发出请求，然后使用选择器 `//input[@name="__XSRF-TOKEN"]/@value` 获取令牌。将此值添加到下一个请求的标头中，请求将有效。它将如下所示：
import scrapy
import json
class ListingSpider(scrapy.Spider):
    """Fetch machinerytrader.com's AJAX search results.

    First requests the home page to obtain a fresh ``__XSRF-TOKEN`` value,
    then calls the AJAX listing-search endpoint with that token in the
    ``x-xsrf-token`` header.
    """

    name = "listing"
    allowed_domains = ["www.machinerytrader.com"]
    start_urls = ["https://www.machinerytrader.com/listings/for-sale/excavators/1031"]

    def start_requests(self):
        # You don't need that many headers
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
        }
        # Hit the home page first, only to harvest the anti-CSRF token.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/",
            headers=headers,
            method="GET",
            callback=self.parse,
        )

    def parse(self, response):
        # Extract the per-session token from the hidden form input.
        xsrf_token = response.xpath('//input[@name="__XSRF-TOKEN"]/@value').get()
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
            "x-xsrf-token": xsrf_token,
        }
        # Now the AJAX endpoint accepts the request.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            method="GET",
            headers=headers,
            callback=self.parse_response,
        )

    def parse_response(self, response):
        # Decode the JSON payload returned by the AJAX endpoint.
        json_resp = json.loads(response.body)
        products = json_resp['Listings']
        yield {
            'DealerLocation': products['DealerLocation'],
        }
如果你想运行/调度这个或多个爬虫,你可以考虑使用 estela,这是一个爬虫管理解决方案。
英文:
You need an xsrf-token to make these requests. In this case you can make a request to the home page first (https://www.machinerytrader.com/) and then grab the token using the selector `//input[@name="__XSRF-TOKEN"]/@value`. Add this value to the headers of the next request and the request will work. It would look like this:
import scrapy
import json
class ListingSpider(scrapy.Spider):
    """Fetch machinerytrader.com's AJAX search results.

    First requests the home page to obtain a fresh ``__XSRF-TOKEN`` value,
    then calls the AJAX listing-search endpoint with that token in the
    ``x-xsrf-token`` header.
    """

    name = "listing"
    allowed_domains = ["www.machinerytrader.com"]
    start_urls = ["https://www.machinerytrader.com/listings/for-sale/excavators/1031"]

    def start_requests(self):
        # You don't need that many headers
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
        }
        # Hit the home page first, only to harvest the anti-CSRF token.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/",
            headers=headers,
            method="GET",
            callback=self.parse,
        )

    def parse(self, response):
        # Extract the per-session token from the hidden form input.
        xsrf_token = response.xpath('//input[@name="__XSRF-TOKEN"]/@value').get()
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
            "x-xsrf-token": xsrf_token,
        }
        # Now the AJAX endpoint accepts the request.
        yield scrapy.Request(
            url="https://www.machinerytrader.com/ajax/listings/ajaxsearch?Category=1031&sort=1&page=2",
            method="GET",
            headers=headers,
            callback=self.parse_response,
        )

    def parse_response(self, response):
        # Decode the JSON payload returned by the AJAX endpoint.
        json_resp = json.loads(response.body)
        products = json_resp['Listings']
        yield {
            'DealerLocation': products['DealerLocation'],
        }
If you want to run/schedule this or many spiders you can consider using estela which is a spider management solution.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论