Question

I am trying out a feature that is new to me: adding a proxy to my Python scraper code.

I took a free proxy from this site and looked for an answer on SO. With the help of user @dskrypa, I changed my code to pass meta={'proxy':'103.42.162.50:8080'} to the request.

Now it raises an error that keeps repeating until I stop the run.

  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 279, in _get_agent
    proxyScheme, proxyNetloc, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\scrapy\core\downloader\webclient.py", line 39, in _parse
    return _parsed_url_args(parsed)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\scrapy\core\downloader\webclient.py", line 20, in _parsed_url_args
    host = to_bytes(parsed.hostname, encoding="ascii")
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\scrapy\utils\python.py", line 108, in to_bytes
    raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got NoneType
2023-03-12 02:47:32 [scrapy.core.scraper] ERROR: Error downloading <GET https://dvlaregistrations.dvla.gov.uk/search/results.html?search=N11CKY&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>

Here is my code:

import scrapy
from scrapy.crawler import CrawlerProcess
import pandas as pd
import scrapy_xlsx

itemList=[]
class plateScraper(scrapy.Spider):
    name = 'scrapePlate'
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']
    FEED_EXPORTERS = {'xlsx': 'scrapy_xlsx.XlsxItemExporter'}
    custom_settings = {'FEED_EXPORTERS' :FEED_EXPORTERS,'FEED_FORMAT': 'xlsx','FEED_URI': 'output_r00.xlsx', 'LOG_LEVEL':'INFO','DOWNLOAD_DELAY': 0}
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1
    }

    def start_requests(self):
        df=pd.read_excel('data.xlsx')
        columnA_values=df['PLATE']
        for row in columnA_values:
            global  plate_num_xlsx
            plate_num_xlsx=row
            base_url =f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="
            url=base_url
            yield scrapy.Request(url,callback=self.parse, cb_kwargs={'plate_num_xlsx': plate_num_xlsx},meta={'proxy':'103.42.162.50:8080'})

    def parse(self, response, plate_num_xlsx=None):
        plate = response.xpath('//div[@class="resultsstrip"]/a/text()').extract_first()
        price = response.xpath('//div[@class="resultsstrip"]/p/text()').extract_first()

        try:
            a = plate.replace(" ", "").strip()
            if plate_num_xlsx == plate.replace(" ", "").strip():
                item = {"plate": plate_num_xlsx, "price": price.strip()}
                itemList.append(item)
                print(item)
                yield item
            else:
                item = {"plate": plate_num_xlsx, "price": "-"}
                itemList.append(item)
                print(item)
                yield item
        except:
            item = {"plate": plate_num_xlsx, "price": "-"}
            itemList.append(item)
            print(item)
            yield item

process = CrawlerProcess()
process.crawl(plateScraper)
process.start()

import winsound
winsound.Beep(555,333)

Answer 1

Score: 2

You should include the protocol in the proxy URL:

meta={"proxy": "http://103.42.162.50:8080"}
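To see why the scheme matters, here is a minimal sketch using only the standard library. It assumes that Scrapy's `_parse` relies on `urllib.parse.urlparse`, which is consistent with the `parsed.hostname` line in the traceback but is not confirmed here. The names `proxy_host_port` and `ensure_scheme` are hypothetical helpers written for this illustration, not part of Scrapy.

```python
from urllib.parse import urlparse


def proxy_host_port(proxy):
    """Return (hostname, port) the way urlparse sees the proxy string."""
    parsed = urlparse(proxy.strip())
    return parsed.hostname, parsed.port


def ensure_scheme(proxy, default_scheme="http"):
    """Hypothetical helper: prepend a scheme when the proxy string has none."""
    proxy = proxy.strip()
    if "://" in proxy:
        return proxy
    return f"{default_scheme}://{proxy}"


if __name__ == "__main__":
    bare = "103.42.162.50:8080"

    # Without "//" there is no network location, so hostname is None.
    # Passing that None on to a bytes conversion fails as in the traceback.
    print(proxy_host_port(bare))

    # With the scheme in place, host and port are parsed as expected.
    print(proxy_host_port(ensure_scheme(bare)))
```

A helper like `ensure_scheme` could be applied to each entry if the proxies come from a list that mixes both forms, for example `meta={"proxy": ensure_scheme(p)}`.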

