How do I indicate the path to my proxy list when using get_project_settings() in Scrapy?

Question

I am trying to run my spider from a script.
It runs fine from the command prompt, and it also runs fine from the script if I don't use my proxies (except that I then get 403s because no proxies are in use).

I have tried changing the file path to the proxy list, but none of the variations worked.

In settings.py I simply use:

ROTATING_PROXY_LIST_PATH = 'proxylist'

This is my scrapy.cfg. I tried changing 'scraper' to scraper.scraper for the heck of it, but that didn't work either.

[settings]
default = scraper.settings

[deploy]
#url = http://localhost:6800/
project = scraper

This is my project structure

  • rascraper
    • scraper
      • spiders
        • __init__.py
        • Spider.py
      • __init__.py
      • items.py
      • middlewares.py
      • pipelines.py
      • settings.py
      • scraper
      • scrapy.cfg
      • proxylist

I don't think including the spider is relevant, but this is how I call it (in the same file)

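from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == '__main__':
    # Load the project settings located via scrapy.cfg, then run the 'Acts' spider by name.
    process = CrawlerProcess(get_project_settings())
    process.crawl('Acts', artist="eddiem")
    process.start()  # blocks until the crawl has finished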

Why does Scrapy not find my proxy file when the settings are loaded via get_project_settings()?

Answer 1

Score: 2

Your scrapy.cfg needs to be moved to its parent directory.

According to the Scrapy docs:

> Though it can be modified, all Scrapy projects have the same file structure by default, similar to this:
>
> scrapy.cfg
> myproject/
>     __init__.py
>     items.py
>     middlewares.py
>     pipelines.py
>     settings.py
>     spiders/
>         __init__.py
>         spider1.py
>         spider2.py
>         ...
>
> The directory where the scrapy.cfg file resides is known as the project root directory. That file contains the name of the python module that defines the project settings. Here is an example:
>
> [settings]
> default = myproject.settings

This means the scrapy.cfg file should sit one directory above the project package, that is, above the directory that contains settings.py.
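
To check the layout from a launcher script, a minimal sketch along these lines may help. It uses only the standard library and walks upward from a starting directory until it finds scrapy.cfg. The helper names `find_project_root` and `proxy_list_path` are hypothetical, not part of Scrapy, and the sketch assumes the proxy list sits next to scrapy.cfg, which may not match your layout.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains scrapy.cfg."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor in turn.
    for directory in (start, *start.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


def proxy_list_path(root: Path, name: str = "proxylist") -> str:
    """Build an absolute path to the proxy list (assumed to live in the project root)."""
    return str(root / name)
```

If a relative ROTATING_PROXY_LIST_PATH is resolved against the current working directory (an assumption worth verifying for your setup), then either changing into the directory returned by `find_project_root` before calling get_project_settings(), or setting ROTATING_PROXY_LIST_PATH to the absolute path from `proxy_list_path`, should make the result independent of where the script is started.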

Posted on May 7, 2023, 02:59:53