Setting the proxy using Selenium and Docker

Question

I'm having trouble using a proxy for scraping. I'm running dockerized Python code with the selenium/standalone-chrome image. I tried something like the following to pass the proxy parameters:

    def get_chrome_driver(proxy):
        proxy = str(proxy)
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--proxy=%s' % proxy)
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-gpu")
        driver = webdriver.Remote(
            command_executor='http://chrome:4444/wd/hub',
            options=webdriver.ChromeOptions()
        )
        return driver

However, the Chrome instance seems to ignore them. I have an example scraper that reads the IP address from the ident.me webpage, and it returns my machine's IP.
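
For reference, the proxy switch Chrome itself documents is --proxy-server, which may also be relevant here. A minimal sketch of wiring a proxy into ChromeOptions with that switch (the proxy address below is a placeholder, not a value from the question):

    from selenium import webdriver

    PROXY = "http://proxy.example.com:3128"  # placeholder proxy endpoint

    chrome_options = webdriver.ChromeOptions()
    # Chrome's documented switch takes the form --proxy-server=<scheme://host:port>
    chrome_options.add_argument("--proxy-server=%s" % PROXY)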

Answer 1

Score: 1

You are saving the default options for the driver instance with this line:

    options=webdriver.ChromeOptions()

You need to pass the options you created:

    options=chrome_options
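
Putting that fix together, a sketch of the corrected function (identical to the question's code apart from the options argument; the selenium import is added here for completeness):

    from selenium import webdriver

    def get_chrome_driver(proxy):
        proxy = str(proxy)
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--proxy=%s' % proxy)  # the question's original switch; see the --proxy-server note above
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-gpu")
        driver = webdriver.Remote(
            command_executor='http://chrome:4444/wd/hub',
            # Pass the configured options instead of a fresh ChromeOptions()
            options=chrome_options
        )
        return driver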
