Selenium script with Firefox fails if executed by cron, but runs correctly if executed manually

Question

I configured a Python script (test_fire7.py) that uses Selenium and Firefox to be executed by cron in a Python virtual environment called e1. The script runs correctly when executed manually from the console, both with the virtual environment activated and without activating it explicitly, using a command like this: /root/pye/e1/bin/python3 /root/pye/test_fire7.py

Here is the entry in crontab:

30 20 * * * cd /root/pye && /root/pye/e1/bin/python3 /root/pye/test_fire7.py command arg 2>&1 | logger -t mycmd

And this is the header of test_fire7.py (I omit the rest of the script, because it runs correctly by itself):

from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.by import By

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

opts = FirefoxOptions()
opts.add_argument("--headless")

driver = webdriver.Firefox(options=opts)

And here is the error, after cron runs the script, captured in syslog:

root@wi-master:~/pye# grep 'mycmd' /var/log/syslog
May 27 20:30:01 wi-master mycmd: Traceback (most recent call last):
May 27 20:30:01 wi-master mycmd:   File "/root/pye/test_fire7.py", line 15, in <module>
May 27 20:30:01 wi-master mycmd:     driver = webdriver.Firefox(options=opts)
May 27 20:30:01 wi-master mycmd:   File "/root/pye/e1/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 201, in __init__
May 27 20:30:01 wi-master mycmd:     super().__init__(command_executor=executor, options=options, keep_alive=True)
May 27 20:30:01 wi-master mycmd:   File "/root/pye/e1/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
May 27 20:30:01 wi-master mycmd:     self.start_session(capabilities, browser_profile)
May 27 20:30:01 wi-master mycmd:   File "/root/pye/e1/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
May 27 20:30:01 wi-master mycmd:     response = self.execute(Command.NEW_SESSION, parameters)
May 27 20:30:01 wi-master mycmd:   File "/root/pye/e1/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
May 27 20:30:01 wi-master mycmd:     self.error_handler.check_response(response)
May 27 20:30:01 wi-master mycmd:   File "/root/pye/e1/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
May 27 20:30:01 wi-master mycmd:     raise exception_class(message, screen, stacktrace)
May 27 20:30:01 wi-master mycmd: selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line

Other scripts run correctly using exactly the same kind of entries in crontab. So I assume the problem is with Selenium in some way.

I have Firefox installed:

root@wi-master:~/pye# firefox -v
Mozilla Firefox 113.0.2

And also geckodriver:

root@wi-master:~/pye# geckodriver --version
geckodriver 0.33.0 ( 2023-05-22)

Again, the script runs without errors when executed manually. The error message in syslog suggests that Firefox is not installed, but that is not the case!

Update:
I tried adding the executable path of geckodriver when calling the webdriver, but without success. I'm not sure whether I located the executable path correctly:

root@wi-master:~# which geckodriver
/snap/bin/geckodriver

I also passed the Firefox binary explicitly like this, also without success:

from subprocess import getoutput  # needed for the getoutput() call below
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
driver = webdriver.Firefox(firefox_binary=FirefoxBinary(getoutput("find /snap/firefox -name firefox").split("\n")[-1]), options=opts)

With this, the Selenium error changes to:

selenium.common.exceptions.WebDriverException: Message: Service /snap/firefox/2667/usr/lib/firefox/firefox unexpectedly exited. Status code was: 255

Answer 1

Score: 1

It looks like the working directory of your script is different from when you start it manually. One solution could be to set the working directory from the script, relative to the script itself.

Say the script lives in some folder and, when you start it manually, you start it from that folder. You can add this to the script:

import os
import pathlib

os.chdir(pathlib.Path(__file__).parent)

This changes the working directory to the folder the script is in.

Of course, if you manually started the script from some other folder (like your home directory, profile directory, etc.) you could try and change to that instead.

However, keep in mind that the cron process may not be allowed to access folders that you do have access to in your interactive session. If you set the correct working directory before execution, make sure the process is allowed to access (and execute) the binary there.
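
To check the working directory and access rights mentioned above, a small helper like the following could be logged from the cron job and compared with an interactive run. This is only a sketch: describe_location is a hypothetical name, not part of Selenium or the original script, and the path passed to it is a placeholder.

```python
import os
import pathlib


def describe_location(path):
    """Return facts about a path that matter when a job runs under cron."""
    p = pathlib.Path(path)
    return {
        "cwd": os.getcwd(),                     # directory the process was started in
        "exists": p.exists(),                   # does the path exist at all?
        "readable": os.access(p, os.R_OK),      # may this process read it?
        "executable": os.access(p, os.X_OK),    # may this process execute/enter it?
    }


# Placeholder target: the current directory. Swap in the folder or binary to check.
print(describe_location("."))
```

If the values printed under cron differ from those printed in the console, the working directory or permissions are a plausible suspect; if they match, the cause is likely elsewhere.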

Answer 2

Score: 0

I finally found a solution. It took a lot of trial and error, after none of the suggestions I found on the internet worked for me.

For completeness, here is my system:

  • Ubuntu 20.04.6 LTS
  • Python 3.8.10
  • selenium 4.9.1
  • Mozilla Firefox 113.0.2
  • geckodriver 0.33.0

I use virtual environments and run the scripts from root.

First, I observed where Firefox and geckodriver reside:

root@wikijs-master:~# which firefox
/snap/bin/firefox
root@wikijs-master:~# which geckodriver
/snap/bin/geckodriver

Based on that I added export PATH=$PATH:/snap/bin/; to the crontab entry from above:

* * * * *  export PATH=$PATH:/snap/bin/; cd /root/pye && /root/pye/e1/bin/python3 /root/pye/test_fire7.py command arg 2>&1 | logger -t mycmd
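
A likely reason this helps is that cron typically starts jobs with a much shorter PATH than an interactive shell, so /snap/bin is not searched. One way to confirm this on a given machine is to log what the job can actually resolve. The sketch below uses only the standard library; resolve_tools is a hypothetical helper name, and the tool names are the two from this question.

```python
import os
import shutil


def resolve_tools(names, path=None):
    """Map each executable name to the location found on the search path (or None)."""
    search_path = os.environ.get("PATH", "") if path is None else path
    return {name: shutil.which(name, path=search_path) for name in names}


# Log what the current process can find. Run this once from the cron job and
# once from an interactive shell, then compare the two outputs.
print("PATH =", os.environ.get("PATH", ""))
print(resolve_tools(["firefox", "geckodriver"]))
```

If the cron run reports None for either tool while the console run reports /snap/bin/..., the PATH difference explains the "unable to find binary in default location" message.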

And here is a sample python script that cron can execute!

from selenium import webdriver
from selenium.webdriver import FirefoxOptions

opts = FirefoxOptions()
opts.add_argument("--headless")

driver = webdriver.Firefox(options=opts)
driver.get("https://google.com")
print(driver.current_url)

I want to note that other solutions like adding executable paths, absolute paths or binary locations within the webdriver attributes, changing the working directory or modifying the permissions for the cron user didn't work for me.
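
For readers who would rather not edit the crontab, the same PATH change can in principle be made at the top of the script, since child processes inherit the environment of the Python process. This variant was not tested in this thread, so treat it as an assumption rather than a confirmed fix; ensure_on_path is a hypothetical helper name.

```python
import os


def ensure_on_path(directory, env=None):
    """Append directory to PATH in env (default: os.environ) unless already present."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    # Compare without trailing slashes so "/snap/bin" and "/snap/bin/" count as the same entry.
    if directory.rstrip("/") not in (p.rstrip("/") for p in parts):
        parts.append(directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Call this before webdriver.Firefox(...) so that anything started afterwards inherits the change.
ensure_on_path("/snap/bin")
```

The crontab approach above remains the one reported to work; this sketch only moves the same setting into the script.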

huangapple
  • Published on May 28, 2023, 05:16:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76349079.html