Selenium/Chrome: Why do I need to run Chrome manually first before the same steps succeed under Selenium?

Question

I'm using Selenium with Chrome (Crostini/Linux) to check prices at www.amtrak.com/tickets/departure.html. I was seeing different results from a browser started manually (the website application worked) and a browser started by Selenium (the website application failed with an "unknown error"). I initially thought I had a coding issue, but eventually discovered I could replicate the failure by starting Chrome manually with the same options that Selenium uses.

I also discovered I could prevent the failure by pointing the Selenium-initiated browser at a static user-data-dir, provided I first pre-conditioned that directory by manually running Chrome in it (with no options other than --user-data-dir and --profile-directory).

Replication steps for my test case:

  1. Manually start Chrome and go to https://www.amtrak.com/tickets/departure.html
  2. Click "New Search"
  3. From: "PDX"
  4. To: "JAX"
  5. Depart Date: 4/2/23
  6. ...defaults for everything else...
  7. Click "FIND TRAINS"

Image of input entry form (sorry, not enough points to embed the image)

I'm expecting something like this picture where search results are shown in a green outline.

When I try to automate the above with Selenium (Python), I get an "unknown error" if I just instantiate a brand new browser instance using the following and then walk through the form manually as above:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://amtrak.com/tickets/departure.html")

Under the hood, I can see from chrome://version that this method starts Chrome with its own brand new, randomly named user data area (Example: chrome --user-data-dir=/tmp/.com.google.Chrome.VEejr7 ...). I've done enough debugging to find that the specific error occurs when the site's JavaScript tries to parse JSON that it expects to find in sessionStorage but that does not exist.
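One way to confirm that a sessionStorage entry is missing is to read the storage back through the driver. The following is only a minimal sketch: `dump_session_storage` and `missing_keys` are hypothetical helper names, and the key to look for is an assumption, since the question does not say which entry the site expects. It relies only on the standard WebDriver `execute_script` call.

```python
import json


def dump_session_storage(driver):
    """Return the current page's sessionStorage as a dict of str -> str."""
    # Serialise inside the page so only a plain JSON string crosses the wire.
    script = (
        "const out = {};"
        "for (let i = 0; i < window.sessionStorage.length; i++) {"
        "  const key = window.sessionStorage.key(i);"
        "  out[key] = window.sessionStorage.getItem(key);"
        "}"
        "return JSON.stringify(out);"
    )
    return json.loads(driver.execute_script(script))


def missing_keys(storage, expected_keys):
    """List the expected keys that are absent from the dumped storage."""
    return [key for key in expected_keys if key not in storage]
```

Running `dump_session_storage(driver)` once in the failing browser and once in the working one, just before clicking "FIND TRAINS", should show which entry is absent in the failing case.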

However, I've found that I can make the automated solution work if I:

  1. Start Chrome manually with "chrome --user-data-dir=/home/robinson/.config/google-chrome --profile-directory=selenium", enter "guest mode", then fill out the form and click "FIND TRAINS"
  2. Start Chrome from Selenium with these same switches:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
chrome_options.add_argument("--user-data-dir=/home/robinson/.config/google-chrome")
chrome_options.add_argument("--profile-directory=selenium")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://amtrak.com/tickets/departure.html")

What I'm stumped on is: why? Why can't I run successfully from a fresh user data area that has never been touched by a previous browser session? Is there a switch I can pass through Selenium that makes this work from a new user data area?

Note that I've experimented with many options. Here are a few that I've already tried:

chrome_options.add_argument('--incognito')
chrome_options.add_argument("--enable-javascript")
chrome_options.add_argument("--enable-file-cookies")
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])

Thanks,
David Robinson

Answer 1

Score: 1

OK, I think this is a bug in either Chrome or WebDriver (or Selenium is passing the wrong options when it starts Chrome). If I use Selenium only to open a (detached) browser on a website, these are the options it uses to start Chrome:

> /usr/bin/google-chrome --allow-pre-commit-input --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --log-level=0 --no-first-run --no-service-autorun --password-store=basic --remote-debugging-port=0 --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.edqG7B --flag-switches-begin --flag-switches-end data:,

It turns out that when both "--enable-automation" and "--remote-debugging-port=0" are passed to Chrome together, the JavaScript application running at amtrak.com breaks (specifically, the combination seems to prevent the application from writing JSON to sessionStorage).

The workaround I've implemented is to add the following lines to my Selenium code:
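To check whether a given browser was started with that combination, the command line shown on chrome://version can be parsed. This is a rough sketch using only the standard library; `parse_switches` and `has_problem_combination` are hypothetical names introduced here, and the check encodes nothing more than the observation reported in this answer.

```python
import shlex


def parse_switches(command_line):
    """Map each --switch on a Chrome command line to its value ('' if none)."""
    switches = {}
    # Skip the first token, which is the path to the Chrome binary.
    for token in shlex.split(command_line)[1:]:
        if not token.startswith("--"):
            continue  # positional arguments such as the start URL
        name, _, value = token[2:].partition("=")
        switches[name] = value
    return switches


def has_problem_combination(command_line):
    """True if --enable-automation and --remote-debugging-port=0 both appear."""
    switches = parse_switches(command_line)
    return (
        "enable-automation" in switches
        and switches.get("remote-debugging-port") == "0"
    )
```

Passing the full command line quoted above to `has_problem_combination` flags it, while a command line with a non-zero debugging port, or without "--enable-automation", does not.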

   options.add_experimental_option("excludeSwitches",["enable-automation"])
   options.add_argument("--remote-debugging-port=9999")

The first line prevents Selenium from adding "--enable-automation" to the command line when it starts Chrome (note that this also turns off the "Chrome is being controlled by automated test software" banner). The second line sets the remote debugging port to something other than 0. (If anyone knows how to stop this option from being passed to Chrome at all, I'm all ears.)

Together, these two changes let the Selenium-initiated browser run the application that amtrak.com currently has in place correctly, without any need to run in a static area that has been pre-conditioned with a manual browser run.

(This also eliminates the need for the --user-data-dir and --profile-directory options.)
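Put together with the launch code from the question, the workaround might look like the sketch below. The function names `apply_amtrak_workaround` and `open_departure_page` are hypothetical, and the default port 9999 is simply the value used above; any non-zero port that is free on the machine should serve the same purpose.

```python
def apply_amtrak_workaround(options, debug_port=9999):
    """Apply the two settings described above to a ChromeOptions-like object."""
    if debug_port == 0:
        raise ValueError("debug_port must be non-zero for this workaround")
    # Ask Selenium not to pass --enable-automation to Chrome.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    # Use an explicit non-zero port instead of --remote-debugging-port=0.
    options.add_argument(f"--remote-debugging-port={debug_port}")
    return options


def open_departure_page(url="https://amtrak.com/tickets/departure.html"):
    """Start a detached Chrome with the workaround applied and load the page."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_experimental_option("detach", True)
    driver = webdriver.Chrome(options=apply_amtrak_workaround(options))
    driver.get(url)
    return driver
```

Calling `open_departure_page()` opens the browser and leaves it running for the manual form walk-through. A fixed port is the simplest choice, but two browsers started this way at the same time would need different `debug_port` values.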

huangapple
  • Posted on 2023-02-24 15:41:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75553778.html