Service /usr/bin/chromedriver unexpectedly exited. Status code was: 1

Question

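Your problem appears to be related to an error raised when running Selenium and ChromeDriver on a server. Based on the information you provided, possible fixes include:

  1. ChromeDriver version: make sure the ChromeDriver version is compatible with the installed Google Chrome version. Your ChromeDriver is 114.0.5735.198 and Google Chrome is also 114.0.5735.198, so the two match.

  2. ChromeDriver startup: from your description, ChromeDriver starts without problems from the command line but fails when requests are served through nginx and gunicorn. Make sure the ChromeDriver binary on the server (/usr/bin/chromedriver) is executable and that the user running the application has permission to run it. You can also try adding the directory that contains the ChromeDriver binary to the PATH environment variable.

  3. Selenium options: you have already configured some options, such as --headless. Make sure they work on the server. Headless mode in particular sometimes needs additional options before it runs correctly there.

  4. Server permissions: make sure nothing on the server, for example a firewall, prevents ChromeDriver or Chrome from running normally.

  5. Logs: check the ChromeDriver and Chrome log files on the server for more detailed error messages, which help narrow down the cause.

  6. Nginx configuration: make sure the Nginx configuration is correct and does not cause requests to fail or block communication between Selenium and ChromeDriver.

Note that running Selenium and ChromeDriver on a server can involve problems specific to that environment, and these need careful debugging. If headless mode keeps failing, consider a tool such as Xvfb (a virtual display), which lets Chrome run in regular, non-headless mode on a machine without a screen.

Hopefully these hints help you solve the problem or guide more detailed troubleshooting.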
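The permission and PATH checks in item 2 can be sketched with the standard library alone. This is a minimal sketch, not a definitive diagnostic: the function names `describe_binary` and `diagnose` are hypothetical, and the two default paths are simply the ones quoted in the question. The result depends on which user runs the code, so it is most informative when run as the same user as the deployed application.

```python
import os
import shutil


def describe_binary(path):
    """Return a short status string for a path that should be an executable."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.X_OK):
        return "not executable by current user"
    return "ok"


def diagnose(paths=("/usr/bin/chromedriver", "/usr/bin/google-chrome")):
    """Check each path and report whether chromedriver is reachable via PATH."""
    report = {path: describe_binary(path) for path in paths}
    report["chromedriver on PATH"] = shutil.which("chromedriver") or "not found"
    return report


if __name__ == "__main__":
    for name, status in diagnose().items():
        print(f"{name}: {status}")
```

Calling `diagnose()` from inside the deployed application, rather than from an interactive shell, shows what the application's own user and PATH can reach.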


I have been trying to host a Django project that uses Selenium on a DigitalOcean droplet. I installed everything it needs, but I am getting this error:

Service /usr/bin/chromedriver unexpectedly exited. Status code was: 1\n

If I run the chromedriver command directly, I get this:

Starting ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.

This is my chromedriver version:

ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})

This is my google-chrome version:

Google Chrome 114.0.5735.198 
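The two version strings can also be compared programmatically. The sketch below parses the major version out of each line; the helper names `major_version` and `versions_match` are hypothetical, and comparing only the major number is an assumption about what "compatible" means here.

```python
import re


def major_version(version_line):
    """Extract the major version number from a line of version output."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_line)
    if match is None:
        raise ValueError(f"no version number found in {version_line!r}")
    return int(match.group(1))


def versions_match(driver_line, browser_line):
    """True when ChromeDriver and Chrome report the same major version."""
    return major_version(driver_line) == major_version(browser_line)


# The two lines quoted in the question.
driver_line = (
    "ChromeDriver 114.0.5735.198 "
    "(c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})"
)
browser_line = "Google Chrome 114.0.5735.198 "
print(versions_match(driver_line, browser_line))  # True
```

Since both report 114, a version mismatch is unlikely to be the cause of the error in this case.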

I deployed it with nginx and gunicorn. The server itself runs fine, but I get the error whenever I send a request that uses Selenium and chromedriver.

Here is a code snippet for this automation.py:

class Scrape:
    def find_email_and_phone(self, url):
        payloads = {
            "company_name": self.remove_and_fetch_name(url),
            "email": "",
            "links": [],
            "numbers": []
        }
        links = []
        driver_location = "/usr/bin/google-chrome"
        # driver_service = Service("/chromedriver_linux64/chromedriver")
        chrome_options_ = Options()
        chrome_options_.add_argument('--verbose')
        chrome_options_.add_argument('--headless')
        chrome_options_.binary_location = '/usr/bin/google-chrome'
        chrome_options_.add_argument('--no-sandbox')
        chrome_options_.add_argument('--disable-dev-shm-usage')
        chrome_options_.add_argument('')
        driver_ = webdriver.Chrome(options=chrome_options_, service=Service(executable_path=driver_location))
        try:
            driver_.get(url)
            page_content = driver_.page_source
            email_pattern = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", page_content)
            # links_pattern = re.search(r"")
            if email_pattern:
                payloads["email"] = email_pattern.group()
                links.append(email_pattern.group())
                # print(links)
            else:
                print("No Email Found!")

            # finding all social links (searching for linkedin / facebook)
            links_pattern = re.findall(r'href=[\'"]?([^\'" >]+)', page_content)
            https_links = 
            filtered_links = []
            keywords = ["linkedin"]
            for link in https_links:
                if any(keyword in link for keyword in keywords):
                    filtered_links.append(link)
            payloads["links"] = 

            # finding phone numbers that are present inside the website
            phone_numbers = re.findall(
                r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b',
                page_content)
            formatted_phone_numbers = [
                f"({number[0]}) {number[1]}-{number[2]}" for number in set(phone_numbers)]
            payloads["numbers"] = [number for number in formatted_phone_numbers]
            # df = pd.DataFrame([payloads])
            # df['numbers'] = df['numbers'].apply(lambda x: ', '.join(x))
            # df.to_csv(f"{datetime.now()}.csv", index=False)
            return payloads
        except Exception as e:
            return str(e)
        finally:
            driver_.quit()
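When debugging a failure like this, it can help to separate the parsing from the browser so that the regular expressions can be exercised without Chrome. The sketch below reuses the three patterns from the snippet above on a plain HTML string. The function name `extract_contacts` and the sample HTML are hypothetical, and filtering every href by keyword is an assumption, since the original link-filtering expression is incomplete in the post.

```python
import re

# Patterns copied from the question's snippet.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')
PHONE_RE = re.compile(
    r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?'
    r'[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b'
)


def extract_contacts(page_content, keywords=("linkedin",)):
    """Pull the first email, keyword-matching links and phone numbers from HTML."""
    email = EMAIL_RE.search(page_content)
    links = [
        link for link in HREF_RE.findall(page_content)
        if any(keyword in link for keyword in keywords)
    ]
    numbers = sorted(
        {f"({a}) {b}-{c}" for a, b, c in PHONE_RE.findall(page_content)}
    )
    return {
        "email": email.group() if email else "",
        "links": links,
        "numbers": numbers,
    }


# Hypothetical sample page used only for illustration.
sample = (
    '<p>Contact <a href="mailto:info@example.com">info@example.com</a></p>'
    '<a href="https://www.linkedin.com/company/example">LinkedIn</a>'
    '<p>Call (415) 555-2671</p>'
)
print(extract_contacts(sample))
```

With the parsing isolated like this, any remaining failure under gunicorn can be attributed to starting the browser rather than to the extraction logic.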

Here is my views.py:

def post(self, request):
    try:
        email_and_phone = []
        scrap = Scrape()
        query = request.data.get("query")
        data = scrap.extract_important_links(query, int(request.data.get("number_of_results")))
        for d in data:
            sc = scrap.find_email_and_phone(d)
            email_and_phone.append(sc)

        for item in email_and_phone:
            dataset = DataSet.objects.create(
                company_name=item["company_name"],
                email=item["email"]
            )
            for n in item["numbers"]:
                numbers = Numbers.objects.create(
                    number=n
                )
                dataset.numbers.add(numbers.id)
            for li in item["links"]:
                links = Links.objects.create(
                    link=li
                )
                dataset.links.add(links.id)

        return response({
            "success": True,
            "data": email_and_phone
        }, status=status.HTTP_200_OK)
    except Exception as e:
        return response({
            "success": False,
            "message": str(e)
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)

I have looked at many solutions on Stack Overflow but could not find one that works for me. The script runs without any exception when I run it directly:

python3 automation.py

It also works when I start the development server with this command:

python3 manage.py runserver my_ip:8000

But it does not work when the request goes through the deployed server (nginx and gunicorn) instead of runserver.
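Because the same code behaves differently under runserver and under gunicorn, one thing worth comparing is the environment each process runs in. The sketch below collects a few values that often differ between an interactive shell and a service process. That an environment difference is the cause here is an assumption, not something the post confirms, and `runtime_snapshot` is a hypothetical helper name.

```python
import os
import shutil


def runtime_snapshot():
    """Collect process details that commonly differ between a shell and a service."""
    return {
        # getuid is not available on every platform, so guard the call.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "user": os.environ.get("USER", ""),
        "home": os.environ.get("HOME", ""),
        "path": os.environ.get("PATH", ""),
        "cwd": os.getcwd(),
        "chromedriver": shutil.which("chromedriver"),
        "google-chrome": shutil.which("google-chrome"),
    }


if __name__ == "__main__":
    for key, value in runtime_snapshot().items():
        print(f"{key}: {value}")
```

Logging the snapshot from the Django view once under runserver and once under gunicorn, then comparing the two outputs, shows whether the user, home directory or PATH differs between the two setups.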

Answer 1

Score: 1

With Selenium 4, the driver path is passed through the [Service](https://stackoverflow.com/a/70099102/7429447) object, so you no longer need to pass the [executable_path](https://stackoverflow.com/a/57553986/7429447) key.

So your effective lines of code will be:

    driver_location = "/chromedriver_linux64/chromedriver"
    driver_ = webdriver.Chrome(options=chrome_options_, service=Service(driver_location))

However, with [Selenium](https://stackoverflow.com/a/54482491/7429447) **v4.6** and above, [Selenium Manager](https://stackoverflow.com/a/76563271/7429447) takes care of the chromedriver binary, so your effective code block will be:

    chrome_options_ = Options()
    chrome_options_.add_argument('--verbose')
    chrome_options_.add_argument('--headless')
    chrome_options_.binary_location = '/usr/bin/google-chrome'
    chrome_options_.add_argument('--no-sandbox')
    chrome_options_.add_argument('--disable-dev-shm-usage')
    driver_ = webdriver.Chrome(options=chrome_options_)
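Both variants from the answer can be combined into one helper that uses an explicit driver path when one is given and otherwise leaves driver discovery to Selenium. This is a sketch under stated assumptions, not the answerer's code: `clean_arguments` and `make_driver` are hypothetical names, the log location `/tmp/chromedriver.log` is an arbitrary choice, and `--verbose` and `--log-path` are ChromeDriver command-line flags passed through `service_args`. Empty arguments are filtered out because, in the Selenium Python releases I am aware of, `add_argument('')` raises a `ValueError`. The path given to `Service` should point at chromedriver itself, not at the google-chrome binary.

```python
def clean_arguments(arguments):
    """Drop empty or whitespace-only Chrome arguments."""
    return [arg for arg in arguments if arg and arg.strip()]


def make_driver(arguments, driver_path=None, log_path="/tmp/chromedriver.log"):
    """Create a Chrome driver, writing a verbose ChromeDriver log to log_path."""
    # Imported lazily so clean_arguments can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = "/usr/bin/google-chrome"
    for arg in clean_arguments(arguments):
        options.add_argument(arg)

    # Ask ChromeDriver itself to log verbosely to a file.
    service_kwargs = {"service_args": ["--verbose", f"--log-path={log_path}"]}
    if driver_path:
        # Explicit path to the chromedriver binary (first variant in the answer).
        service_kwargs["executable_path"] = driver_path
    # Without a path, Selenium locates the driver (Selenium Manager in v4.6+).
    service = Service(**service_kwargs)
    return webdriver.Chrome(options=options, service=service)
```

A call such as `make_driver(["--headless", "--no-sandbox", "--disable-dev-shm-usage"], driver_path="/usr/bin/chromedriver")` would mirror the question's setup. If the service still exits with status code 1, the log file is the first place to look for the underlying reason.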

huangapple
  • Posted on July 7, 2023, 00:45:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76630965.html