Opening links in a txt file in a headless browser in Python

Question

I have been having a problem running the code below and suspect the problem is with link.strip(). The program runs in a Linux environment. It is supposed to open multiple links contained in a text file so that Snort can scan them for malware. The file name is given in the terminal when the code is run.

import os
import subprocess
import time
import argparse

def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

def open_links_in_chrome(links, headless=True):
    options = '--headless' if headless else ''
    for link in links:
        subprocess.call('google-chrome {options} --app={link}', shell=True)
        time.sleep(1)

def run_snort(interface):
    subprocess.call(f'snort -i {interface}', shell=True)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', help='Path to file containing links', required=True)
    parser.add_argument('--interface', help='Network interface for Snort to listen on', required=True)
    parser.add_argument('--headless', help='Run Chrome in headless mode', action='store_true')
    args = parser.parse_args()
    file_path = args.file
    interface = args.interface
    headless = args.headless
    
    links = read_links_from_file(file_path)
    snort_process = subprocess.Popen(['snort', '-i', interface])
    open_links_in_chrome(links, headless)
    snort_process.terminate()

if __name__ == '__main__':
    main()

I tried reconfiguring the applications and rewrote the code, though I'm not sure I preserved the right version. In any case, links.append(line.strip()) doesn't seem to be the right way to go. I have also changed the sleep time from 5 to 1.

After some tinkering I ended up with the following error:

Acquiring network traffic from "eth0".
ERROR: Can't start DAQ (-1) - socket: Operation not permitted!
Fatal Error, Quitting..
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[121024:121024:0217/122814.243731:ERROR:gpu_memory_buffer_support_x11.cc(49)] dri3 extension not supported.
[121070:8:0217/122815.025776:ERROR:command_buffer_proxy_impl.cc(128)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
Fontconfig error: Cannot load default config file: No such file: (null)

Answer 1

Score: 0

> I have been having a problem running the code below and suspect the problem is with link.strip().

I assume you mean line.strip() (you're not calling link.strip() anywhere in your code). If you think the code is problematic, let's test it. Suppose I have a file, urls.txt, that contains a list of four URLs:

https://google.com
https://stackoverflow.com
https://www.npr.org/programs/wait-wait-dont-tell-me/
https://www.nyan.cat/

And then run the following code:

import sys

def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

links = read_links_from_file('urls.txt')
for i, link in enumerate(links):
    print(f'{i}: {link}')

I get the following output:

0: https://google.com
1: https://stackoverflow.com
2: https://www.npr.org/programs/wait-wait-dont-tell-me/
3: https://www.nyan.cat/

That suggests your read_links_from_file function works as expected.

On the other hand, you're doing more work than is necessary. The default behavior of a Python file object is to act as an iterator over the lines in the file, so instead of writing this:

def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

links = read_links_from_file(args.file)
open_links_in_chrome(links, args.headless)

You can just delete the read_links_from_file function and pass the open file:

with open(args.file) as links:
    open_links_in_chrome((line.strip() for line in links), args.headless)

I'm cheating a bit here because, instead of simply iterating over the file, I'm using a generator expression to take care of stripping the end-of-line character.
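
To make the generator idea concrete, here is a minimal sketch of the same thing written as a standalone helper. The name iter_links and the blank-line filtering are additions for illustration, not part of the original code; the assumption is that an empty line in the file should be skipped rather than handed to the browser.

```python
import io


def iter_links(lines):
    """Yield stripped, non-empty lines one at a time (hypothetical helper)."""
    for line in lines:
        link = line.strip()
        if link:  # assumption: blank lines are not links and should be skipped
            yield link


# io.StringIO stands in for an open file; any iterable of lines works.
sample = io.StringIO("https://google.com\n\n  https://stackoverflow.com  \n")
links = list(iter_links(sample))
print(links)
```

Like the generator expression, this reads the file lazily, so nothing is loaded until the consumer asks for the next link.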


You have an error in your open_links_in_chrome function. You have written:

subprocess.call('google-chrome {options} --app={link}', shell=True)

This will result in running the literal command line...

google-chrome {options} --app={link}

...because you are neither using a Python f-string nor are you calling the .format() method. You need to write the function like this in order to run Chrome as expected:

def open_links_in_chrome(links, headless=True):
    options = '--headless' if headless else ''
    for link in links:
        subprocess.call(f'google-chrome {options} --app={link}', shell=True)
        time.sleep(1)
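
The difference is easy to check without launching anything. The sketch below only builds the command strings; the values of options and link are sample inputs chosen for illustration.

```python
options = '--headless'
link = 'https://google.com'

# No f prefix and no .format() call: the braces are kept literally.
plain = 'google-chrome {options} --app={link}'

# f-string: the names are looked up and substituted when the string is built.
fstring = f'google-chrome {options} --app={link}'

# str.format() gives the same result as the f-string.
formatted = 'google-chrome {options} --app={link}'.format(options=options, link=link)

print(plain)
print(fstring)
print(formatted)
```

Only the last two produce a command line that actually contains the URL.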

Fixing the string introduces a new problem: subprocess.call waits for the command to exit before it returns. Chrome will open successfully with the first URL, but it never exits on its own, so your code won't continue past this point.
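
To see why the loop stalls, and what a manual workaround would involve, here is a minimal sketch. It does not start Chrome: a child Python process that sleeps for 30 seconds stands in for a program that never exits on its own, which is an assumption made purely so the example can run anywhere.

```python
import subprocess
import sys
import time

# Stand-in for a long-running program such as a browser (illustrative only).
long_running = [sys.executable, '-c', 'import time; time.sleep(30)']

start = time.monotonic()
proc = subprocess.Popen(long_running)  # returns immediately, unlike subprocess.call
time.sleep(1)                          # give the stand-in a moment to "work"
proc.terminate()                       # ask the child to stop
proc.wait()                            # collect its exit status
elapsed = time.monotonic() - start

print(f'moved on after {elapsed:.1f}s instead of waiting 30s')
```

With subprocess.call in place of Popen, the script would sit on that line for the full 30 seconds. With a real browser you would also have to decide how long to wait for each page to load before terminating it.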

Rather than trying to fix this, I would suggest using a browser automation library like Playwright or Selenium. Here's your code rewritten to use Playwright:

from playwright.sync_api import sync_playwright
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

import subprocess
import time
import argparse

def open_links_in_chrome(links, headless=True):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        # Reuse a single page for every link instead of starting a new browser each time.
        page = browser.new_page()
        for link in links:
            print(f'fetching {link}')
            try:
                page.goto(link)
            # Catch the public exception class rather than a private playwright._impl path.
            except PlaywrightTimeoutError:
                print(f'{link} timed out.')
            time.sleep(1)
        browser.close()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', help='Path to file containing links', required=True)
    parser.add_argument('--interface', help='Network interface for Snort to listen on', required=True)
    parser.add_argument('--headless', help='Run Chrome in headless mode', action='store_true')
    args = parser.parse_args()

    snort_process = subprocess.Popen(['snort', '-i', args.interface])
    with open(args.file) as links:
        open_links_in_chrome((line.strip() for line in links), headless=args.headless)
    snort_process.terminate()

if __name__ == '__main__':
    main()
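
One thing to watch in main(): if open_links_in_chrome raises anything other than a timeout, snort_process.terminate() is never reached and the capture keeps running. A minimal sketch of one way to guard against that is below. The helper name background_process is hypothetical, and a sleeping Python child stands in for ['snort', '-i', args.interface] so the example runs without Snort.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def background_process(command):
    """Run command in the background and always stop it on exit (hypothetical helper)."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # fall back to a hard kill if terminate() is ignored
            proc.wait()


# Stand-in for the Snort command (illustrative only).
stand_in = [sys.executable, '-c', 'import time; time.sleep(30)']
try:
    with background_process(stand_in) as proc:
        raise RuntimeError('simulated browser failure')
except RuntimeError:
    pass

print(proc.returncode is not None)
```

In main() the body of the with block would be the code that opens the file and calls open_links_in_chrome.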

If we run this -- assuming we have followed the Playwright installation instructions -- we see as output:

fetching https://google.com
fetching https://stackoverflow.com
fetching https://www.npr.org/programs/wait-wait-dont-tell-me/
fetching https://www.nyan.cat/

In my tests I've replaced snort with tcpdump, and examining the resulting packet capture I can see that we're making the expected network requests:

$ tcpdump -r packets port 53 | grep -E 'A\? (google.com|stackoverflow.com|www.npr.org|www.nyan.cat)'
reading from file packets, link-type EN10MB (Ethernet), snapshot length 262144
20:23:37.319272 IP madhatter.52135 > _gateway.domain: 52609+ A? stackoverflow.com. (35)
20:23:38.811385 IP madhatter.39144 > _gateway.domain: 15910+ AAAA? www.npr.org. (29)
20:23:38.811423 IP madhatter.52655 > _gateway.domain: 13756+ A? www.npr.org. (29)
20:23:41.214261 IP madhatter.46762 > _gateway.domain: 20587+ AAAA? www.nyan.cat. (30)
20:23:41.214286 IP madhatter.43846 > _gateway.domain: 12335+ A? www.nyan.cat. (30)
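
If you would rather not eyeball the capture, the same check can be sketched in Python. This assumes tcpdump's text output has the "A? name." shape shown above and that the lines have already been read into a list; the variable names are illustrative.

```python
from urllib.parse import urlsplit

links = [
    'https://google.com',
    'https://stackoverflow.com',
    'https://www.npr.org/programs/wait-wait-dont-tell-me/',
    'https://www.nyan.cat/',
]

# A few lines copied from the tcpdump output above.
tcpdump_lines = [
    '20:23:37.319272 IP madhatter.52135 > _gateway.domain: 52609+ A? stackoverflow.com. (35)',
    '20:23:38.811423 IP madhatter.52655 > _gateway.domain: 13756+ A? www.npr.org. (29)',
    '20:23:41.214286 IP madhatter.43846 > _gateway.domain: 12335+ A? www.nyan.cat. (30)',
]

expected_hosts = {urlsplit(link).hostname for link in links}

seen_hosts = set()
for line in tcpdump_lines:
    parts = line.split()
    for i, token in enumerate(parts[:-1]):
        # The queried name follows the record type, with a trailing dot.
        if token in ('A?', 'AAAA?'):
            seen_hosts.add(parts[i + 1].rstrip('.'))

missing = sorted(expected_hosts - seen_hosts)
print(missing)
```

Run against the lines above, this reports google.com as not seen, which matches the capture: no google.com query appears in it. The capture alone doesn't say why; a cached DNS answer is one possibility, but that is a guess.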

huangapple
  • Published on February 18, 2023 at 04:40:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75489071.html