Opening links in txt file in headless browser in python
Question
I have been having a problem running the code below and suspect the problem is with link.strip(). The program runs in a Linux environment and is supposed to open multiple links contained in a text file so that Snort can scan the resulting traffic for malware. The file name is passed on the command line when the script is started.
import os
import subprocess
import time
import argparse

def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

def open_links_in_chrome(links, headless=True):
    options = '--headless' if headless else ''
    for link in links:
        subprocess.call('google-chrome {options} --app={link}', shell=True)
        time.sleep(1)

def run_snort(interface):
    subprocess.call(f'snort -i {interface}', shell=True)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', help='Path to file containing links', required=True)
    parser.add_argument('--interface', help='Network interface for Snort to listen on', required=True)
    parser.add_argument('--headless', help='Run Chrome in headless mode', action='store_true')
    args = parser.parse_args()
    file_path = args.file
    interface = args.interface
    headless = args.headless
    links = read_links_from_file(file_path)
    snort_process = subprocess.Popen(['snort', '-i', interface])
    open_links_in_chrome(links, headless)
    snort_process.terminate()

if __name__ == '__main__':
    main()
I tried reconfiguring the applications and rewrote the code, but I'm not sure I preserved the right version. Either way, links.append(line.strip()) doesn't seem to be the right way to go. I have also changed the sleep time from 5 to 1.
After some tinkering I ended up with the following error:
Acquiring network traffic from "eth0".
ERROR: Can't start DAQ (-1) - socket: Operation not permitted!
Fatal Error, Quitting..
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[121024:121024:0217/122814.243731:ERROR:gpu_memory_buffer_support_x11.cc(49)] dri3 extension not supported.
[121070:8:0217/122815.025776:ERROR:command_buffer_proxy_impl.cc(128)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
Fontconfig error: Cannot load default config file: No such file: (null)
Answer 1
Score: 0
> I have been having a problem running the code below and suspect the problem is with link.strip().
I assume you mean line.strip() (you're not calling link.strip() anywhere in your code). If you think the code is problematic, let's test it. If I have a file urls.txt that contains a list of four URLs:
https://google.com
https://stackoverflow.com
https://www.npr.org/programs/wait-wait-dont-tell-me/
https://www.nyan.cat/
And then run the following code:
def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

links = read_links_from_file('urls.txt')
for i, link in enumerate(links):
    print(f'{i}: {link}')
I get the following output:
0: https://google.com
1: https://stackoverflow.com
2: https://www.npr.org/programs/wait-wait-dont-tell-me/
3: https://www.nyan.cat/
That suggests your read_links_from_file function works as expected.
On the other hand, you're doing more work than is necessary. The default behavior of a Python file object is to act as an iterator over the lines in the file, so instead of writing this:
def read_links_from_file(file_path):
    links = []
    with open(file_path, 'r') as file:
        for line in file:
            links.append(line.strip())
    return links

links = read_links_from_file(args.file)
open_links_in_chrome(links, args.headless)
You can just delete the read_links_from_file function and pass the open file:
with open(args.file) as links:
    open_links_in_chrome((line.strip() for line in links), args.headless)
I'm cheating a bit here because instead of simply iterating over the file, I'm using a generator expression to take care of stripping the end-of-line character.
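As a minimal sketch of the same idea, the generator below yields stripped links lazily and also skips blank lines, which would otherwise be handed to the browser as empty URLs. The name iter_links is hypothetical and is not part of the original code.

```python
def iter_links(lines):
    """Yield stripped, non-empty links from any iterable of lines.

    iter_links is a hypothetical helper name used for illustration.
    """
    for line in lines:
        link = line.strip()
        if link:  # skip blank lines
            yield link


if __name__ == "__main__":
    import io

    # io.StringIO stands in for an open file; a real file object works the same way.
    sample = io.StringIO("https://google.com\n\n  https://stackoverflow.com  \n")
    for link in iter_links(sample):
        print(link)
```

Because it accepts any iterable of lines, it could be used as open_links_in_chrome(iter_links(links), args.headless) inside the with block.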
You have an error in your open_links_in_chrome function. You have written:
subprocess.call('google-chrome {options} --app={link}', shell=True)
This will result in running the literal command line...
google-chrome {options} --app={link}
...because you are neither using a Python f-string nor are you calling the .format() method. You need to write the function like this in order to run Chrome as expected:
def open_links_in_chrome(links, headless=True):
    options = '--headless' if headless else ''
    for link in links:
        subprocess.call(f'google-chrome {options} --app={link}', shell=True)
        time.sleep(1)
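To make the difference concrete, here is a small sketch comparing the three spellings. The values assigned to options and link are sample values chosen for illustration.

```python
options = "--headless"
link = "https://google.com"

# Plain string: the braces are kept literally, nothing is substituted.
plain = 'google-chrome {options} --app={link}'

# f-string: the names are evaluated when the string is created.
formatted = f'google-chrome {options} --app={link}'

# str.format(): same result, with the values passed explicitly.
via_format = 'google-chrome {options} --app={link}'.format(options=options, link=link)

print(plain)
print(formatted)
print(via_format)
```

Only the last two produce a command line that contains the actual URL, which is why the corrected function uses the f prefix.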
This introduces a new problem: this will successfully open Chrome with the first URL, but Chrome will never exit, so your code won't continue past this point.
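The reason is that subprocess.call waits for the child process to exit before it returns. If you wanted to stay with subprocess, one possible workaround (a sketch only) would be to start the process with subprocess.Popen, give it a fixed amount of time, and then terminate it. The helper name run_for and the 5-second default are assumptions for illustration; the command is passed as a list so no shell is involved.

```python
import subprocess
import sys


def run_for(cmd, seconds=5):
    """Start cmd, let it run for at most `seconds`, then stop it.

    run_for is a hypothetical helper; it returns the process exit code.
    """
    proc = subprocess.Popen(cmd)
    try:
        # wait() returns early if the process exits by itself.
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()  # ask the process to exit (SIGTERM on Linux)
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running browser: a Python process that sleeps for 60 s.
    code = run_for([sys.executable, "-c", "import time; time.sleep(60)"], seconds=1)
    print(f"exit code: {code}")
```

For Chrome the call might look like run_for(['google-chrome', '--headless', f'--app={link}']), but you would still be guessing how long each page needs.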
Rather than trying to fix this, I would suggest using a browser automation library like Playwright or Selenium. Here's your code rewritten to use Playwright:
import argparse
import subprocess
import time

from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright


def open_links_in_chrome(links, headless=True):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        for link in links:
            print(f'fetching {link}')
            try:
                page.goto(link)
            except PlaywrightTimeoutError:
                # page.goto() raises a timeout error if navigation takes too long
                print(f'{link} timed out.')
            time.sleep(1)
        browser.close()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', help='Path to file containing links', required=True)
    parser.add_argument('--interface', help='Network interface for Snort to listen on', required=True)
    parser.add_argument('--headless', help='Run Chrome in headless mode', action='store_true')
    args = parser.parse_args()
    snort_process = subprocess.Popen(['snort', '-i', args.interface])
    with open(args.file) as links:
        open_links_in_chrome((line.strip() for line in links), headless=args.headless)
    snort_process.terminate()


if __name__ == '__main__':
    main()
If we run this -- assuming we have followed the Playwright installation instructions -- we see as output:
fetching https://google.com
fetching https://stackoverflow.com
fetching https://www.npr.org/programs/wait-wait-dont-tell-me/
fetching https://www.nyan.cat/
In my tests I've replaced snort with tcpdump, and examining the resulting packet capture I can see that we're making the expected network requests:
$ tcpdump -r packets port 53 | grep -E 'A\? (google.com|stackoverflow.com|www.npr.org|www.nyan.cat)'
reading from file packets, link-type EN10MB (Ethernet), snapshot length 262144
20:23:37.319272 IP madhatter.52135 > _gateway.domain: 52609+ A? stackoverflow.com. (35)
20:23:38.811385 IP madhatter.39144 > _gateway.domain: 15910+ AAAA? www.npr.org. (29)
20:23:38.811423 IP madhatter.52655 > _gateway.domain: 13756+ A? www.npr.org. (29)
20:23:41.214261 IP madhatter.46762 > _gateway.domain: 20587+ AAAA? www.nyan.cat. (30)
20:23:41.214286 IP madhatter.43846 > _gateway.domain: 12335+ A? www.nyan.cat. (30)
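For reference, that swap can be sketched as a small context manager that starts the capture tool before the links are opened and stops it afterwards. The helper names capture and tcpdump_command and the default output file name packets are assumptions for illustration; tcpdump's -i and -w options select the interface and the output file, and capturing normally requires root privileges.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def capture(cmd):
    """Run cmd in the background for the duration of the with-block.

    capture is a hypothetical helper name used for illustration.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()  # stop the capture once the block is done
        proc.wait()       # reap the process so it does not linger


def tcpdump_command(interface, outfile="packets"):
    """Build the tcpdump argument list: -i picks the interface, -w the output file."""
    return ["tcpdump", "-i", interface, "-w", outfile]
```

In main() this would be used as: with capture(tcpdump_command(args.interface)): followed by the open_links_in_chrome call in the body. The try/finally also ensures the capture process is stopped if opening a link raises an exception.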