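Question

I am trying to build a Flask application that uses RQ (Redis Queue) to queue tasks which return data scraped with the Playwright library.

My goal is to create one global Playwright browser instance, so that the same instance is reused when different users request data through Flask. However, passing the browser instance as an argument to the task function through RQ's enqueue fails with TypeError: cannot pickle 'LockType' object.

Calling monkey.patch_all() from gevent does not seem to help.

My code is as follows: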
app.py

from gevent import monkey
monkey.patch_all()

import redis
from rq import Queue
from playwright.sync_api import sync_playwright
from flask import (
    Flask,
    render_template,
    request,
    make_response,
)
from src.flask.utils import (
    return_map_info,
    get_cookie
)

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
r = redis.Redis(host='localhost')
q = Queue(connection=r)
app = Flask(__name__)

@app.route('/add', methods=('GET', 'POST'))
def add_task():
    """
    This function is used to get the data from the post form
    and then add the task into the redis queue, which data will be
    returned later after the task is complete
    """
    jobs = q.jobs
    message = None
    if request.method == "POST":
        url = request.form['url']
        search_type = request.form['search_type']

        task = q.enqueue(return_map_info,
                         args=(browser,),
                         kwargs={
                             'url': url,
                             'type': search_type
                         })
        job_id = task.id
        cookie_key = get_cookie(request.cookies.get('cookieid'))
        jobs = q.jobs
        q_length = len(q)
        r.hset(cookie_key, url, job_id)

        message = f"The result is {task} and the jobs queued are {q_length}"
        resp = make_response(render_template(
            "add.html", message=message, jobs=jobs))

        resp.set_cookie("cookieid", cookie_key)
        return resp
    return render_template(
        "add.html", message=message, jobs=jobs)


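Answer 1

Score: 1

From what I understand, you want to show the scraped results to each user. If that is all you need, you don't need a queue. Instead, find an appropriate way to store the scraped data, filter it based on the user input, and send the data in the response.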

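But if you are running an RQ worker to scrape sites based on the user input, you have to start the Playwright instance inside the worker, run the job that scrapes the data, and store the results in a database, where they can later be used as described above.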
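A minimal sketch of that worker-side setup, under stated assumptions: the module name tasks.py and the helper enqueue_scrape are hypothetical, return_map_info is given a changed signature (no browser parameter, and search_type instead of type), and the page title stands in for the real scraping logic, which the question does not show.

```python
# tasks.py (hypothetical module name): this code runs inside the RQ worker.


def return_map_info(url, search_type):
    """Scrape `url` with a browser started in the worker, not in Flask."""
    # Imported lazily so the Flask process never has to start Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Placeholder result; the real extraction would depend on search_type.
            return {"url": url, "type": search_type, "title": page.title()}
        finally:
            browser.close()


def enqueue_scrape(queue, url, search_type):
    """Enqueue the job with plain strings only, so the arguments can be pickled."""
    return queue.enqueue(return_map_info, url, search_type)
```

In the Flask view, the call would then become enqueue_scrape(q, url, search_type) in place of the enqueue call that passes browser. This sketch launches a browser for every job, which is simpler but slower than sharing one; RQ's default worker forks a new process per job, so any reuse of a browser would have to be arranged on the worker side.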

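RQ is a task queue built on Redis. A separate RQ worker process runs and listens on the configured Redis queues. So whatever input (args, kwargs) you give to the job must be serialisable in order to be stored in Redis. The listening worker then reads and deserialises that data to get the actual input. A live Playwright browser object cannot be serialised, which is why passing it to enqueue fails.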
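The serialisation constraint can be reproduced without Redis or Playwright. RQ pickles job arguments by default, so the sketch below simply tries a pickle round trip. FakeBrowser and can_pickle are hypothetical names used for illustration only; FakeBrowser stands in for any live handle that owns a lock, as a real browser connection typically does.

```python
import pickle
import threading


class FakeBrowser:
    """Hypothetical stand-in for a live browser handle that owns a lock."""

    def __init__(self):
        self._lock = threading.Lock()


def can_pickle(obj):
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


job_kwargs = {"url": "https://example.com", "type": "places"}
print(can_pickle(job_kwargs))     # plain strings serialise fine
print(can_pickle(FakeBrowser()))  # an object holding a lock does not
```

Anything that passes this check (URLs, search types, IDs) is safe to hand to enqueue; anything that fails it has to be created inside the worker instead.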

