TypeError: cannot pickle 'LockType' object

Question



I am trying to create a Flask application with RQ and Redis that stores tasks returning the data scraped by the Playwright library.

What I am trying to do is create a single global Playwright browser instance, so that when different users request data from Flask, the same instance is reused. But I run into a problem when passing the browser instance as an argument to the task function through q.enqueue.

Using monkey.patch_all() from gevent doesn't seem to help.

My code is as follows:
app.py

from gevent import monkey
monkey.patch_all()

import redis
from rq import Queue
from playwright.sync_api import sync_playwright
from flask import (
    Flask,
    render_template,
    request,
    make_response,
)
from src.flask.utils import (
    return_map_info,
    get_cookie
)

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
r = redis.Redis(host='localhost')
q = Queue(connection=r)
app = Flask(__name__)


@app.route('/add', methods=('GET', 'POST'))
def add_task():
    """
    Get the data from the POST form and add the task to the Redis
    queue; the task's result will be returned later, once the job
    has completed.
    """
    jobs = q.jobs
    message = None
    if request.method == "POST":
        url = request.form['url']
        search_type = request.form['search_type']

        task = q.enqueue(return_map_info,
                         args=(browser,),
                         kwargs={
                             'url': url,
                             'type': search_type
                         })
        job_id = task.id
        cookie_key = get_cookie(request.cookies.get('cookieid'))
        jobs = q.jobs
        q_length = len(q)
        r.hset(cookie_key, url, job_id)

        message = f"The result is {task} and the jobs queued are {q_length}"
        resp = make_response(render_template(
            "add.html", message=message, jobs=jobs))

        resp.set_cookie("cookieid", cookie_key)
        return resp
    return render_template(
        "add.html", message=message, jobs=jobs)

Answer 1

Score: 1


From what I understand, you want to show the scraped results for each user. If that is all you need, you don't need a queue; instead, find an appropriate way to store the scraped data, filter it based on the user's input, and send it back in the response.

But if you are running an RQ worker to scrape sites based on user input, you have to initialise the Playwright instance inside the worker, run the job that scrapes the data, and store the results in a database, which can later be used as described above.
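A minimal sketch of that worker-side approach, assuming a hypothetical task scrape_map_info standing in for the asker's return_map_info (the returned fields are likewise illustrative): the browser is created inside the task function, so only plain, picklable strings ever cross the Redis queue.

```python
# Hypothetical worker-side task (stand-in for return_map_info): the
# Playwright browser is launched inside the task function, so only
# picklable strings are passed through the Redis queue.
def scrape_map_info(url, type):
    # Imported lazily: the Flask web process never needs a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    # Return only plain, serialisable data.
    return {"url": url, "type": type, "title": title}

# Enqueue side: pass only the strings, never the browser object, e.g.
# task = q.enqueue(scrape_map_info, kwargs={"url": url, "type": search_type})
```

With this shape, the Flask process no longer launches Chromium at all; each worker owns its own browser lifecycle.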

RQ is a task queue built on Redis. A separate RQ worker process runs, listening on the configured Redis queues. So whatever input (args, kwargs) you give to a job must be serialisable so it can be stored in Redis; the listening worker then reads and deserialises that data to recover the actual input.
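The underlying failure can be reproduced with the standard library alone: pickling any object whose internals contain a thread lock (which is what RQ attempts when a browser instance is passed as a job argument) raises the TypeError from the title.

```python
import pickle
import threading

# A thread lock is not picklable; any object holding one internally
# fails the same way when RQ tries to serialise it into Redis.
lock = threading.Lock()
try:
    pickle.dumps(lock)
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle '_thread.lock' object"
```

The exact wording of the message varies between Python versions ('LockType', '_thread.lock'), but the cause is the same: the job arguments must survive a round trip through pickle.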

huangapple
  • Published on 2023-06-27 21:40:45
  • Please retain the original link when reposting: https://go.coder-hub.com/76565483.html