Function returns a value of the wrong type when I use multiprocessing

Question

I have this code:

from numpy import array
from cv2 import imshow, cvtColor, imwrite, imread, destroyAllWindows, COLOR_BGR2RGB
from pyscreenshot import grab
import pytesseract


filename = 'image.png'
elements_for_replace = {'iL': '1L', 'Bi': 'B1', 'Bl': 'B1', 'Ci': 'C1', 'Cl': 'C1'}

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Administrator\AppData\Local\Tesseract-OCR\tesseract.exe'


def scanning(x1, y1, x2, y2):
    screen = array(grab(bbox=(x1, y1, x2, y2)))
    imwrite(filename, screen)
    img = imread(filename)
    text = pytesseract.image_to_string(img)
    history = text.split()
    return history


def first():
    return scanning(730, 740, 1335, 790)


def second():
    return scanning(730, 453, 1335, 500)


def third():
    return scanning(817, 45, 1522, 99)


def replace_elements(data, replace_data):
    for item in data:
        if item in replace_data:
            data[data.index(item)] = replace_data[item]
    return data


def get_data():
    x = replace_elements(first(), elements_for_replace)
    y = replace_elements(second(), elements_for_replace)
    z = replace_elements(third(), elements_for_replace)
    destroyAllWindows()
    return x, y, z

When get_data() is called, this code uses OCR to convert three different regions of the screen into text, one after another. It then replaces the misrecognized elements with the correct ones. The result is a tuple of lists (x, y, z), which is processed by another part of the program.

Converting images to text takes a lot of time, and running the three scans sequentially triples that time. I concluded that I need the multiprocessing module (or rather concurrent.futures) to reduce the execution time.

I rewrote the function get_data() like this:

import concurrent.futures


def get_data():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        x = executor.submit(replace_elements, first(), elements_for_replace)
        y = executor.submit(replace_elements, second(), elements_for_replace)
        z = executor.submit(replace_elements, third(), elements_for_replace)
    destroyAllWindows()
    return x, y, z

Now the returned variables have type <class 'concurrent.futures._base.Future'> instead of <class 'list'>, and the part of the program that processes this data raises "TypeError: 'Future' object is not subscriptable".

How can I run the functions in parallel so that get_data() returns the same type as in the first version of the code, that is, <class 'list'>?

Answer 1

Score: 2

executor.submit() returns a Future object, not the return value of the submitted function. To get that value, call result() on the Future; the call blocks until the task has finished. In your case you'll want to modify your code like so:

def get_data():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future_x = executor.submit(replace_elements, first(), elements_for_replace)
        future_y = executor.submit(replace_elements, second(), elements_for_replace)
        future_z = executor.submit(replace_elements, third(), elements_for_replace)
    destroyAllWindows()

    # Actually get the value of the function here
    x = future_x.result()
    y = future_y.result()
    z = future_z.result()

    return x, y, z

You could also look into ProcessPoolExecutor.map() to clean up the code a bit. It yields the return values directly, so you wouldn't have to handle each Future yourself.

Posted by huangapple on March 21, 2023, 01:47:30
Link to this article: https://go.coder-hub.com/75793631.html