Python multiprocessing on a for loop

Question


I have a function with two parameters:

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))

I want to trigger the function above using multiprocessing:

from multiprocessing import Pool

if __name__ == '__main__':
    p = Pool(5)   
    print(p.map([crawl(i,url) for i in reqs]))

The code above is not working for me. Can anyone please help me with this?

----- ADDING NEW CODE ---------

from multiprocessing import Pool

reqs = [1223,1456,1243,20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()

When I try to execute the code above, I still get an error.


Answer 1

Score: 2


According to the multiprocessing.Pool.map documentation, this is the function signature:

map(func, iterable[, chunksize])

You are trying to pass map an iterator instead of (func, iterable).
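
Concretely, the list comprehension in the question calls crawl right away, so map receives only a list of None return values and no function at all; the call needs the (func, iterable) shape. A minimal sketch of that shape, using the tuple-packing approach from this answer:

from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    # Receives one (i, url) tuple per call.
    print("%s is %s" % (combined_args[0], combined_args[1]))

if __name__ == '__main__':
    with Pool(5) as p:
        # Wrong: p.map([crawl((i, url)) for i in reqs]) -- crawl has already run here.
        # Right: pass the function plus an iterable of argument tuples.
        p.map(crawl, [(i, url) for i in reqs])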

Please refer to the following example of multiprocessing.pool (source):

import time
from multiprocessing import Pool

work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])

def work_log(work_data):
    print(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    print(" Process %s Finished." % work_data[0])

def pool_handler():
    p = Pool(2)
    p.map(work_log, work)


if __name__ == '__main__':
    pool_handler()

Please note that he is passing a single argument to the work_log function, and inside the function he uses indexing to get to the relevant fields.
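
As an aside, if you prefer to keep crawl's original two-parameter signature, Pool.starmap (available since Python 3.3) unpacks each argument tuple for you; a minimal sketch:

from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))

if __name__ == '__main__':
    with Pool(5) as p:
        # starmap unpacks each (i, url) tuple into crawl's two parameters.
        p.starmap(crawl, [(i, url) for i in reqs])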


Referring to your example:

from multiprocessing import Pool

reqs = [1223,1456,1243,20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()

This results in:

1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None]  # This is the output of the map function
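
The trailing [None, None, None, None] simply collects crawl's return values; since crawl only prints, every call returns None. If you want map to hand back results, return a value instead; a minimal sketch:

from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    # Return the formatted string instead of printing it.
    return "%s is %s" % (combined_args[0], combined_args[1])

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(crawl, [(i, url) for i in reqs])
    print(results)  # ['1223 is pass a url', '1456 is pass a url', ...]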

Answer 2

Score: 1


Issue resolved. The crawl function should be in a separate module (so the worker processes can import it), like below:

crawler.py

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

run.py

from multiprocessing import Pool
import crawler

# Inputs from the question; without them run.py would raise a NameError.
reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def main():
    p = Pool(5)
    print(p.map(crawler.crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()

Then the output will be like below:

Output:

1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None]  # This is the output of the map function
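
For completeness, if repeating url in every tuple feels awkward, functools.partial can pre-bind it so map only supplies the varying ids; a minimal single-file sketch, assuming a two-parameter crawl(i, url):

from functools import partial
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))

if __name__ == '__main__':
    with Pool(5) as p:
        # partial fixes the url keyword argument; map supplies each i.
        p.map(partial(crawl, url=url), reqs)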
