Why is the parallel version of my code slower than the serial one?


Question


I am trying to run a model many times, which is time-consuming, so I tried to parallelize it. However, the parallel version ends up being slower: it takes about 40 seconds, while the serial version takes 34 seconds.

# !pip install --target=$nb_path transformers
from transformers import pipeline

oracle = pipeline(model="deepset/roberta-base-squad2")
question = 'When did the first extension of the Athens Tram take place?'
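# data is a list of context strings loaded earlier (a sample is linked at the end of the question)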
print(data)
print("Data size is: ", len(data))

parallel = True

if parallel == False:
  counter = 0
  l = len(data)
  cr = []
  for words in data:
    counter+=1
    print(counter, " out of ", l)
    cr.append(oracle(question=question, context=words))
elif parallel == True:
  from multiprocessing import Process, Queue
  import multiprocessing

  no_CPU = multiprocessing.cpu_count()
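  # on Google Colab this reports 2 cores (as noted at the end of the question)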
  print("Number of cpu : ", no_CPU)
  l = len(data)

  def answer_question(data, no_CPU, sub_no):
    cr_process = []
    counter_process = 0
    for words in data:
      counter_process+=1
      l_data = len(data)
      print(counter_process, " out of ", l_data, "in subprocess number", sub_no)
      cr_process.append(oracle(question=question, context=words))
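    # NOTE: cr is not defined anywhere in this branch (unless left over from an
    # earlier serial run), and even if it were, appending inside a child process
    # would not be visible back in the parent process.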
    cr.append(cr_process)

  n = no_CPU      # number of subprocesses
  m = l//n        # number of data the n-1 first subprocesses will handle
  res = l % n     # number of extra data samples the last subprocesses has
  
  procs = []
  for x in range(n-1):
    proc = Process(target=answer_question, args=(data[x*m:(x+1)*m],n, x+1,))
    procs.append(proc)
    proc.start()
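  # the last subprocess gets the final chunk plus the remainder (m + res samples)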
  proc = Process(target=answer_question, args=(data[(n-1)*m:n*m+res],n,n,))

  procs.append(proc)
  proc.start()

  for proc in procs:
    proc.join()

A sample of the data variable can be found here (so as not to flood the question). The parallel flag switches between the serial and the parallel version. So my question is: why does this happen, and how can I make the parallel version faster? I am running on Google Colab, which has at least 2 CPU cores available; at least that is what multiprocessing.cpu_count() reports.


Answer 1

Score: 2

Your pipeline already runs on multiple CPU cores even as a single process: the transformers code is optimized for multi-CPU execution. When you create several processes on top of that, you lose additional time building the processes and switching between them.

To verify this, watch your CPU utilization while the so-called "single process" version is running; you should see all cores already at maximum, so spawning extra processes will not save you any time.
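
One quick way to check this (a minimal sketch, assuming the pipeline runs on the default PyTorch backend; psutil is an extra package used here purely for illustration):

import torch
import psutil

# Number of intra-op threads PyTorch already uses for the "serial" pipeline run.
print("PyTorch intra-op threads:", torch.get_num_threads())

# Per-core utilization sampled over one second; run this while the serial loop
# is executing and all cores should already be close to 100%.
print("Per-core utilization (%):", psutil.cpu_percent(interval=1, percpu=True))

# To see how much the built-in threading helps, pin PyTorch to one thread and
# re-time the serial loop (expect it to be noticeably slower):
# torch.set_num_threads(1)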


Posted by huangapple on 2023-02-13 23:57. Original link: https://go.coder-hub.com/75438284.html