Why is the parallel version of my code slower than the serial one?


Question


I am trying to run a model multiple times, which is time consuming, so as a solution I tried to make it parallel. However, it ends up being slower: the parallel version takes 40 seconds while the serial one takes 34 seconds.

# !pip install --target=$nb_path transformers
from transformers import pipeline

oracle = pipeline(model="deepset/roberta-base-squad2")
question = 'When did the first extension of the Athens Tram take place?'
print(data)
print("Data size is: ", len(data))

parallel = True

if parallel == False:
  counter = 0
  l = len(data)
  cr = []
  for words in data:
    counter+=1
    print(counter, " out of ", l)
    cr.append(oracle(question=question, context=words))
elif parallel == True:
  from multiprocessing import Process, Queue
  import multiprocessing

  no_CPU = multiprocessing.cpu_count()
  print("Number of cpu : ", no_CPU)
  l = len(data)

  def answer_question(data, no_CPU, sub_no):
    cr_process = []
    counter_process = 0
    for words in data:
      counter_process+=1
      l_data = len(data)
      print(counter_process, " out of ", l_data, "in subprocess number", sub_no)
      cr_process.append(oracle(question=question, context=words))
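    # note: cr is not defined in the parallel branch, and even if it were,
    # appends made in a child process are not visible to the parent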
    cr.append(cr_process)

  n = no_CPU      # number of subprocesses
  m = l//n        # number of data the n-1 first subprocesses will handle
  res = l % n     # number of extra data samples the last subprocesses has
  
  procs = []
  for x in range(n-1):
    proc = Process(target=answer_question, args=(data[x*m:(x+1)*m],n, x+1,))
    procs.append(proc)
    proc.start()
  proc = Process(target=answer_question, args=(data[(n-1)*m:n*m+res],n,n,))

  procs.append(proc)
  proc.start()

  for proc in procs:
    proc.join()

A sample of the data variable can be found here (to avoid flooding the question). The parallel flag switches between the serial and the parallel version. So my question is: why does this happen, and how do I make the parallel version faster? I am using Google Colab, so there are at least 2 CPU cores available, at least that is what multiprocessing.cpu_count() reports.


Answer 1

Score: 2


Your pipeline is already running on multiple CPU cores even when it runs as a single process; the transformers code is optimized to use multiple cores. When you create multiple processes on top of that, you lose time building the processes and switching between them.

To verify this, look at your CPU utilization while running the so-called "single process" version: you should see that all cores are already at maximum, so creating extra parallel processes is not going to save you any time.
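
As a rough check, here is a minimal sketch of how one could confirm this, assuming the pipeline runs on the PyTorch backend and that the psutil package is available (it typically is on Colab). It prints how many intra-op threads PyTorch uses in a single process and samples per-core utilization; if every core is already busy during the serial run, extra processes only add overhead. The last comment hints at how a worker could be pinned to a single thread if multiprocessing is kept anyway.

import torch    # backend used by the transformers pipeline
import psutil   # used here only to sample per-core utilization

# Threads PyTorch already uses within a single process (often equal to the core count).
print("torch intra-op threads:", torch.get_num_threads())

# Per-core CPU utilization sampled over one second; call this while the
# serial loop is running (e.g. from another cell or a background thread).
print("per-core CPU %:", psutil.cpu_percent(interval=1, percpu=True))

# If you combine this with multiprocessing anyway, you could limit each worker
# to a single thread so the workers do not compete for the same two cores,
# e.g. by calling torch.set_num_threads(1) at the start of answer_question.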
