Very slow work with a dataframe: how to avoid it

Question

Part of my code runs slowly.

I suspect the cause is the way I build a dataframe while iterating.
Here is the code:

# creating a dataframe for ALL data
df_all = pd.DataFrame() 

for idx, x in enumerate(all_data[0]):
    
    peak_indx_E = ...
    ...
   
    # TODO: speed up!
    # Is this what makes it slow? How can I avoid it if I need a dataframe as output?
    
    temp = pd.DataFrame(
      {
        'idx_global_num': idx, 
        ...
        'peak_sq_divE': peak_sq_divE
      }, index=[idx]
    )
    df_all = pd.concat([df_all, temp])

Can you suggest how to speed up the execution? I suspect the pd.concat operation is the slow part.

How can I solve this issue?

Answer 1

Score: 1

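It looks like you're building two pandas DataFrame objects on each iteration: the temporary one and the new result of pd.concat. Each pd.concat call also copies everything accumulated so far, so the total work grows roughly quadratically with the number of rows.
Instead, build a list of dicts (one per row) during the iteration, and create the DataFrame from it once, when you're done iterating.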

Example:

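df_list = []  # one dict per row

for idx, x in enumerate(all_data[0]):
    # Appending to a Python list is cheap; no DataFrame is created here.
    df_list.append(
        {
            'idx_global_num': idx, 
            ...
            'peak_sq_divE': peak_sq_divE
        }
    )

# Build the DataFrame once, after the loop.
df_all = pd.DataFrame(df_list)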
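
For anyone who wants to try both patterns side by side, here is a minimal runnable sketch. It is only an illustration: the question elides the real per-item computation, so `compute_peak` and the sample `values` below are hypothetical stand-ins, and only two of the original columns are shown.

```python
import pandas as pd


def compute_peak(x):
    """Hypothetical stand-in for the per-item computation elided in the question."""
    return x * x


def build_with_concat(values):
    """Slow pattern from the question: one temporary frame and one concat per item."""
    df_all = pd.DataFrame()
    for idx, x in enumerate(values):
        temp = pd.DataFrame(
            {"idx_global_num": idx, "peak_sq_divE": compute_peak(x)},
            index=[idx],
        )
        df_all = pd.concat([df_all, temp])
    return df_all


def build_with_rows(values):
    """Pattern from the answer: collect plain dicts, build the frame once."""
    rows = []
    for idx, x in enumerate(values):
        rows.append({"idx_global_num": idx, "peak_sq_divE": compute_peak(x)})
    return pd.DataFrame(rows)


values = [0.5, 1.5, 2.5, 3.5]  # made-up sample data
slow = build_with_concat(values)
fast = build_with_rows(values)
```

Both functions should produce the same rows. Because `enumerate` starts at 0, the default index of `pd.DataFrame(rows)` already matches `idx`, so the `index=[idx]` argument from the question is not needed in the second version.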

Posted by huangapple on 2023-02-08 11:31:54
Source: https://go.coder-hub.com/75381130.html