Very slow work with a DataFrame: how to avoid it?
Question
I have an issue with a part of my code that seems to run slowly.
I suppose it's because of the way I iterate and build up a dataframe.
Here is the code:
# creating a dataframe for ALL data
df_all = pd.DataFrame()
for idx, x in enumerate(all_data[0]):
    peak_indx_E = ...
    ...
    # TODO: speed up!
    # Is this the slow part? How can I avoid it if I need a dataframe as output?
    temp = pd.DataFrame(
        {
            'idx_global_num': idx,
            ...
            'peak_sq_divE': peak_sq_divE
        }, index=[idx]
    )
    df_all = pd.concat([df_all, temp])
Can you give me a suggestion on how to speed up the execution? I suspect the pd.concat operation is the slow part.
How can I solve this issue?
Answer 1
Score: 1
It looks like you're building two pandas DataFrame objects on each iteration.
Instead, you should build a list of dicts during the iteration, and use that to create the dataframe once you're done iterating.
Example:
df_list = []
for idx, x in enumerate(all_data[0]):
    df_list.append(
        {
            'idx_global_num': idx,
            ...
            'peak_sq_divE': peak_sq_divE
        }
    )
df_all = pd.DataFrame(df_list)
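To illustrate why this helps, here is a minimal, self-contained sketch comparing the two patterns. The original `all_data` and the peak columns are not shown in the question, so the data and column names below (`value`, a random array) are placeholders; the point is only that both patterns produce the same dataframe, while the per-iteration `pd.concat` recopies the whole accumulated dataframe every time (quadratic work) and the list-of-dicts version builds it once.

```python
import numpy as np
import pandas as pd

# Placeholder stand-in for the question's `all_data` (shape assumed).
all_data = [np.random.rand(200)]

# Slow pattern: pd.concat inside the loop copies df_slow on every iteration.
df_slow = pd.DataFrame()
for idx, x in enumerate(all_data[0]):
    temp = pd.DataFrame({'idx_global_num': idx, 'value': x}, index=[idx])
    df_slow = pd.concat([df_slow, temp])

# Fast pattern: accumulate plain dicts, build the dataframe once at the end.
rows = []
for idx, x in enumerate(all_data[0]):
    rows.append({'idx_global_num': idx, 'value': x})
df_fast = pd.DataFrame(rows)

# Both patterns yield the same result.
assert df_slow.reset_index(drop=True).equals(df_fast)
```

For 200 rows the difference is small, but as the row count grows the concat-in-a-loop version degrades quadratically while the list version stays linear.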