Very slow work with a DataFrame: how to avoid it?
Question
I have an issue with a part of my code that seems to run slowly.
I suppose it's because of the way I iterate and build up a dataframe.
Here is the code:
# creating a dataframe for ALL data
df_all = pd.DataFrame()
for idx, x in enumerate(all_data[0]):
    peak_indx_E = ...
    ...
    # TODO: speed up!
    # Is this the slow part? How can I avoid it if I need a dataframe as output?
    temp = pd.DataFrame(
        {
            'idx_global_num': idx,
            ...
            'peak_sq_divE': peak_sq_divE
        }, index=[idx]
    )
    df_all = pd.concat([df_all, temp])
Can you give me a suggestion on how to speed up the execution? I suspect the pd.concat operation is the slow part.
How can I solve this issue?
Answer 1
Score: 1
It looks like you're building two pandas DataFrame objects on each iteration.
Instead, you should build a list of dicts during the iteration, and use that to create the dataframe once you're done iterating.
Example:
df_list = []
for idx, x in enumerate(all_data[0]):
    df_list.append(
        {
            'idx_global_num': idx,
            ...
            'peak_sq_divE': peak_sq_divE
        }
    )
df_all = pd.DataFrame(df_list)
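To illustrate why this helps, here is a minimal, self-contained sketch comparing the two patterns. The original `all_data` and the peak columns are not shown in the question, so the data and column names below (`value`, a random array) are placeholders; the point is only that both patterns produce the same dataframe, while the per-iteration `pd.concat` recopies the whole accumulated dataframe every time (quadratic work) and the list-of-dicts version builds it once.

```python
import numpy as np
import pandas as pd

# Placeholder stand-in for the question's `all_data` (shape assumed).
all_data = [np.random.rand(200)]

# Slow pattern: pd.concat inside the loop copies df_slow on every iteration.
df_slow = pd.DataFrame()
for idx, x in enumerate(all_data[0]):
    temp = pd.DataFrame({'idx_global_num': idx, 'value': x}, index=[idx])
    df_slow = pd.concat([df_slow, temp])

# Fast pattern: accumulate plain dicts, build the dataframe once at the end.
rows = []
for idx, x in enumerate(all_data[0]):
    rows.append({'idx_global_num': idx, 'value': x})
df_fast = pd.DataFrame(rows)

# Both patterns yield the same result.
assert df_slow.reset_index(drop=True).equals(df_fast)
```

For 200 rows the difference is small, but as the row count grows the concat-in-a-loop version degrades quadratically while the list version stays linear.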