DataFrame operation very inefficient with loop, don't know how to fix it

Question

I have this loop where I take questions from a DataFrame, df_labels, which looks like this for session 21100511290882536:

"session_id"       "question"  "correct"
21100511290882536           1          1
21100511290882536           2          1
21100511290882536           3          1
21100511290882536           4          1
21100511290882536           5          0
21100511290882536           6          1
21100511290882536           7          1
21100511290882536           8          0
21100511290882536           9          1
21100511290882536          10          1
21100511290882536          11          1
21100511290882536          12          1
21100511290882536          13          0
21100511290882536          14          1
21100511290882536          15          1
21100511290882536          16          1
21100511290882536          17          1
21100511290882536          18          1

I would like to convert it, along with all other sessions, to a DataFrame like this:

"session_id"       "q_1"  "q_2"  "q_3"  "q_4"  ...
21100511290882536      1      1      1      1  ...

I already have a DataFrame, df_sessions, listing all the sessions:

for session in df_sessions.session_id:
    for i in range(1,19):
        df_sessions[f'q_{i}'][df_sessions['session_id'] == session] = df_labels.correct[(df_labels['session_id'] == session) & (df_labels['question'] == i)]

This code works but is very inefficient: it takes 20 minutes for this one operation, and I may need to do more operations like it, which would lead to very poor code efficiency and computing time.
Thanks in advance for your help!

Answer 1

Score: 2

Try:

>>> (df.assign(question='q_' + df['question'].astype(str).str.zfill(2))
       .pivot(index='session_id', columns='question', values='correct')
       .rename_axis(columns=None).reset_index())
       
          session_id  q_01  q_02  q_03  q_04  q_05  q_06  q_07  q_08  q_09  q_10  q_11  q_12  q_13  q_14  q_15  q_16  q_17  q_18
0  21100511290882536     1     1     1     1     0     1     1     0     1     1     1     1     0     1     1     1     1     1
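
For anyone who wants to run this end to end, here is a minimal sketch of the same pivot idea applied to the names used in the question. The sample data, the extra session IDs, and the names `wide` and `result` are made up for illustration. It uses `add_prefix` instead of zero-padding, so the columns come out as q_1, q_2, ... as in the desired output, and it assumes each (session_id, question) pair appears at most once, since `pivot` raises a ValueError on duplicates.

```python
import pandas as pd

# Hypothetical sample data standing in for df_labels (two sessions, three questions).
df_labels = pd.DataFrame({
    "session_id": [21100511290882536] * 3 + [21100511290882537] * 3,
    "question": [1, 2, 3, 1, 2, 3],
    "correct": [1, 1, 0, 0, 1, 1],
})

# Hypothetical df_sessions listing every session, including one with no labels.
df_sessions = pd.DataFrame(
    {"session_id": [21100511290882536, 21100511290882537, 21100511290882538]}
)

# Reshape long -> wide in one vectorized call:
# one row per session, one column per question.
wide = (
    df_labels.pivot(index="session_id", columns="question", values="correct")
    .add_prefix("q_")            # column label 1 becomes "q_1", and so on
    .rename_axis(columns=None)   # drop the leftover "question" name on the column axis
    .reset_index()
)

# Attach the answers to the existing session list.
# Sessions without any labels get NaN in the q_* columns.
result = df_sessions.merge(wide, on="session_id", how="left")
print(result)
```

Because the question numbers are still integers when `pivot` sorts the columns, they stay in numeric order without needing `zfill`. The left merge keeps every row of df_sessions, which is what the original loop was filling in one cell at a time.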

Posted by huangapple on 2023-02-16 17:06:48
Source: https://go.coder-hub.com/75469938.html