DataFrame operation very inefficient with a loop, don't know how to fix it

Question

I have a loop that takes questions from a DataFrame df_labels, which looks like this for session 21100511290882536:

    "session_id"       "question"  "correct"
    21100511290882536  1           1
    21100511290882536  2           1
    21100511290882536  3           1
    21100511290882536  4           1
    21100511290882536  5           0
    21100511290882536  6           1
    21100511290882536  7           1
    21100511290882536  8           0
    21100511290882536  9           1
    21100511290882536  10          1
    21100511290882536  11          1
    21100511290882536  12          1
    21100511290882536  13          0
    21100511290882536  14          1
    21100511290882536  15          1
    21100511290882536  16          1
    21100511290882536  17          1
    21100511290882536  18          1

I would like to convert it, along with all the other sessions, into a DataFrame like this:

    "session_id"       "q_1"  "q_2"  "q_3"  "q_4"  ...
    21100511290882536  1      1      1      1      ...

I already have a DataFrame "df_sessions" listing all the sessions, and this is the loop I use to fill it:

    for session in df_sessions.session_id:
        for i in range(1, 19):
            df_sessions[f'q_{i}'][df_sessions['session_id'] == session] = df_labels.correct[(df_labels['session_id'] == session) & (df_labels['question'] == i)]

This code works but is very inefficient: it takes 20 minutes for this one operation, and I may need to run more operations like it, which would make the computing time far worse.
Thanks in advance for your help!

Answer 1

Score: 2

Try:

    >>> (df.assign(question='q_' + df['question'].astype(str).str.zfill(2))
    ...    .pivot(index='session_id', columns='question', values='correct')
    ...    .rename_axis(columns=None).reset_index())
              session_id  q_01  q_02  q_03  q_04  q_05  q_06  q_07  q_08  q_09  q_10  q_11  q_12  q_13  q_14  q_15  q_16  q_17  q_18
    0  21100511290882536     1     1     1     1     0     1     1     0     1     1     1     1     0     1     1     1     1     1
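
For reference, here is a self-contained sketch of the same pivot idea, followed by a merge back into the session list. It assumes `df` above stands for the question's `df_labels`, and that each (session_id, question) pair occurs only once; `pivot` raises a ValueError on duplicates, in which case `pivot_table` with an aggregation function may be an option. The sample values and the second session ID are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df_labels
# (two sessions, three questions each; the second ID is invented).
df_labels = pd.DataFrame({
    "session_id": [21100511290882536] * 3 + [21100511290882537] * 3,
    "question": [1, 2, 3, 1, 2, 3],
    "correct": [1, 1, 0, 0, 1, 1],
})
df_sessions = pd.DataFrame(
    {"session_id": [21100511290882536, 21100511290882537]}
)

# Reshape long -> wide in one vectorized step:
# one row per session, one column per question.
wide = (
    df_labels
    .pivot(index="session_id", columns="question", values="correct")
    .add_prefix("q_")            # columns 1, 2, 3 -> q_1, q_2, q_3
    .rename_axis(columns=None)   # drop the leftover "question" axis label
    .reset_index()
)

# Attach the new columns to the existing session list. A left merge keeps
# sessions that have no labels (their q_* values become NaN).
result = df_sessions.merge(wide, on="session_id", how="left")
print(result)
```

Prefixing after the pivot keeps the numeric column order and yields the `q_1`, `q_2`, ... names asked for in the question. The `zfill(2)` variant above gives `q_01`, `q_02`, ... instead, which sort correctly as strings.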

  • Posted by huangapple on February 16, 2023 at 17:06:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75469938.html