DataFrame operation with a loop is very inefficient; how can I fix it?
Question
I have a loop that takes questions from a DataFrame, df_labels, which looks like this for session 21100511290882536:
"session_id" "question" "correct"
21100511290882536 1 1
21100511290882536 2 1
21100511290882536 3 1
21100511290882536 4 1
21100511290882536 5 0
21100511290882536 6 1
21100511290882536 7 1
21100511290882536 8 0
21100511290882536 9 1
21100511290882536 10 1
21100511290882536 11 1
21100511290882536 12 1
21100511290882536 13 0
21100511290882536 14 1
21100511290882536 15 1
21100511290882536 16 1
21100511290882536 17 1
21100511290882536 18 1
I would like to convert this session, and all the other sessions, into a DataFrame like this:
"session_id" "q_1" "q_2" "q_3" "q_4" ...
21100511290882536 1 1 1 1 ...
I already have a DataFrame, df_sessions, that lists all the sessions. This is my current loop:
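for session in df_sessions.session_id:
    for i in range(1, 19):
        # Copy the 'correct' value for this (session, question) pair; both boolean
        # masks scan an entire DataFrame on every one of the sessions x 18 iterations.
        df_sessions[f'q_{i}'][df_sessions['session_id'] == session] = df_labels.correct[(df_labels['session_id'] == session) & (df_labels['question'] == i)]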
This code works but is very inefficient: it takes 20 minutes for this single operation, and I may need to run more operations like it, which would make the overall computing time very poor.
Thanks in advance for your help!
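A minimal reproducible setup for the question, assuming df_labels holds only the single session shown above and df_sessions (hypothetical here) contains just that one session_id; the real data has many more sessions. The loop is rewritten with .loc because the chained df_sessions[col][mask] = ... form can raise SettingWithCopyWarning and, under pandas copy-on-write, may not write back to df_sessions at all. It is still the slow per-session approach and is kept only as a baseline for checking a faster version.

```python
import pandas as pd

# Sample data from the question: one session, 18 questions.
correct = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
df_labels = pd.DataFrame({
    "session_id": [21100511290882536] * 18,
    "question": list(range(1, 19)),
    "correct": correct,
})
# Hypothetical df_sessions containing only this one session.
df_sessions = pd.DataFrame({"session_id": [21100511290882536]})

# Baseline: the question's nested loop, using .loc instead of chained indexing.
# Each iteration still builds full-length boolean masks, which is what makes it slow.
for session in df_sessions["session_id"]:
    for i in range(1, 19):
        match = df_labels.loc[
            (df_labels["session_id"] == session) & (df_labels["question"] == i),
            "correct",
        ]
        if not match.empty:
            # Creates column q_i on first use; rows that do not match get NaN.
            df_sessions.loc[df_sessions["session_id"] == session, f"q_{i}"] = match.iloc[0]

print(df_sessions)
```

Because new columns created this way start out as NaN, the q_i columns may come out as floats (1.0, 0.0) rather than integers.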
Answer 1
Score: 2
Try:
>>> (df_labels.assign(question='q_' + df_labels['question'].astype(str).str.zfill(2))
...  .pivot(index='session_id', columns='question', values='correct')
...  .rename_axis(columns=None).reset_index())
session_id q_01 q_02 q_03 q_04 q_05 q_06 q_07 q_08 q_09 q_10 q_11 q_12 q_13 q_14 q_15 q_16 q_17 q_18
0 21100511290882536 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1
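A self-contained sketch of the same reshape, assuming df_labels has exactly one row per (session_id, question) pair (pivot raises a ValueError on duplicate pairs). The answer zero-pads with zfill(2) because pivot sorts the column labels, and as strings 'q_10' would sort before 'q_2'. This variant pivots on the integer question column first, so the columns are already in numeric order, and only then adds the prefix with add_prefix, giving the q_1 ... q_18 names asked for in the question. The final left merge onto df_sessions keeps sessions that have no rows in df_labels; session 333 below is a hypothetical example of such a session.

```python
import pandas as pd

# Sample data from the question: one session, 18 questions.
correct = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
df_labels = pd.DataFrame({
    "session_id": [21100511290882536] * 18,
    "question": list(range(1, 19)),
    "correct": correct,
})
# Hypothetical df_sessions: session 333 has no rows in df_labels.
df_sessions = pd.DataFrame({"session_id": [21100511290882536, 333]})

wide = (
    df_labels
    # One row per session, one column per question number (sorted numerically).
    .pivot(index="session_id", columns="question", values="correct")
    # Turn the integer column labels 1..18 into "q_1".."q_18".
    .add_prefix("q_")
    # Drop the leftover "question" axis name and make session_id a column again.
    .rename_axis(columns=None)
    .reset_index()
)

# Left merge keeps every session in df_sessions, in order; missing answers become NaN.
result = df_sessions.merge(wide, on="session_id", how="left")
print(result)
```

The pivot touches each row of df_labels once instead of rescanning both DataFrames for every (session, question) pair, which is where the original loop spends its time.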