基于另一个数据框架的条件筛选多级索引数据框。

huangapple go评论103阅读模式
英文:

filter a multiindex dataframe bassed on condition from another dataframe

问题

你想要将代码部分翻译成中文吗?

英文:

I'd appreciate your help.

I have a multiindex dataframe like this:

  1. df1 = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
  2. ('002_S_0413', 0, 'T1'): 'F',
  3. ('002_S_4213', 2, 'DTI'): 'F',
  4. ('002_S_4213', 2, 'T1'): 'F',
  5. ('002_S_4799', 0, 'DTI'): 'M',
  6. ('002_S_4799', 0, 'T1'): 'M',
  7. ('002_S_5178', 0, 'DTI'): 'M',
  8. ('002_S_5178', 0, 'T1'): 'M',
  9. ('002_S_5230', 2, 'DTI'): 'F',
  10. ('002_S_5230', 2, 'T1'): 'F'},
  11. 'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
  12. ('002_S_0413', 0, 'T1'): 1.0,
  13. ('002_S_4213', 2, 'DTI'): 1.0,
  14. ('002_S_4213', 2, 'T1'): 1.0,
  15. ('002_S_4799', 0, 'DTI'): 1.0,
  16. ('002_S_4799', 0, 'T1'): 1.0,
  17. ('002_S_5178', 0, 'DTI'): 1.0,
  18. ('002_S_5178', 0, 'T1'): 1.0,
  19. ('002_S_5230', 2, 'DTI'): 1.0,
  20. ('002_S_5230', 2, 'T1'): 1.0}}

and a second dataframe:

  1. df2 = {'Subject ID': {0: '002_S_0413',
  2. 1: '002_S_0413',
  3. 2: '002_S_4213',
  4. 3: '002_S_4213',
  5. 4: '002_S_4799',
  6. 5: '002_S_4799',
  7. 6: '002_S_4799',
  8. 7: '002_S_5178',
  9. 8: '002_S_5178',
  10. 9: '002_S_5230',
  11. 10: '002_S_5230',
  12. 11: '002_S_5230',
  13. 12: '002_S_6007',
  14. 13: '002_S_6007'},
  15. 'Visit_NUM': {0: 0,
  16. 1: 2,
  17. 2: 0,
  18. 3: 2,
  19. 4: 0,
  20. 5: 1,
  21. 6: 2,
  22. 7: 0,
  23. 8: 2,
  24. 9: 0,
  25. 10: 1,
  26. 11: 2,
  27. 12: 0,
  28. 13: 1}}

I want to filter df1:
if for each correspondant Subject ID (level=0) in both dataframes there is a Visit_NUM in df2 that is greater than or equal to 2 plus that in df1 (level=1), keep it (the subject's row in df1), if not delete it.

To clarify: for each Subject ID, if (Visit_NUM_in_df1) >= (2 + Visit_NUM_in_df2) keep that Subject's row in df1, if not delete it.

This is what I've done:

  1. df3 = pd.DataFrame(df1.reset_index([
  2. 'Visit_NUM', 'Description']).groupby(
  3. level=0)['Visit_NUM'].transform(lambda x: x + 2)).reset_index(
  4. ).drop_duplicates(['Subject ID'])
  5. t = df3.merge(df2.reset_index(), on=['Subject ID', 'Visit_NUM'])
  6. t = t['Subject ID']
  7. out = df1.loc[df1.index.get_level_values('Subject ID').isin(t)]

The result would be something like:

  1. out = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
  2. ('002_S_0413', 0, 'T1'): 'F',
  3. ('002_S_4799', 0, 'DTI'): 'M',
  4. ('002_S_4799', 0, 'T1'): 'M',
  5. ('002_S_5178', 0, 'DTI'): 'M',
  6. ('002_S_5178', 0, 'T1'): 'M'},
  7. 'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
  8. ('002_S_0413', 0, 'T1'): 1.0,
  9. ('002_S_4799', 0, 'DTI'): 1.0,
  10. ('002_S_4799', 0, 'T1'): 1.0,
  11. ('002_S_5178', 0, 'DTI'): 1.0,
  12. ('002_S_5178', 0, 'T1'): 1.0}}

By doing what I've done I'm just getting those that have Visit_NUM + 2, but not those that are >= (Visit_NUM + 2).

Also, I believe there is an easy way to do this, thanks!

答案1

得分: 0

尝试使用以下代码代替:

  1. max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
  2. out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]
英文:

Try this instead:

  1. max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
  2. out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]

答案2

得分: 0

以下是翻译好的代码部分:

  1. list_subjects = np.unique(df1.reset_index()['Subject ID'])
  2. tt=[]
  3. for i in range(len(list_subjects)):
  4. name = list_subjects[i]
  5. visit = df2[df2['Subject ID'] == name]['Visit_NUM']
  6. visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
  7. if any(visit >= visit_min[0]+2):
  8. tt.append(name)
  9. out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]

请注意,我只提供了代码的翻译,没有包括任何其他内容。

英文:

A friend has helped me with this answer:

  1. list_subjects = np.unique(df1.reset_index()['Subject ID'])
  2. tt=[]
  3. for i in range(len(list_subjects)):
  4. name = list_subjects[i]
  5. visit = df2[df2['Subject ID'] == name]['Visit_NUM']
  6. visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
  7. if any(visit >= visit_min[0]+2):
  8. tt.append(name)
  9. out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]

huangapple
  • 本文由 发表于 2023年6月15日 04:02:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76477167.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定