基于另一个数据框架的条件筛选多级索引数据框。

huangapple go评论54阅读模式
英文:

filter a multiindex dataframe bassed on condition from another dataframe

问题

你想要将代码部分翻译成中文吗?

英文:

I'd appreciate your help.

I have a multiindex dataframe like this:

df1 = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
  ('002_S_0413', 0, 'T1'): 'F',
  ('002_S_4213', 2, 'DTI'): 'F',
  ('002_S_4213', 2, 'T1'): 'F',
  ('002_S_4799', 0, 'DTI'): 'M',
  ('002_S_4799', 0, 'T1'): 'M',
  ('002_S_5178', 0, 'DTI'): 'M',
  ('002_S_5178', 0, 'T1'): 'M',
  ('002_S_5230', 2, 'DTI'): 'F',
  ('002_S_5230', 2, 'T1'): 'F'},
 'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
  ('002_S_0413', 0, 'T1'): 1.0,
  ('002_S_4213', 2, 'DTI'): 1.0,
  ('002_S_4213', 2, 'T1'): 1.0,
  ('002_S_4799', 0, 'DTI'): 1.0,
  ('002_S_4799', 0, 'T1'): 1.0,
  ('002_S_5178', 0, 'DTI'): 1.0,
  ('002_S_5178', 0, 'T1'): 1.0,
  ('002_S_5230', 2, 'DTI'): 1.0,
  ('002_S_5230', 2, 'T1'): 1.0}}

and a second dataframe:

df2 = {'Subject ID': {0: '002_S_0413',
  1: '002_S_0413',
  2: '002_S_4213',
  3: '002_S_4213',
  4: '002_S_4799',
  5: '002_S_4799',
  6: '002_S_4799',
  7: '002_S_5178',
  8: '002_S_5178',
  9: '002_S_5230',
  10: '002_S_5230',
  11: '002_S_5230',
  12: '002_S_6007',
  13: '002_S_6007'},
 'Visit_NUM': {0: 0,
  1: 2,
  2: 0,
  3: 2,
  4: 0,
  5: 1,
  6: 2,
  7: 0,
  8: 2,
  9: 0,
  10: 1,
  11: 2,
  12: 0,
  13: 1}}

I want to filter df1:
if for each correspondant Subject ID (level=0) in both dataframes there is a Visit_NUM in df2 that is greater than or equal to 2 plus that in df1 (level=1), keep it (the subject's row in df1), if not delete it.

To clarify: for each Subject ID, if (Visit_NUM_in_df1) >= (2 + Visit_NUM_in_df2) keep that Subject's row in df1, if not delete it.

This is what I've done:

df3 = pd.DataFrame(df1.reset_index([
    'Visit_NUM', 'Description']).groupby(
    level=0)['Visit_NUM'].transform(lambda x: x + 2)).reset_index(
).drop_duplicates(['Subject ID'])

t = df3.merge(df2.reset_index(), on=['Subject ID', 'Visit_NUM'])
t = t['Subject ID']
out = df1.loc[df1.index.get_level_values('Subject ID').isin(t)]

The result would be something like:

out = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
  ('002_S_0413', 0, 'T1'): 'F',
  ('002_S_4799', 0, 'DTI'): 'M',
  ('002_S_4799', 0, 'T1'): 'M',
  ('002_S_5178', 0, 'DTI'): 'M',
  ('002_S_5178', 0, 'T1'): 'M'},
 'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
  ('002_S_0413', 0, 'T1'): 1.0,
  ('002_S_4799', 0, 'DTI'): 1.0,
  ('002_S_4799', 0, 'T1'): 1.0,
  ('002_S_5178', 0, 'DTI'): 1.0,
  ('002_S_5178', 0, 'T1'): 1.0}}

By doing what I've done I'm just getting those that have Visit_NUM + 2, but not those that are >= (Visit_NUM + 2).

Also, I believe there is an easy way to do this, thanks!

答案1

得分: 0

尝试使用以下代码代替:

max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]
英文:

Try this instead:

max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]

答案2

得分: 0

以下是翻译好的代码部分:

list_subjects = np.unique(df1.reset_index()['Subject ID'])

tt=[]
for i in range(len(list_subjects)):
    name = list_subjects[i]
    visit = df2[df2['Subject ID'] == name]['Visit_NUM']
    visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
    if any(visit >= visit_min[0]+2):
        tt.append(name)
        
out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]

请注意,我只提供了代码的翻译,没有包括任何其他内容。

英文:

A friend has helped me with this answer:

list_subjects = np.unique(df1.reset_index()['Subject ID'])

tt=[]
for i in range(len(list_subjects)):
    name = list_subjects[i]
    visit = df2[df2['Subject ID'] == name]['Visit_NUM']
    visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
    if any(visit >= visit_min[0]+2):
        tt.append(name)
        
out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]

huangapple
  • 本文由 发表于 2023年6月15日 04:02:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76477167.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定