英文:
filter a multiindex dataframe bassed on condition from another dataframe
问题
你想要将代码部分翻译成中文吗?
英文:
I'd appreciate your help.
I have a multiindex dataframe like this:
df1 = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
('002_S_0413', 0, 'T1'): 'F',
('002_S_4213', 2, 'DTI'): 'F',
('002_S_4213', 2, 'T1'): 'F',
('002_S_4799', 0, 'DTI'): 'M',
('002_S_4799', 0, 'T1'): 'M',
('002_S_5178', 0, 'DTI'): 'M',
('002_S_5178', 0, 'T1'): 'M',
('002_S_5230', 2, 'DTI'): 'F',
('002_S_5230', 2, 'T1'): 'F'},
'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
('002_S_0413', 0, 'T1'): 1.0,
('002_S_4213', 2, 'DTI'): 1.0,
('002_S_4213', 2, 'T1'): 1.0,
('002_S_4799', 0, 'DTI'): 1.0,
('002_S_4799', 0, 'T1'): 1.0,
('002_S_5178', 0, 'DTI'): 1.0,
('002_S_5178', 0, 'T1'): 1.0,
('002_S_5230', 2, 'DTI'): 1.0,
('002_S_5230', 2, 'T1'): 1.0}}
and a second dataframe:
df2 = {'Subject ID': {0: '002_S_0413',
1: '002_S_0413',
2: '002_S_4213',
3: '002_S_4213',
4: '002_S_4799',
5: '002_S_4799',
6: '002_S_4799',
7: '002_S_5178',
8: '002_S_5178',
9: '002_S_5230',
10: '002_S_5230',
11: '002_S_5230',
12: '002_S_6007',
13: '002_S_6007'},
'Visit_NUM': {0: 0,
1: 2,
2: 0,
3: 2,
4: 0,
5: 1,
6: 2,
7: 0,
8: 2,
9: 0,
10: 1,
11: 2,
12: 0,
13: 1}}
I want to filter df1
:
if for each correspondant Subject ID
(level=0) in both dataframes there is a Visit_NUM
in df2
that is greater than or equal to 2 plus that in df1
(level=1), keep it (the subject's row in df1), if not delete it.
To clarify: for each Subject ID
, if (Visit_NUM_in_df1) >= (2 + Visit_NUM_in_df2) keep that Subject's row in df1, if not delete it.
This is what I've done:
df3 = pd.DataFrame(df1.reset_index([
'Visit_NUM', 'Description']).groupby(
level=0)['Visit_NUM'].transform(lambda x: x + 2)).reset_index(
).drop_duplicates(['Subject ID'])
t = df3.merge(df2.reset_index(), on=['Subject ID', 'Visit_NUM'])
t = t['Subject ID']
out = df1.loc[df1.index.get_level_values('Subject ID').isin(t)]
The result would be something like:
out = {'Sex': {('002_S_0413', 0, 'DTI'): 'F',
('002_S_0413', 0, 'T1'): 'F',
('002_S_4799', 0, 'DTI'): 'M',
('002_S_4799', 0, 'T1'): 'M',
('002_S_5178', 0, 'DTI'): 'M',
('002_S_5178', 0, 'T1'): 'M'},
'DIAGNOSIS': {('002_S_0413', 0, 'DTI'): 1.0,
('002_S_0413', 0, 'T1'): 1.0,
('002_S_4799', 0, 'DTI'): 1.0,
('002_S_4799', 0, 'T1'): 1.0,
('002_S_5178', 0, 'DTI'): 1.0,
('002_S_5178', 0, 'T1'): 1.0}}
By doing what I've done I'm just getting those that have Visit_NUM + 2, but not those that are >= (Visit_NUM + 2).
Also, I believe there is an easy way to do this, thanks!
答案1
得分: 0
尝试使用以下代码代替:
max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]
英文:
Try this instead:
max_visit_num = df2.groupby('Subject ID')['Visit_NUM'].max()
out = df1.loc[df1.index.get_level_values(0).isin(max_visit_num[max_visit_num >= 2].index)]
答案2
得分: 0
以下是翻译好的代码部分:
list_subjects = np.unique(df1.reset_index()['Subject ID'])
tt=[]
for i in range(len(list_subjects)):
name = list_subjects[i]
visit = df2[df2['Subject ID'] == name]['Visit_NUM']
visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
if any(visit >= visit_min[0]+2):
tt.append(name)
out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]
请注意,我只提供了代码的翻译,没有包括任何其他内容。
英文:
A friend has helped me with this answer:
list_subjects = np.unique(df1.reset_index()['Subject ID'])
tt=[]
for i in range(len(list_subjects)):
name = list_subjects[i]
visit = df2[df2['Subject ID'] == name]['Visit_NUM']
visit_min = np.unique(df1.loc[name].reset_index()['Visit_NUM'])
if any(visit >= visit_min[0]+2):
tt.append(name)
out = df1.loc[df1.index.get_level_values('Subject ID').isin(tt)]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论