Extracting Specific Groups from a DataFrame under Specific Conditions
Question
I have a DataFrame as shown below, and I want to extract the groups that contain a row with Name == 'J' and Age == 33.
X | Y | Name | Age |
---|---|---|---|
1 | 3 | J | 33 |
1 | 3 | A | 47 |
1 | 4 | B | 53 |
1 | 4 | X | 22 |
2 | 3 | J | 33 |
2 | 3 | P | 80 |
2 | 4 | V | 90 |
2 | 4 | V | 93 |
The expected result is the table below, because the (X=1, Y=3) group contains a (J, 33) row and the (X=2, Y=3) group also contains a (J, 33) row.
X | Y | Name | Age |
---|---|---|---|
1 | 3 | J | 33 |
1 | 3 | A | 47 |
2 | 3 | J | 33 |
2 | 3 | P | 80 |
I've been approaching this by iterating over the rows, which has been far too slow. Is there a much faster way using the groupby and apply/pipe methods in pandas? Any help is appreciated.
Example DF below:
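import pandas as pd

# Values match the table above: the (J, 33) rows are in groups (1, 3) and (2, 3)
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 90, 93]
})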
Answer 1
Score: 1
One option is with a groupby:
# Flag the rows where (Name, Age) equals ('J', 33)
check = df.loc(axis=1)[['Name','Age']].eq(('J', 33)).all(axis=1)
# Group the flags by X and Y, and mark every row whose group contains at least one True
check = check.groupby([df.X, df.Y]).transform('any')
# Filter the original DataFrame
df.loc[check]
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
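Another option, still with a groupby. Note that it tests Name and Age separately within each group, so it also keeps a group in which 'J' and 33 occur on different rows:
group = df.groupby(['X','Y'])
cond1 = group.Name.transform(lambda x: any(x == 'J'))
cond2 = group.Age.transform(lambda x: any(x == 33))
df.loc[cond1 & cond2]
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80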
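Since the question asks about groupby with apply/pipe, a third groupby variant may be worth a look. This is a minimal sketch, not part of the original answer, using `DataFrameGroupBy.filter`. The helper name `has_target_row` is made up for illustration, and the data is assumed to match the table in the question.

```python
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 90, 93],
})


def has_target_row(group: pd.DataFrame) -> bool:
    """Return True if a single row of the group has Name 'J' and Age 33."""
    return bool(((group['Name'] == 'J') & (group['Age'] == 33)).any())


# filter() keeps all rows of every group for which the function returns True.
result = df.groupby(['X', 'Y']).filter(has_target_row)
print(result)
```

Because `filter` calls a Python function once per group, it is likely to be slower than the `transform('any')` approach when there are many groups, but it states the "group contains such a row" condition directly.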
Answer 2
Score: 0
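You can search for all unique X and Y values of the rows where Name == 'J' and Age == 33, and filter the DataFrame afterwards. X and Y are tested independently here, so this keeps any row whose X matches and whose Y matches, even if that exact (X, Y) pair has no matching row: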
mask = (df['Name'] == 'J') & (df['Age'] == 33)
unique_x = df.loc[mask, 'X'].unique()
unique_y = df.loc[mask, 'Y'].unique()
print(df[df['X'].isin(unique_x) & df['Y'].isin(unique_y)])
Prints:
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
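To see where testing X and Y independently can differ from a per-group test, here is a small sketch on hypothetical data (the `demo` frame is an assumption for illustration, not the question's table) in which the (J, 33) rows fall in groups (1, 3) and (2, 4):

```python
import pandas as pd

# Hypothetical data: the (J, 33) rows sit in groups (1, 3) and (2, 4).
demo = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'V', 'P', 'J', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 33, 93],
})

mask = (demo['Name'] == 'J') & (demo['Age'] == 33)

# Independent isin checks: X in {1, 2} and Y in {3, 4} matches every row.
loose = demo[demo['X'].isin(demo.loc[mask, 'X'].unique())
             & demo['Y'].isin(demo.loc[mask, 'Y'].unique())]

# Per-group check: only rows whose (X, Y) group contains a matching row.
strict = demo[mask.groupby([demo['X'], demo['Y']]).transform('any')]

print(len(loose), len(strict))
```

On this data the independent checks select all eight rows, while the per-group check selects only the four rows of the two matching groups.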
EDIT: With the updated question, a solution without .groupby:
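mask = (df['Name'] == 'J') & (df['Age'] == 33)
# Set of (X, Y) tuples for the groups that contain a matching row
t = set(df.loc[mask, ['X', 'Y']].drop_duplicates().apply(tuple, axis=1))
# Keep the rows whose (X, Y) tuple is in that set
out = df[df[['X', 'Y']].apply(lambda x: tuple(x) in t, axis=1)]
print(out)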
Prints:
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
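The row-wise `apply` above still runs a Python function for every row. One possible way to avoid that, sketched here as an assumption rather than as part of the original answer, is to compare (X, Y) pairs through `pd.MultiIndex.isin`. The names `keys` and `in_keys` are illustrative, and the data is assumed to match the table in the question.

```python
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 90, 93],
})

mask = (df['Name'] == 'J') & (df['Age'] == 33)

# Distinct (X, Y) pairs that contain a matching row, as a MultiIndex.
keys = pd.MultiIndex.from_frame(df.loc[mask, ['X', 'Y']].drop_duplicates())

# MultiIndex of every row's (X, Y) pair, tested for membership in one call.
in_keys = pd.MultiIndex.from_frame(df[['X', 'Y']]).isin(keys)

# in_keys is a boolean array, so the original row index is preserved.
out = df[in_keys]
print(out)
```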