June 5, 2023 06:52:35

Extracting Specific Groups from a DataFrame under Specific Conditions

Question

I have a DataFrame like the one below, from which I want to extract every X/Y group that contains a row with Name J and Age 33.

X	Y	Name	Age
1	3	J	33
1	3	A	47
1	4	B	53
1	4	X	22
2	3	J	33
2	3	P	80
2	4	V	90
2	4	V	93

The result should be the table below, because the (1, 3) X/Y group contains a J/33 row, and so does the (2, 3) group.

X	Y	Name	Age
1	3	J	33
1	3	A	47
2	3	J	33
2	3	P	80

I've been iterating over the rows, which is far too slow. Is there a much faster way using groupby together with apply/pipe in pandas? Any help is appreciated.

Example DataFrame:

import pandas as pd

df = pd.DataFrame({
    'X': [1,1,1,1,2,2,2,2],
    'Y': [3,3,4,4,3,3,4,4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33,47,53,22,33,80,90,93]
})

Answer 1

Score: 1

One option is with a groupby:

# Get rows equal to ('J', 33)
check = df.loc(axis=1)[['Name','Age']].eq(('J', 33)).all(axis=1)
# Run a groupby and mark groups where True exists for any row in that group
check = check.groupby([df.X, df.Y]).transform('any')
# Filter the original DataFrame
df.loc[check]
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
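A similar result can be sketched with `DataFrameGroupBy.filter`, which keeps or drops whole groups based on a function evaluated on each group. This is a swapped-in technique, not part of the answer above; the data simply re-creates the table from the question, and `has_target_row` is a hypothetical helper name. Because it calls a Python function once per group, it is likely slower than the transform-based mask when there are many groups.

```python
import pandas as pd

# Data re-created from the table in the question
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 90, 93],
})

def has_target_row(group: pd.DataFrame) -> bool:
    # Hypothetical helper: True if a single row has Name 'J' AND Age 33
    return bool(((group['Name'] == 'J') & (group['Age'] == 33)).any())

# filter() keeps all rows of every group for which the function returns True
out = df.groupby(['X', 'Y']).filter(has_target_row)
print(out)
```

The original row labels are preserved, so the sample data yields rows 0, 1, 4 and 5.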

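Another option, still with a groupby. Note that it tests the two conditions separately, so it also keeps a group where 'J' and 33 appear on different rows: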

group = df.groupby(['X','Y'])
cond1 = group.Name.transform(lambda x: any(x == 'J'))
cond2 = group.Age.transform(lambda x: any(x == 33))
df.loc[cond1 & cond2]
   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
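A small sketch with hypothetical data (`df2` is made up for illustration), where group (1, 3) has a 'J' and an Age of 33 only on different rows, compares per-column checks like those of the second option with the row-level check of the first option. The per-column checks are written with `transform('any')` on boolean Series instead of lambdas, which should be equivalent.

```python
import pandas as pd

# Hypothetical data: in group (1, 3) the 'J' row has Age 50 and the Age-33 row is 'A'
df2 = pd.DataFrame({
    'X': [1, 1, 2, 2],
    'Y': [3, 3, 3, 3],
    'Name': ['J', 'A', 'J', 'P'],
    'Age': [50, 33, 33, 80],
})
keys = [df2.X, df2.Y]

# Per-column checks, mirroring the second option
cond1 = (df2['Name'] == 'J').groupby(keys).transform('any')
cond2 = (df2['Age'] == 33).groupby(keys).transform('any')
independent = df2.loc[cond1 & cond2]

# Row-level check, as in the first option: both conditions on the same row
row_hit = (df2['Name'] == 'J') & (df2['Age'] == 33)
same_row = df2.loc[row_hit.groupby(keys).transform('any')]

print(independent)  # keeps group (1, 3) as well
print(same_row)     # keeps only group (2, 3)
```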

Answer 2

Score: 0

You can find all unique X and Y values of the rows where Name == 'J' and Age == 33, then filter the DataFrame with them:

mask = (df['Name'] == 'J') & (df['Age'] == 33)
unique_x = df.loc[mask, 'X'].unique()
unique_y = df.loc[mask, 'Y'].unique()
print(df[df['X'].isin(unique_x) & df['Y'].isin(unique_y)])

Prints:

   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
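A sketch with hypothetical data (`df3` is made up for illustration), where the J/33 rows sit only in groups (1, 3) and (2, 4), compares filtering X and Y independently with matching the actual (X, Y) pairs. The pair-aware variant uses a plain Python set of tuples and is only an illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: J/33 rows exist only in groups (1, 3) and (2, 4)
df3 = pd.DataFrame({
    'X': [1, 1, 1, 2, 2, 2],
    'Y': [3, 3, 4, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'P', 'J', 'V'],
    'Age': [33, 47, 53, 80, 33, 93],
})
mask = (df3['Name'] == 'J') & (df3['Age'] == 33)

# First approach: X and Y filtered independently
unique_x = df3.loc[mask, 'X'].unique()
unique_y = df3.loc[mask, 'Y'].unique()
independent = df3[df3['X'].isin(unique_x) & df3['Y'].isin(unique_y)]

# Pair-aware variant: only (X, Y) combinations that actually contain a J/33 row
pairs = set(zip(df3.loc[mask, 'X'], df3.loc[mask, 'Y']))
paired = df3[[xy in pairs for xy in zip(df3['X'], df3['Y'])]]

print(independent)  # every row, including groups (1, 4) and (2, 3)
print(paired)       # only groups (1, 3) and (2, 4)
```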

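EDIT: For the updated question, here is a solution without .groupby. Unlike the first approach, which checks X and Y independently and can therefore also match combinations such as (1, 4) when J/33 rows exist only in (1, 3) and (2, 4), it compares whole (X, Y) pairs: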

mask = (df['Name'] == 'J') & (df['Age'] == 33)
t = set(df.loc[mask, ['X', 'Y']].drop_duplicates().apply(tuple, axis=1))
out = df[df.loc[:, ['X', 'Y']].apply(lambda x: tuple(x) in t, axis=1)]
print(out)

Prints:

   X  Y Name  Age
0  1  3    J   33
1  1  3    A   47
4  2  3    J   33
5  2  3    P   80
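If the row-wise `apply` becomes a bottleneck, one possible vectorized variant (a sketch, assuming a reasonably recent pandas that provides `MultiIndex.from_frame`) builds a MultiIndex from each row's X/Y values and tests membership with `MultiIndex.isin`. The data re-creates the table from the question; `keys` and `row_pairs` are illustrative names.

```python
import pandas as pd

# Data re-created from the table in the question
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 2],
    'Y': [3, 3, 4, 4, 3, 3, 4, 4],
    'Name': ['J', 'A', 'B', 'X', 'J', 'P', 'V', 'V'],
    'Age': [33, 47, 53, 22, 33, 80, 90, 93],
})

mask = (df['Name'] == 'J') & (df['Age'] == 33)
# (X, Y) pairs of the matching rows as plain tuples, e.g. (1, 3)
keys = list(df.loc[mask, ['X', 'Y']].drop_duplicates().itertuples(index=False, name=None))
# Build a MultiIndex from every row's (X, Y) pair and test membership in one call
row_pairs = pd.MultiIndex.from_frame(df[['X', 'Y']])
out = df[row_pairs.isin(keys)]
print(out)
```

`isin` returns a boolean NumPy array aligned with the rows, so the original labels are kept (rows 0, 1, 4 and 5 for the sample data).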
