January 6, 2020, 21:10:40

Create new column based on group-by function

Question

You can try the following code to get the result you need:

import pandas as pd
df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})
# Return 'Y' only if every Value in the group is 'Y', otherwise 'N'
def compute_result(values):
    return 'Y' if (values == 'Y').all() else 'N'
# transform broadcasts each group's result back to every row of that group,
# so the original row order and index are preserved
df1['Result'] = df1.groupby('Name')['Value'].transform(compute_result)
# Print the result
print(df1)

This code computes the Result column according to your rule and produces the desired output.
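
A pitfall with this kind of check is calling an "all" test on the raw strings before comparing them to 'Y'. The sketch below illustrates the difference with plain Python truthiness; the names `values`, `truthy_check`, and `all_yes` are made up for this illustration and do not come from the question.

```python
import pandas as pd

values = pd.Series(['Y', 'N'])

# Built-in truthiness: every non-empty string is truthy, so this check
# cannot tell 'Y' from 'N'.
truthy_check = all(values.tolist())

# Comparing first yields booleans that actually encode "is this a 'Y'?".
all_yes = bool(values.eq('Y').all())

print(truthy_check, all_yes)  # True False
```

In short, compare first and aggregate second; the reverse order tests whether the strings are non-empty, which is not what the rule asks for.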

Original question:

I have a dataframe:

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                'ID': [1,2,3,4,5,6,7,8,9,10],
                'Value': ['Y','Y','Y','N','N','N','Y','N','N','Y']})

 Name    ID    Value
 Bob     1       Y          
 Bob     2       Y          
 Bob     3       Y          
 Joe     4       N          
 Joe     5       N          
 Joe     6       N
 Alan    7       Y
 Alan    8       N
 Steve   9       N
 Steve   10      Y

I need to compute a new Result column using the following rule. For each Name group (Bob, Joe, etc.), if every Value in the group is 'Y', assign 'Y' to each row of that group in the new column. Otherwise, assign 'N'.

So ideal output is:

 Name    ID    Value   Result
 Bob     1       Y       Y
 Bob     2       Y       Y  
 Bob     3       Y       Y  
 Joe     4       N       N  
 Joe     5       N       N  
 Joe     6       N       N
 Alan    7       Y       N
 Alan    8       N       N
 Steve   9       N       N
 Steve   10      Y       N

This is what I have so far, but it doesn't work correctly.

df1['Result'] = df1.groupby('Name').Value.all().reindex(df1.Name).astype(str).values
df1

Answer 1

Score: 2

Use numpy.where with GroupBy.transform, which returns a Series the same size as the original, together with GroupBy.all:

import numpy as np
df1['Result'] = np.where(df1['Value'].eq('Y').groupby(df1['Name']).transform('all'), 'Y', 'N')

Alternatively, to overwrite Value in place instead of adding a new column:

mask = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')
df1.loc[~mask, 'Value'] = 'N'

Or use Series.isin to select every group that contains at least one 'N', and set Value to 'N' for those rows:

mask = df1['Name'].isin(df1.loc[df1['Value'].eq('N'), 'Name'])
df1.loc[mask, 'Value'] = 'N'

print(df1)
    Name  ID Value
0    Bob   1     Y
1    Bob   2     Y
2    Bob   3     Y
3    Joe   4     N
4    Joe   5     N
5    Joe   6     N
6   Alan   7     N
7   Alan   8     N
8  Steve   9     N
9  Steve  10     N
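
The first approach in this answer (numpy.where on a transform('all') mask) can be sketched end to end as follows. It is a minimal, self-contained version that rebuilds the question's DataFrame; the intermediate name `all_yes` is introduced here for readability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# True on every row whose Name group consists only of 'Y' values.
all_yes = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')

# Write the flag into a new column; Value itself is left untouched.
df1['Result'] = np.where(all_yes, 'Y', 'N')

print(df1)
```

Only the Bob rows end up with Result 'Y', since Bob is the only group without an 'N'.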

Answer 2

Score: 1

You were close! Here's how you could do it:

# Per group: 'Y' if every Value is 'Y', otherwise 'N'
df1["Result"] = df1.groupby("Name").Value.transform(lambda value: "Y" if (value == "Y").all() else "N")
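
Another way to stay close to the shape of the original attempt is to aggregate once per group and then look the result up by Name. The following is only a sketch under that assumption; `group_all_yes` is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# One boolean per Name: True only if every Value in that group is 'Y'.
group_all_yes = df1['Value'].eq('Y').groupby(df1['Name']).all()

# Look up each row's group flag by Name, then translate it to 'Y'/'N'.
df1['Result'] = df1['Name'].map(group_all_yes).map({True: 'Y', False: 'N'})

print(df1)
```

Mapping by Name avoids relying on positional alignment, so the row order of df1 does not matter.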
