June 19, 2023 15:54:13


Pandas groupby: find the difference with respect to flag IDs

Question

I have the following DataFrame:

    id  flag  col_1  col_2  name
0   1   1      11    13      a
1   2   0      62    14      b 
2   1   0      13    15      a   
3   2   1      74    16      b  
4   3   1      25    17      c
5   3   0      22    18      c

I need this as the output:

    id  col_3  col_4  name
0   1     2       2     a
1   2   -12      -2     b 
2   3    -3       1     c

I need to group by id and name and, within each group, subtract the flag 1 value of col_1 from its flag 0 value (col_3 in the output). col_4 is the same difference computed for col_2.
Thanks in advance.

Answer 1

Score: 3

Using simple indexing with a temporary index (set_index and reset_index):
tmp = df.set_index(['flag', 'id', 'name'])
out = (tmp.loc[0] - tmp.loc[1]).reset_index()

Output:

   id name  col_1  col_2
0   1    a      2      2
1   2    b    -12     -2
2   3    c     -3      1

Used input:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 2, 3, 3],
                   'flag': [1, 0, 0, 1, 1, 0],
                   'col_1': [11, 62, 13, 74, 25, 22],
                   'col_2': [13, 14, 15, 16, 17, 18],
                   'name': ['a', 'b', 'a', 'b', 'c', 'c']})
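The result above keeps the original names col_1/col_2, while the question asked for col_3/col_4 with name as the last column. A minimal sketch, assuming every (id, name) pair has exactly one flag-0 row and one flag-1 row, that renames and reorders Answer 1's result and checks it against the requested table with pandas.testing.assert_frame_equal:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'id': [1, 2, 1, 2, 3, 3],
                   'flag': [1, 0, 0, 1, 1, 0],
                   'col_1': [11, 62, 13, 74, 25, 22],
                   'col_2': [13, 14, 15, 16, 17, 18],
                   'name': ['a', 'b', 'a', 'b', 'c', 'c']})

# Answer 1: index by (flag, id, name), then subtract the flag-1 block
# from the flag-0 block; rows are aligned on (id, name)
tmp = df.set_index(['flag', 'id', 'name'])
out = (tmp.loc[0] - tmp.loc[1]).reset_index()

# Rename to the column names requested in the question and put 'name' last
out = (out.rename(columns={'col_1': 'col_3', 'col_2': 'col_4'})
          [['id', 'col_3', 'col_4', 'name']])

# The desired output, copied from the question
expected = pd.DataFrame({'id': [1, 2, 3],
                         'col_3': [2, -12, -3],
                         'col_4': [2, -2, 1],
                         'name': ['a', 'b', 'c']})
assert_frame_equal(out, expected)
print(out)
```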

Answer 2

Score: 3

Another possible solution:

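out = (
    # Select the numeric columns explicitly so diff() never sees the string 'name' column
    df.groupby(["id", "name"])[["flag", "col_1", "col_2"]]
        .apply(lambda g: -g.pop("flag").diff().max() * g.diff())
        .dropna().droplevel(2).reset_index()
)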

Output:

print(out)
   id name  col_1  col_2
0   1    a   2.00   2.00
1   2    b -12.00  -2.00
2   3    c  -3.00   1.00
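Answer 2 depends on the row order inside each group: within a two-row group, flag.diff() is (second flag minus first flag), so -diff().max() is +1 when the flag-1 row comes first and -1 when the flag-0 row comes first. Multiplying the row-to-row g.diff() by that factor therefore always yields flag 0 minus flag 1. A small sketch, assuming exactly two rows per (id, name) pair, that prints this sign factor for each group:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 2, 3, 3],
                   'flag': [1, 0, 0, 1, 1, 0],
                   'col_1': [11, 62, 13, 74, 25, 22],
                   'col_2': [13, 14, 15, 16, 17, 18],
                   'name': ['a', 'b', 'a', 'b', 'c', 'c']})

# Assumes each group holds exactly two rows (one flag 0, one flag 1).
# diff() is NaN on the first row and (second - first) on the second;
# max() skips the NaN, and negating gives +1 if the flag-1 row is first.
sign = df.groupby(['id', 'name'])['flag'].agg(lambda s: -s.diff().max())
print(sign)
```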

Answer 3

Score: 0

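Use DataFrame.pivot to reshape, so that the flag 0 and flag 1 levels can be selected with DataFrame.xs and subtracted; then convert the MultiIndex in the index to columns with DataFrame.reset_index, and restore the original column order with DataFrame.reindex (filtering out the flag column):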

df1 = df.pivot(index=['id','name'], columns='flag')
out = (df1.xs(0, axis=1, level=1).sub(df1.xs(1, axis=1, level=1))
          .reset_index()
          .reindex(df.columns.difference(['flag'], sort=False), axis=1))
print (out)
   id  col_1  col_2 name
0   1      2      2    a
1   2    -12     -2    b
2   3     -3      1    c
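It may help to look at the intermediate pivoted frame from Answer 3: its columns form a two-level MultiIndex of (original column, flag), so xs(0, axis=1, level=1) keeps the flag-0 values of col_1 and col_2 and drops the flag level. A short sketch, assuming the same input as in Answer 1:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 2, 3, 3],
                   'flag': [1, 0, 0, 1, 1, 0],
                   'col_1': [11, 62, 13, 74, 25, 22],
                   'col_2': [13, 14, 15, 16, 17, 18],
                   'name': ['a', 'b', 'a', 'b', 'c', 'c']})

# One row per (id, name); columns are (value column, flag) pairs
df1 = df.pivot(index=['id', 'name'], columns='flag')
print(df1.columns.tolist())

# Cross-sections on the flag level (level 1 of the column MultiIndex)
flag0 = df1.xs(0, axis=1, level=1)
flag1 = df1.xs(1, axis=1, level=1)
print(flag0)
print(flag0 - flag1)
```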

Answer 4

Score: 0

Another possible solution:

(df.groupby(['id', 'name'])[['flag', 'col_1', 'col_2']]
 .apply(lambda x: x.sort_values('flag', ascending=False).diff())
 .dropna().droplevel(2).drop('flag', axis=1).reset_index())

Output:

   id name  col_1  col_2
0   1    a    2.0    2.0
1   2    b  -12.0   -2.0
2   3    c   -3.0    1.0
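A different technique that does not rely on row order within groups is a self-merge: split the frame by flag and join the two halves on id and name. A minimal sketch, assuming each (id, name) pair has exactly one flag-0 row and one flag-1 row (validate='one_to_one' makes merge raise a MergeError otherwise); the suffixes '_f0'/'_f1' are arbitrary names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 2, 3, 3],
                   'flag': [1, 0, 0, 1, 1, 0],
                   'col_1': [11, 62, 13, 74, 25, 22],
                   'col_2': [13, 14, 15, 16, 17, 18],
                   'name': ['a', 'b', 'a', 'b', 'c', 'c']})

keys = ['id', 'name']
# One half per flag value; the suffixes tell the two copies of col_1/col_2 apart
f0 = df[df['flag'] == 0].drop(columns='flag')
f1 = df[df['flag'] == 1].drop(columns='flag')
m = f0.merge(f1, on=keys, suffixes=('_f0', '_f1'), validate='one_to_one')

# flag 0 minus flag 1, using the column names requested in the question
out = pd.DataFrame({
    'id': m['id'],
    'col_3': m['col_1_f0'] - m['col_1_f1'],
    'col_4': m['col_2_f0'] - m['col_2_f1'],
    'name': m['name'],
}).sort_values('id', ignore_index=True)
print(out)
```

The merge keeps integer dtypes because no NaN-producing diff() step is involved.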
