June 8, 2023 19:04:27


How do I conditionally group rows of a dataframe?

Question


In column '2' of df there are three possible values: X, Y and Z. I want to group each X row together with any Y rows that directly follow it. I am not interested in keeping the Z rows in the groups.

I have tried using groupby() like this: df.groupby(df['2'] == 'X'); however, this obviously only separates the X rows from all the others.

How could I go about creating the groupings that I am after?

import pandas as pd

df = pd.DataFrame({'1':['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'],
                   '2':['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})

Desired groupings:

1	b	X
2	c	Y
---------
5	f	X
---------
6	g	X
---------
8	i	X
9	j	Y
10	k	Y
---------
12	m	X
---------
14	o	X
15	p	Y

Answer 1

Score: 2


You can use:

>>> list(df.groupby(df['2'].eq('X').cumsum().loc[df['2'] != 'Z']))
[(1.0,
     1  2
  1  b  X
  2  c  Y),
 (2.0,
     1  2
  5  f  X),
 (3.0,
     1  2
  6  g  X),
 (4.0,
      1  2
  8   i  X
  9   j  Y
  10  k  Y),
 (5.0,
      1  2
  12  m  X),
 (6.0,
      1  2
  14  o  X
  15  p  Y)]

Details:

# Same as grp = df['2'].eq('X').cumsum().where(df['2'] != 'Z')
grp = df['2'].eq('X').cumsum().loc[df['2'] != 'Z']
pd.concat([df, grp.rename('G')], axis=1)

# Output
    1  2    G
0   a  Z  NaN
1   b  X  1.0
2   c  Y  1.0
3   d  Z  NaN
4   e  Z  NaN
5   f  X  2.0
6   g  X  3.0
7   h  Z  NaN
8   i  X  4.0
9   j  Y  4.0
10  k  Y  4.0
11  l  Z  NaN
12  m  X  5.0
13  n  Z  NaN
14  o  X  6.0
15  p  Y  6.0
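A self-contained sketch of the same cumsum idea, written with one boolean mask applied to both the frame and the group ids. It also drops rows that come before the first X (group id 0); the one-liner above would keep such a row if the frame happened to start with a Y. It reuses the question's data; the names group_id, keep and groups are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z',
                         'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y']})

# Every X increments the running count, so an X and the rows after it share an id
group_id = df['2'].eq('X').cumsum()

# Drop Z rows, plus anything before the first X (those rows have id 0)
keep = df['2'].ne('Z') & group_id.gt(0)

# Group the surviving rows by their id and collect each sub-frame
groups = [g for _, g in df[keep].groupby(group_id[keep])]

for g in groups:
    print(g)
    print('---------')
```

Because the mask is applied to both df and group_id, the grouping key has no missing values, so the group keys stay integers instead of becoming 1.0, 2.0, ...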

Answer 2

Score: 1


I expect you just want a column that labels each group starting with an X. If that is the case, you can do the following:

[1] Create a column that checks whether the value is X: 1 if it is, else 0.

[2] Take the cumulative sum of that column; each X then starts a new group number.

import numpy as np

df['check_x'] = np.where(df['2'] == 'X', 1, 0)  # 1 for X rows, 0 otherwise
df['group'] = df['check_x'].cumsum()            # each X increments the group number
df
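The group column from these two steps numbers every row, Z rows included (row 3, 'd', lands in group 1), and the leading Z row gets group 0. So it still needs a filter before it yields the requested groupings. A minimal sketch, assuming the question's df; the names wanted and groups are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z',
                         'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y']})

# Steps [1] and [2]: flag X rows, then take the running total
df['check_x'] = np.where(df['2'] == 'X', 1, 0)
df['group'] = df['check_x'].cumsum()

# The running total also numbers Z rows (and rows before the first X get 0),
# so keep only non-Z rows that belong to a real group before grouping
wanted = df[(df['2'] != 'Z') & (df['group'] > 0)]

# Map each group number to its rows, keeping only the original columns
groups = {key: g[['1', '2']] for key, g in wanted.groupby('group')}

for key, g in groups.items():
    print(key)
    print(g)
```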

Answer 3

Score: -1


Example:

import pandas as pd

df = pd.DataFrame({'1': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p'],
                   '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z', 'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y']})

# Find the indices where 'X' occurs in column 2
x_indices = df.index[df['2'] == 'X']

# Initialize an empty list to store the groupings
grouped_indices = []

# Iterate over the X indices and create groups
for i in range(len(x_indices)):
    start_index = x_indices[i]  # Starting index of the group
    end_index = len(df) if i == len(x_indices) - 1 else x_indices[i + 1]  # Ending index of the group (exclusive)
    group = df.iloc[start_index:end_index]  # Positional slice; labels equal positions with the default RangeIndex
    group = group[group['2'] != 'Z']  # Exclude rows with 'Z' in column 2
    grouped_indices.append(group)  # Add the group to the list

# Print the desired groupings
for group in grouped_indices:
    print(group)
    print('---------')
#    1  2
# 1  b  X
# 2  c  Y
# ---------
#    1  2
# 5  f  X
# ---------
#    1  2
# 6  g  X
# ---------
#     1  2
# 8   i  X
# 9   j  Y
# 10  k  Y
# ---------
#     1  2
# 12  m  X
# ---------
#     1  2
# 14  o  X
# 15  p  Y
# ---------
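All three answers attach a Y to the most recent X even when a Z sits between them (for example X, Z, Y). On the question's data that never happens, so the results match, but if "directly following" has to be enforced strictly, one possible sketch is to let every non-Y row open a new block and keep only the blocks opened by an X. This stricter reading is an assumption about the question, and group_x_runs is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

def group_x_runs(frame, col='2'):
    """Hypothetical helper: each X row plus the Y rows immediately after it."""
    s = frame[col]
    # Every non-Y row (X or Z) opens a new block; Y rows join the block above them
    block = s.ne('Y').cumsum()
    # Keep a block only if its first row is an X; this drops Z blocks, the Y rows
    # that follow a Z, and any leading Y rows (block 0)
    opened_by_x = s.groupby(block).transform('first').eq('X')
    kept = frame[opened_by_x]
    return [g for _, g in kept.groupby(block[opened_by_x])]

df = pd.DataFrame({'1': list('abcdefghijklmnop'),
                   '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z',
                         'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y']})

for g in group_x_runs(df):
    print(g)
    print('---------')
```

On a frame with values X, Z, Y, this helper returns only the X row, whereas the cumsum-based answers would group the X and the Y together.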
