How do I conditionally group rows of a dataframe?
Question
In column 2 of df, there are three possible values: X, Y, Z. I want to group each X row together with any Y rows that directly follow it. I am not interested in preserving the Z values in the groups.
I have tried using groupby() like this: df.groupby(df[2] == 'X'), but this only picks out the X rows.
How could I go about creating the groupings that I am after?
df = pd.DataFrame({'1':['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'],
'2':['Z','X','Y','Z','Z','X','X','Z','X','Y','Y','Z','X','Z','X','Y']})
Desired groupings:
1 b X
2 c Y
---------
5 f X
---------
6 g X
---------
8 i X
9 j Y
10 k Y
---------
12 m X
---------
14 o X
15 p Y
Answer 1
Score: 2
You can use:
>>> list(df.groupby(df['2'].eq('X').cumsum().loc[df['2'] != 'Z']))
[(1.0,
1 2
1 b X
2 c Y),
(2.0,
1 2
5 f X),
(3.0,
1 2
6 g X),
(4.0,
1 2
8 i X
9 j Y
10 k Y),
(5.0,
1 2
12 m X),
(6.0,
1 2
14 o X
15 p Y)]
Details:
# Same as grp = df['2'].eq('X').cumsum().where(df['2'] != 'Z')
grp = df['2'].eq('X').cumsum().loc[df['2'] != 'Z']
pd.concat([df, grp.rename('G')], axis=1)
# Output
1 2 G
0 a Z NaN
1 b X 1.0
2 c Y 1.0
3 d Z NaN
4 e Z NaN
5 f X 2.0
6 g X 3.0
7 h Z NaN
8 i X 4.0
9 j Y 4.0
10 k Y 4.0
11 l Z NaN
12 m X 5.0
13 n Z NaN
14 o X 6.0
15 p Y 6.0
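One caveat, offered as a reading of the question rather than something the answer states: the cumulative sum only resets at an X, so a Y that comes after an intervening Z (for example X, Z, Y) would still be counted in the preceding X group. The sample data contains no such case. If that matters, a minimal sketch of a stricter variant is shown here; the helper name x_runs is hypothetical, and the idea is to start a new run at every non-Y row and keep only the runs that begin with X.

```python
import pandas as pd

df = pd.DataFrame({
    '1': list('abcdefghijklmnop'),
    '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z',
          'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y'],
})

def x_runs(frame, col='2'):
    """Hypothetical helper: return one DataFrame per X plus the Ys right after it."""
    s = frame[col]
    # Every X and every Z opens a new run, so a Z breaks a chain of Ys.
    run_id = s.ne('Y').cumsum()
    # A run is kept only if its first row is an X; such runs contain no Z.
    starts_with_x = s.groupby(run_id).transform('first').eq('X')
    kept = frame[starts_with_x]
    return [g for _, g in kept.groupby(run_id[starts_with_x])]

groups = x_runs(df)
index_lists = [g.index.tolist() for g in groups]
print(index_lists)
```

On the sample data this should give the same six groups as the cumsum approach above; the two differ only when a Z sits between an X and a later Y.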
Answer 2
Score: 1
I expect you just want a column that identifies each group starting with X. If that is the case, you can do the following:
[1] Create a column that is 1 where the value is X and 0 otherwise.
[2] Take the cumulative sum (cumsum) of that column; the result is the group number for each row.
import numpy as np
df['check_x'] = np.where(df['2'] == 'X', 1, 0)  # 1 on X rows, 0 elsewhere
df['group'] = df['check_x'].cumsum()  # increases by 1 at every X
df
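Note that the group column on its own still labels the Z rows, which the question wants excluded. As a minimal sketch of one way to finish the job (the names kept and groups are illustrative, not part of the answer), the Z rows can be filtered out before grouping on the new column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    '1': list('abcdefghijklmnop'),
    '2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z',
          'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y'],
})

# Steps [1] and [2] from the answer.
df['check_x'] = np.where(df['2'] == 'X', 1, 0)
df['group'] = df['check_x'].cumsum()

# Drop the Z rows, then group the remaining X and Y rows by group number.
kept = df[df['2'] != 'Z']
groups = {int(key): g.index.tolist() for key, g in kept.groupby('group')}
print(groups)
```

Rows before the first X fall into group 0; in the sample data that is a single Z row, so it disappears with the filter.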
Answer 3
Score: -1
Example:
import pandas as pd
df = pd.DataFrame({'1': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p'],
'2': ['Z', 'X', 'Y', 'Z', 'Z', 'X', 'X', 'Z', 'X', 'Y', 'Y', 'Z', 'X', 'Z', 'X', 'Y']})
# Find the indices where 'X' occurs in column 2
x_indices = df.index[df['2'] == 'X']
# Initialize an empty list to store the groupings
grouped_indices = []
# Iterate over the X indices and create groups
for i in range(len(x_indices)):
    start_index = x_indices[i]  # Starting index of the group
    end_index = len(df) if i == len(x_indices) - 1 else x_indices[i + 1]  # Ending index of the group (exclusive)
    # Positional slice; labels equal positions here because df has the default RangeIndex
    group = df.iloc[start_index:end_index]
    group = group[group['2'] != 'Z']  # Exclude rows with 'Z' in column 2
    grouped_indices.append(group)  # Add the group to the list
# Print the desired groupings
for group in grouped_indices:
    print(group)
    print('---------')
# 1 2
# 1 b X
# 2 c Y
# ---------
# 1 2
# 5 f X
# ---------
# 1 2
# 6 g X
# ---------
# 1 2
# 8 i X
# 9 j Y
# 10 k Y
# ---------
# 1 2
# 12 m X
# ---------
# 1 2
# 14 o X
# 15 p Y
# ---------