Groupby sequence of values in one column
Question
This is my dataframe:
df = pd.DataFrame(
    {
        'a': [1, 10, 20, 30, 40, 90, 100, 200, 11],
        'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
    }
)
And this is the way that I want to group it:
0    1  x
1   10  y
2   20  h
3   30  z

4   40  x
5   90  z

6  100  x
7  200  a
8   11  z
I want to start a group when there is an x in column b and end the group when there is a z in b. Obviously I want to include everything that comes between x and z, as in the first group for example.
I tried the answers of this question but still couldn't solve the problem.
Answer 1
Score: 2
We can use the grouper made by @mozway in this answer:
# map x/z to True/False (every other value becomes NaN)
m1 = df["b"].map({"x": True, "z": False})
# True from each x up to (but not including) the next z
m2 = m1.ffill().fillna(False)
# True where the next row is not x, so only the last of consecutive x's starts a group
m3 = m1.shift(-1).ne(True)
m4 = m2 & m3
grp = (m1 & m3).cumsum().where(m4 | m4.shift(), 0)
# grp: [1, 1, 1, 1, 2, 2, 3, 3, 3]
dfs = {f"group_{n}": g for n, g in df.groupby(grp)}
Output:
print(dfs)
{'group_1':     a  b
0   1  x
1  10  y
2  20  h
3  30  z,
 'group_2':     a  b
4  40  x
5  90  z,
 'group_3':      a  b
6  100  x
7  200  a
8   11  z}
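As a point of comparison, here is a shorter sketch of the same idea. It is not the grouper from the linked answer: it simply counts how many groups have been opened and closed so far. The helper name `label_spans` is hypothetical, and the sketch assumes that x and z strictly alternate, starting with an x (no nested x's and no z without a preceding x).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)


def label_spans(s: pd.Series, start: str = "x", end: str = "z") -> pd.Series:
    """Label each row with the number of its start..end span (hypothetical helper).

    Assumes start and end markers alternate, beginning with a start marker.
    Rows that fall outside every span get NaN.
    """
    # Number of spans opened up to and including each row.
    opened = s.eq(start).cumsum()
    # Number of spans closed strictly before each row.
    closed_before = s.eq(end).cumsum().shift(fill_value=0)
    # A row is inside a span when more spans were opened than closed before it.
    return opened.where(opened.gt(closed_before))


labels = label_spans(df["b"])
# groupby drops NaN keys by default, so rows outside any span are left out.
dfs = {f"group_{int(n)}": g for n, g in df.groupby(labels)}
```

For the data in the question every row sits inside a span, so the labels match the grouper above. The difference only appears when rows occur between a z and the next x: those rows receive NaN and are excluded from `dfs`.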
Answer 2
Score: 0
Is this what you are looking for?
import pandas as pd
df = pd.DataFrame(
{
'a': [1, 10, 20, 30, 40, 90, 100, 200, 11],
'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
}
)
# Create a new column called `group`: each "x" starts a new group
# (this assumes every group is closed by a "z" before the next "x")
df['group'] = df['b'].eq('x').cumsum()
# Group the DataFrame by the `group` column
df = df.groupby('group').agg({'a': 'first', 'b': 'last'})
print(df)
# Output:
#          a  b
# group
# 1        1  z
# 2       40  z
# 3      100  z