Groupby sequence of values in one column

Question

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 10, 20, 30, 40, 90, 100, 200, 11], 
        'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
    }
)

And this is the way that I want to group it:

0    1  x
1   10  y
2   20  h
3   30  z

4   40  x
5   90  z

6  100  x
7  200  a
8   11  z

I want to start a group whenever column b contains x and end it when b contains z, including every row in between (as in the first group above).

I tried the answers to this question but still couldn't solve the problem.
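A minimal loop-based sketch of the rule described above, assuming that every x opens a new group (even if the previous group never reached a z) and that rows outside any x..z run should be left out. `label_x_to_z` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def label_x_to_z(b: pd.Series) -> pd.Series:
    """Hypothetical helper: number each x..z run in `b`; rows outside any run get 0.

    Assumes every x opens a new group, even if the previous group never saw a z.
    """
    labels = []
    current = 0   # label of the group we are currently inside (0 = none)
    group_no = 0  # last group number handed out
    for value in b:
        if value == "x":        # an x always opens a new group
            group_no += 1
            current = group_no
        labels.append(current)  # the closing z still belongs to its group
        if value == "z":        # rows after the z are outside until the next x
            current = 0
    return pd.Series(labels, index=b.index)


df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)

labels = label_x_to_z(df["b"])
# keep only real groups; label 0 marks rows that are not between an x and a z
groups = {f"group_{n}": g for n, g in df.groupby(labels) if n != 0}
for name, g in groups.items():
    print(name)
    print(g)
```

A Python loop is slower than the vectorized answers, but it states the rule directly and can serve as a reference to check them against on small inputs.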

Answer 1

Score: 2

We can use the grouper made by @mozway in this answer:

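# map x -> True and z -> False; every other value becomes NaN
m1 = df["b"].map({"x": True, "z": False})
# forward-fill: True from an x up to (but not including) the next z
m2 = m1.ffill().fillna(False)
# only keep last x's: False for any row directly followed by an x
m3 = m1.shift(-1).ne(True)
m4 = m2&m3

# number groups at their opening x; rows outside any x..z run get 0
grp = (m1&m3).cumsum().where(m4|m4.shift(), 0)
#[1, 1, 1, 1, 2, 2, 3, 3, 3]

dfs = {f"group_{n}": g for n,g in df.groupby(grp)}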

Output:

print(dfs)

{'group_1':  a  b
 0   1  x
 1  10  y
 2  20  h
 3  30  z,
 'group_2':  a  b
 4  40  x
 5  90  z,
 'group_3':  a  b
 6  100  x
 7  200  a
 8   11  z}
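To see how this grouper arrives at [1, 1, 1, 1, 2, 2, 3, 3, 3], the sketch below recomputes each mask from the answer and lines them up in one table. `explain_grouper` is a hypothetical wrapper around the same expressions, and `starts` and `keep` are added names for the two unnamed intermediate results. The second call illustrates the "only keep last x's" comment: when two x's appear in a row, only the second one opens a group and the first gets label 0.

```python
import pandas as pd


def explain_grouper(b: pd.Series) -> pd.DataFrame:
    """Hypothetical wrapper: recompute the answer's masks and return them side by side."""
    m1 = b.map({"x": True, "z": False})   # True at x, False at z, NaN elsewhere
    m2 = m1.ffill().fillna(False)          # True from an x up to (not including) the next z
    m3 = m1.shift(-1).ne(True)             # False for any row directly followed by an x
    m4 = m2 & m3
    starts = m1 & m3                       # an x that is not followed by another x
    keep = m4 | m4.shift()                 # m4 rows plus the row just after each (the closing z)
    grp = starts.cumsum().where(keep, 0)   # 0 = row outside any x..z run
    return pd.DataFrame(
        {"b": b, "m1": m1, "m2": m2, "m3": m3, "m4": m4,
         "starts": starts, "keep": keep, "grp": grp}
    )


df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)

print(explain_grouper(df["b"]))

# Two x's in a row: only the second opens the group, the first is labelled 0.
print(explain_grouper(pd.Series(["x", "x", "y", "z"]))["grp"].tolist())  # [0, 1, 1, 1]
```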

Answer 2

Score: 0

Is this what you are looking for:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 10, 20, 30, 40, 90, 100, 200, 11], 
        'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
    }
)

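# Create a new column called `group`
# Note: this line raises AttributeError as written. With two capture groups,
# str.extract() returns a DataFrame, and a DataFrame has no .str accessor.
# Also, x(.*)z cannot match any single-character value in b, so extract()
# yields only NaN here.
df['group'] = df['b'].str.extract(r'(x(.*)z)').fillna('').str.strip()

# Group the DataFrame by the `group` column
df = df.groupby('group').agg({'a': 'first', 'b': 'last'})

print(df)

# Output as posted (not reproducible, since the extract line above fails):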
#   group  a  b
# 0     x  1  x
# 1     y 10  y
# 2     h  20  h
# 3     z  30  z
# 4     x  40  x
# 5     z  90  z
# 6     x 100  x
# 7     a 200  a
# 8     z  11  z
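The regex idea can work if the pattern is applied to the whole column at once rather than to each single-character value. A sketch, assuming every value in b is exactly one character so that string positions line up with row positions: join b into one string, find each x...z run with `re.finditer`, and label the matching rows.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)

# Assumption: every value in b is exactly one character, so the position of a
# character in the joined string is also the row position in df.
s = "".join(df["b"])
if len(s) != len(df):
    raise ValueError("each value in b must be a single character")

# x, then any run of characters that are neither x nor z, then the closing z.
labels = pd.Series(0, index=df.index)  # 0 = row outside every match
for n, m in enumerate(re.finditer(r"x[^xz]*z", s), start=1):
    labels.iloc[m.start():m.end()] = n

groups = {f"group_{n}": g for n, g in df.groupby(labels) if n != 0}
print(labels.tolist())  # [1, 1, 1, 1, 2, 2, 3, 3, 3]
```

Because `[^xz]` excludes x, a run such as `xxyz` is matched from the last x, which mirrors the "only keep last x's" behavior of the mask-based answer; rows outside every match keep label 0 and are dropped from `groups`.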

huangapple
  • This article was published on June 5, 2023 01:27:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76401612.html