June 5, 2023 01:27:26


Groupby sequence of values in one column

Question

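This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 10, 20, 30, 40, 90, 100, 200, 11],
        'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
    }
)

And this is the way that I want to group it:

I want to start a group when there is an x in column b and end the group when there is a z in b. Obviously I want to include everything that comes between x and z, like the first group for example.

I tried the answers to this question but still couldn't solve the problem.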
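The rule above (open a group at x, close it at z, keep everything in between) can be stated precisely as a row-by-row scan. This is a minimal sketch, not taken from either answer: `label_groups` is a hypothetical helper, 0 is assumed to mean "not inside any x ... z span", and it assumes that every x opens a new group even if the previous one was never closed (the grouper in Answer 1 instead keeps only the last x of a run of consecutive x's).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)


def label_groups(values, start="x", end="z"):
    """Hypothetical helper: one group number per row, 0 = outside any group.

    Assumption: every `start` value opens a new group, even if the previous
    group was never closed by an `end` value.
    """
    labels = []
    current = 0  # group the current row belongs to (0 = no group open)
    opened = 0   # how many groups have been opened so far
    for value in values:
        if value == start:
            opened += 1
            current = opened
        labels.append(current)
        if value == end:
            current = 0  # the z row itself still belongs to the group
    return labels


df["group"] = label_groups(df["b"])
# Drop rows outside any group, then split the rest by group number.
groups = {f"group_{n}": g for n, g in df[df["group"] > 0].groupby("group")}
for name, g in groups.items():
    print(name, g["a"].tolist())
```

A plain loop is slow on large frames, but it is easy to check by hand and can serve as a reference when testing a vectorized grouper.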


Answer 1

Score: 2

We can use the grouper made by @mozway in this answer:

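# map x to True and z to False; every other value becomes NaN
m1 = df["b"].map({"x": True, "z": False})
# forward-fill: True from an x up to (not including) the next z
m2 = m1.ffill().fillna(False)
# only keep the last x of consecutive x's
m3 = m1.shift(-1).ne(True)
m4 = m2 & m3

# number each group at its (last) x; rows outside any x ... z span get 0
grp = (m1 & m3).cumsum().where(m4 | m4.shift(), 0)
# [1, 1, 1, 1, 2, 2, 3, 3, 3]

dfs = {f"group_{n}": g for n, g in df.groupby(grp)}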

Output:

print(dfs)

{'group_1':  a  b
  0   1  x
  1  10  y
  2  20  h
  3  30  z,
 'group_2':  a  b
  4  40  x
  5  90  z,
 'group_3':  a  b
  6  100  x
  7  200  a
  8   11  z}
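To see how the grouper arrives at [1, 1, 1, 1, 2, 2, 3, 3, 3], it can help to print the intermediate masks side by side. The sketch below repeats the answer's code on the question's frame and collects m1 to m4 and grp into one DataFrame; the name `debug` is only illustrative. On recent pandas versions the `ffill`/`fillna(False)` step on an object-dtype Series may emit a FutureWarning about downcasting, which should not change the values.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)

m1 = df["b"].map({"x": True, "z": False})  # True at x, False at z, NaN elsewhere
m2 = m1.ffill().fillna(False)              # True while a group is open (z rows are False)
m3 = m1.shift(-1).ne(True)                 # False on an x directly followed by another x
m4 = m2 & m3                               # open rows, minus earlier x's of a run
# m4 | m4.shift() extends each span by one row so the closing z is included;
# (m1 & m3).cumsum() increments at every x that actually starts a group.
grp = (m1 & m3).cumsum().where(m4 | m4.shift(), 0)

# Illustrative debug table: one column per intermediate mask.
debug = pd.DataFrame(
    {"b": df["b"], "m1": m1, "m2": m2, "m3": m3, "m4": m4, "grp": grp}
)
print(debug)
```

Reading the table row by row shows that the z rows are False in m4 and only receive their group number through `m4.shift()`.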


Answer 2

Score: 0

Is this what you are looking for:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 10, 20, 30, 40, 90, 100, 200, 11],
        'b': ['x', 'y', 'h', 'z', 'x', 'z', 'x', 'a', 'z']
    }
)

# Create a new column called `group`
df['group'] = df['b'].str.extract(r'(x(.*)z)').fillna('').str.strip()

# Group the DataFrame by the `group` column
df = df.groupby('group').agg({'a': 'first', 'b': 'last'})

print(df)

# Output:
#   group    a  b
# 0     x    1  x
# 1     y   10  y
# 2     h   20  h
# 3     z   30  z
# 4     x   40  x
# 5     z   90  z
# 6     x  100  x
# 7     a  200  a
# 8     z   11  z
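`Series.str.extract` applies its pattern to each cell on its own, so with one-letter values in column b a pattern such as `x(.*)z` never sees more than one row. A regex can still find the x ... z spans if it runs over the whole column at once. This is a swapped-in technique, not the answer's code: a minimal sketch assuming every value in b is exactly one character, so that position i in the joined string is row i of the frame. The pattern `x[^xz]*z` is an assumption chosen to mirror Answer 1's "keep the last x" behavior.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 10, 20, 30, 40, 90, 100, 200, 11],
        "b": ["x", "y", "h", "z", "x", "z", "x", "a", "z"],
    }
)

# Assumption: each value in b is a single character, so string position == row position.
labels = "".join(df["b"])

# x, then any run of characters that are neither x nor z, then the closing z.
# Excluding x from the middle makes a run like "xxyz" match from the last x.
spans = [m.span() for m in re.finditer(r"x[^xz]*z", labels)]

# span() gives (start, end) with end exclusive, which matches iloc slicing.
groups = {
    f"group_{i}": df.iloc[start:end]
    for i, (start, end) in enumerate(spans, start=1)
}
for name, g in groups.items():
    print(name, g["a"].tolist())
```

Rows that fall outside every match (for example values between a z and the next x) are simply not part of any group.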

