2023-08-09 14:42:11

Aggregate rows in pandas

Question

I have many similar rows in pandas like this:

Date	Position
2023-08-01 12:01:00	A23
2023-08-01 12:20:00	A23
2023-08-01 13:10:10	A23
2023-08-02 12:00:00	B12
2023-08-02 12:01:00	A23
2023-08-02 12:05:00	A23

and I need to aggregate the values by "Position" and merge the datetime range like this:

Date	Date2	Position
2023-08-01 12:01:00	2023-08-01 13:10:10	A23
2023-08-02 12:00:00	NaN	B12
2023-08-02 12:01:00	2023-08-02 12:05:00	A23

Thank you

Answer 1

Score: 1

Assuming you want the min/max date per group of identical successive positions, you can use a custom groupby.agg with post-processing:

import pandas as pd

# ensure datetime
df['Date'] = pd.to_datetime(df['Date'])

# group successive positions
group = df['Position'].ne(df['Position'].shift()).cumsum()

out = (df
   .groupby(group, as_index=False)
   .agg(Date=('Date', 'min'),
        Date2=('Date', 'max'),
        Position=('Position', 'first'),
        n=('Position', 'count')
       )
   # hide Date2 if there was not more than 1 item in the group
   # you could also check that Date ≠ Date2
   .assign(Date2=lambda d: d['Date2'].where(d.pop('n').gt(1)))
)

NB. To group by position and day instead, use .groupby(['Position', df['Date'].dt.normalize()], as_index=False).

Output:

                 Date               Date2 Position
0 2023-08-01 12:01:00 2023-08-01 13:10:10      A23
1 2023-08-02 12:00:00                 NaT      B12
2 2023-08-02 12:01:00 2023-08-02 12:05:00      A23
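
The NB above (grouping by position and calendar day rather than by consecutive runs) can be sketched roughly as follows. This is only an illustration under assumptions: the sample frame is rebuilt from the question, the helper column name "Day" is made up for this example, and the group keys are restored with reset_index() instead of as_index=False.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-08-01 12:01:00", "2023-08-01 12:20:00", "2023-08-01 13:10:10",
        "2023-08-02 12:00:00", "2023-08-02 12:01:00", "2023-08-02 12:05:00",
    ]),
    "Position": ["A23", "A23", "A23", "B12", "A23", "A23"],
})

# Calendar day of each row (time set to midnight); "Day" is a hypothetical name
day = df["Date"].dt.normalize().rename("Day")

out = (
    df.groupby(["Position", day], sort=False)  # sort=False keeps first-appearance order
      .agg(Date=("Date", "min"), Date2=("Date", "max"), n=("Date", "count"))
      .reset_index()
)

# Hide Date2 for groups that contain a single row
out["Date2"] = out["Date2"].where(out["n"] > 1)

# Drop the helper columns and restore the requested column order
out = out.drop(columns=["Day", "n"])[["Date", "Date2", "Position"]]
print(out)
```

On this sample the groups happen to match the consecutive-run result. With data where the same position reappears on the same day after another position, this variant would merge those non-adjacent rows into one group, which may or may not be what is wanted.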

Answer 2

Score: 1

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame(
    {'Date': {0: Timestamp('2023-08-01 12:01:00'),
              1: Timestamp('2023-08-01 12:20:00'),
              2: Timestamp('2023-08-01 13:10:10'),
              3: Timestamp('2023-08-02 12:00:00'),
              4: Timestamp('2023-08-02 12:01:00'),
              5: Timestamp('2023-08-02 12:05:00')},
    'Position': {0: 'A23',
                 1: 'A23',
                 2: 'A23',
                 3: 'B12',
                 4: 'A23',
                 5: 'A23'}}
)


# start a new group whenever the Position value differs from the previous row's Position value
df['Group'] = (df['Position'] != df['Position'].shift()).cumsum()

# group by the Group and Position columns, get the min and max of the Date column, then drop the Group column
df = df.groupby(['Group', 'Position'])['Date'].agg(['min', 'max']).reset_index().drop('Group', axis=1)

# if max == min, then max should be missing (NaT)
df['max'] = df['max'].where(df['max'] != df['min'])

# rename the columns to the desired names
df.rename(columns={'min': 'Date1', 'max': 'Date2'}, inplace=True)

# Output:
>>> df
  Position               Date1               Date2
0      A23 2023-08-01 12:01:00 2023-08-01 13:10:10
1      B12 2023-08-02 12:00:00                 NaT
2      A23 2023-08-02 12:01:00 2023-08-02 12:05:00
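
Both answers rely on the same trick to label runs of consecutive identical positions: compare each row with the previous one, then take a cumulative sum of the "changed" flags. A minimal step-by-step sketch of that idea, using the positions from the question (the names changed, run_id and steps are chosen here only for illustration):

```python
import pandas as pd

position = pd.Series(["A23", "A23", "A23", "B12", "A23", "A23"], name="Position")

# True on the first row of each run: the first row compares against NaN,
# and every later row compares against the row just above it
changed = position != position.shift()

# Each True increments the counter, so all rows of a run share one label
run_id = changed.cumsum()

steps = pd.DataFrame({"Position": position, "changed": changed, "run_id": run_id})
print(steps)
# run_id is 1, 1, 1, 2, 3, 3: the second A23 run gets its own label
```

Because the label depends only on adjacency, the result reflects the current row order; sorting the frame by Date first is probably advisable if the rows are not already chronological.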
