Count of unique days grouped by value - pandas

Question

I'm aiming to assign a cumulative count of unique days to a new column in a pandas DataFrame. It should count the number of unique days, taken from Date, grouped by Code and Item. Once a run of consecutive values in Code or Item is broken, the count should start over from 1, as in the intended output below.

import pandas as pd

df = pd.DataFrame({
    "Date": ['2023-03-01', '2023-03-01', '2023-03-01', '2023-03-04', '2023-03-06', '2023-03-06',
             '2023-03-07', '2023-03-08', '2023-03-09', '2023-03-01', '2023-03-02', '2023-03-03',
             '2023-03-03', '2023-03-03', '2023-03-03', '2023-03-04', '2023-03-05', '2023-03-06'],
    "Code": ['X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y'],
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Attempt: this numbers the rows within each (Code, Item, day) group from 0,
# so it does not produce the intended output
df['Daily_Count'] = df.groupby(['Code', 'Item', df['Date'].dt.date]).cumcount()

Intended output:

         Date Code Item  Daily_Count
0  2023-03-01    X    A            1
1  2023-03-01    X    A            1
2  2023-03-01    X    A            1
3  2023-03-04    X    B            1
4  2023-03-06    X    B            2
5  2023-03-06    X    B            2
6  2023-03-07    X    B            3
7  2023-03-08    X    A            1
8  2023-03-09    X    A            2
9  2023-03-01    Y    A            1
10 2023-03-02    Y    A            2
11 2023-03-03    Y    A            3
12 2023-03-03    Y    A            3
13 2023-03-03    Y    A            3
14 2023-03-03    Y    A            3
15 2023-03-04    Y    A            4
16 2023-03-05    Y    A            5
17 2023-03-06    Y    A            6

Answer 1

Score: 2

You need to split the rows into runs of consecutive rows that share the same (Code, Item) pair. You can do that by comparing each row's values against the previous row's values and starting a new group whenever either of them changes:

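g = (df[['Code', 'Item']] != df[['Code', 'Item']].shift()).any(axis=1).cumsum()
# g labels each run of consecutive rows (first rows shown):
# 0    1
# 1    1
# 2    1
# 3    2
# 4    2
# 5    2
# 6    2
# 7    3
# 8    3
# ... rows 9-17 (Code Y, Item A) all get label 4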
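
To see why the labels come from runs rather than from the (Code, Item) values themselves, here is a minimal sketch on made-up data. The toy frame and the names keys, changed and run_id are illustrative assumptions, not part of the original answer. Note that the pair (X, A) appears in two separate runs and gets two different labels, which a plain groupby on Code and Item would not give you.

```python
import pandas as pd

# Hypothetical toy data: (X, A) occurs in two separate runs (rows 0-1 and row 3).
toy = pd.DataFrame({
    "Code": ["X", "X", "X", "X", "Y"],
    "Item": ["A", "A", "B", "A", "A"],
})

keys = toy[["Code", "Item"]]
# True wherever either column differs from the row above. The first row is
# compared against a missing value, so it always starts a new run.
changed = (keys != keys.shift()).any(axis=1)
# A running sum of the change flags gives every run its own label.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 2, 3, 4]
```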

You can then group your dataframe using these labels and, within each group, take a running total of the number of times Date changes from one row to the next:

df['Daily_Count'] = df.groupby(g)['Date'].transform(lambda s: (s != s.shift()).cumsum())

Output:

         Date Code Item  Daily_Count
0  2023-03-01    X    A            1
1  2023-03-01    X    A            1
2  2023-03-01    X    A            1
3  2023-03-04    X    B            1
4  2023-03-06    X    B            2
5  2023-03-06    X    B            2
6  2023-03-07    X    B            3
7  2023-03-08    X    A            1
8  2023-03-09    X    A            2
9  2023-03-01    Y    A            1
10 2023-03-02    Y    A            2
11 2023-03-03    Y    A            3
12 2023-03-03    Y    A            3
13 2023-03-03    Y    A            3
14 2023-03-03    Y    A            3
15 2023-03-04    Y    A            4
16 2023-03-05    Y    A            5
17 2023-03-06    Y    A            6
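
Counting Date changes matches the number of unique days only when equal dates sit next to each other within a run, as they do in the sample data. If that might not hold, one possible variant is sketched below: it counts a day only the first time it appears in its run. The helper name unique_day_count_per_run and its parameters are hypothetical, and normalizing timestamps to midnight is an added assumption about how "day" should be read.

```python
import pandas as pd

def unique_day_count_per_run(df, keys=("Code", "Item"), date_col="Date"):
    """Running count of distinct days within each run of consecutive rows
    that share the same key values (hypothetical helper, not a pandas API)."""
    key_frame = df[list(keys)]
    # Label runs: a new label starts whenever any key column changes.
    run_id = (key_frame != key_frame.shift()).any(axis=1).cumsum()
    # Assumption: only the calendar day matters, so drop any time-of-day part.
    days = df[date_col].dt.normalize()
    # A (run, day) pair counts only on its first appearance.
    first_seen = ~pd.DataFrame({"run": run_id, "day": days}).duplicated()
    return first_seen.astype(int).groupby(run_id).cumsum()

df = pd.DataFrame({
    "Date": ["2023-03-01", "2023-03-01", "2023-03-01", "2023-03-04", "2023-03-06", "2023-03-06",
             "2023-03-07", "2023-03-08", "2023-03-09", "2023-03-01", "2023-03-02", "2023-03-03",
             "2023-03-03", "2023-03-03", "2023-03-03", "2023-03-04", "2023-03-05", "2023-03-06"],
    "Code": ["X"] * 9 + ["Y"] * 9,
    "Item": ["A"] * 3 + ["B"] * 4 + ["A"] * 11,
})
df["Date"] = pd.to_datetime(df["Date"])
df["Daily_Count"] = unique_day_count_per_run(df)

print(df)
```

On the sample data this gives the same Daily_Count column as the output above. The two approaches differ only when a day reappears later in the same run: the change-counting version would count it again, while this one would not.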

huangapple
  • Published on May 25, 2023, 10:52:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76328606.html