Count of unique days grouped by value - pandas
Question
I'm aiming to assign a cumulative count of unique days to a new column in a pandas DataFrame. It should count the unique days in Date, grouped by Code and Item. Once a run of consecutive identical values in Code or Item is broken, the count should restart at 1, as in the intended output.
import pandas as pd
df = pd.DataFrame({"Date":['2023-03-01', '2023-03-01', '2023-03-01', '2023-03-04', '2023-03-06', '2023-03-06', '2023-03-07', '2023-03-08', '2023-03-09','2023-03-01', '2023-03-02', '2023-03-03', '2023-03-03', '2023-03-03','2023-03-03', '2023-03-04', '2023-03-05', '2023-03-06'],
"Code":['X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y'],
"Item":['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
})
df['Date'] = pd.to_datetime(df['Date'])
df['Daily_Count'] = df.groupby(['Code', 'Item', df['Date'].dt.date]).cumcount()
Intended output:
          Date  Code  Item  Daily_Count
0   2023-03-01     X     A            1
1   2023-03-01     X     A            1
2   2023-03-01     X     A            1
3   2023-03-04     X     B            1
4   2023-03-06     X     B            2
5   2023-03-06     X     B            2
6   2023-03-07     X     B            3
7   2023-03-08     X     A            1
8   2023-03-09     X     A            2
9   2023-03-01     Y     A            1
10  2023-03-02     Y     A            2
11  2023-03-03     Y     A            3
12  2023-03-03     Y     A            3
13  2023-03-03     Y     A            3
14  2023-03-03     Y     A            3
15  2023-03-04     Y     A            4
16  2023-03-05     Y     A            5
17  2023-03-06     Y     A            6
Answer 1
Score: 2
You need to split the rows into runs of consecutive identical (Code, Item) pairs. You can do that by comparing each row's values with the previous row's values and starting a new group whenever either of them changes:
g = (df[['Code', 'Item']] != df[['Code', 'Item']].shift()).any(axis=1).cumsum()
# First nine values of g (row index, group label):
# 0 1
# 1 1
# 2 1
# 3 2
# 4 2
# 5 2
# 6 2
# 7 3
# 8 3
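To see why the shift-and-cumsum comparison labels runs rather than values, here is a minimal sketch on a toy column. The Series `s` and the names `changed` and `run_id` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy column, not taken from the question.
s = pd.Series(["A", "A", "B", "B", "A"])

# True wherever a value differs from the one before it.
# The first row is compared with a missing value, so it always counts as a change.
changed = s != s.shift()

# A running total of the changes gives every run its own label.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 2, 2, 3]
```

Note that the final "A" gets label 3 rather than 1. That is the difference from a plain groupby on the value, which would merge both "A" runs into one group.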
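You can then group the dataframe by these labels and, within each group, take a running count of how many times Date changes. This matches the number of unique days as long as equal dates sit next to each other within a group, as they do here:
df['Daily_Count'] = df.groupby(g)['Date'].transform(lambda s: (s != s.shift()).cumsum())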
Output:
          Date  Code  Item  Daily_Count
0   2023-03-01     X     A            1
1   2023-03-01     X     A            1
2   2023-03-01     X     A            1
3   2023-03-04     X     B            1
4   2023-03-06     X     B            2
5   2023-03-06     X     B            2
6   2023-03-07     X     B            3
7   2023-03-08     X     A            1
8   2023-03-09     X     A            2
9   2023-03-01     Y     A            1
10  2023-03-02     Y     A            2
11  2023-03-03     Y     A            3
12  2023-03-03     Y     A            3
13  2023-03-03     Y     A            3
14  2023-03-03     Y     A            3
15  2023-03-04     Y     A            4
16  2023-03-05     Y     A            5
17  2023-03-06     Y     A            6
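For reference, here is a self-contained sketch that wraps both steps in one function and checks the result against the intended output. It is a variant, not the answer's exact code: the function name `unique_day_count_per_run` is hypothetical, and it marks the first appearance of each date within a run using `duplicated` instead of comparing with the previous row, on the assumption that you may want repeated dates to count once even when they are not adjacent.

```python
import pandas as pd


def unique_day_count_per_run(df: pd.DataFrame) -> pd.Series:
    """Running count of distinct days within each consecutive (Code, Item) run.

    Hypothetical helper; assumes df has Code, Item and a datetime Date column.
    """
    keys = df[["Code", "Item"]]
    # A new run starts whenever Code or Item differs from the previous row.
    run_id = (keys != keys.shift()).any(axis=1).cumsum()
    # Flag the first time each calendar day appears inside its run.
    pairs = pd.DataFrame({"run": run_id, "day": df["Date"].dt.normalize()})
    first_seen = (~pairs.duplicated()).astype(int)
    # Summing the flags cumulatively per run gives the unique-day count.
    return first_seen.groupby(run_id).cumsum()


df = pd.DataFrame({
    "Date": ["2023-03-01", "2023-03-01", "2023-03-01", "2023-03-04", "2023-03-06",
             "2023-03-06", "2023-03-07", "2023-03-08", "2023-03-09", "2023-03-01",
             "2023-03-02", "2023-03-03", "2023-03-03", "2023-03-03", "2023-03-03",
             "2023-03-04", "2023-03-05", "2023-03-06"],
    "Code": ["X"] * 9 + ["Y"] * 9,
    "Item": ["A"] * 3 + ["B"] * 4 + ["A"] * 11,
})
df["Date"] = pd.to_datetime(df["Date"])
df["Daily_Count"] = unique_day_count_per_run(df)

# Values copied from the intended output in the question.
expected = [1, 1, 1, 1, 2, 2, 3, 1, 2, 1, 2, 3, 3, 3, 3, 4, 5, 6]
assert df["Daily_Count"].tolist() == expected
```

On the question's data both approaches give the same column. They would differ only if a date reappeared later in the same run after a different date, in which case the shift comparison would count it again and this variant would not.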