How can I begin calculating cumsum() on a specific date within a dataframe?

Question

Is there a way to start computing cumsum() on a specific date within a Pandas dataframe?

Given the following dataframe, I am able to calculate the cumsum() for all of the rows.

import pandas as pd
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7 },
    {'Date': '2022-01-02', 'Confirmed': 4 },
    {'Date': '2022-01-03', 'Confirmed': 12 },
    {'Date': '2022-01-03', 'Confirmed': 2 },
    {'Date': '2022-01-04', 'Confirmed': 9 },
    {'Date': '2022-01-05', 'Confirmed': 10 },
])

df["Total Confirmed"] = df["Confirmed"].cumsum()

However, I would like to calculate the cumsum() starting on a specific date. For example, I would like to begin calculating cumsum() on the first occurrence of 2022-01-03, so that the running total on that row is 12 rather than carrying over the 11 accumulated on the earlier dates.

I noticed the shift() method, but it only shifts by a number of rows, and the cumsum() still starts from the first row.

Answer 1

Score: 1

You can try:

ser = df["Confirmed"].where(df["Date"].eq("2022-01-03").cummax(), 0)

df["Total Confirmed"] = ser.cumsum()
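The snippet above relies on a boolean mask that switches on at the first matching row and stays on. A minimal sketch of the intermediate steps, using the question's data (the names `hit`, `started` and `masked` are illustrative and do not come from the answer):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03",
             "2022-01-03", "2022-01-04", "2022-01-05"],
    "Confirmed": [7, 4, 12, 2, 9, 10],
})

# True only on rows whose Date equals the start date.
hit = df["Date"].eq("2022-01-03")

# cummax() on booleans: once a True is seen, every later row is True too.
started = hit.cummax()

# Keep Confirmed where the mask is on, replace it with 0 elsewhere.
masked = df["Confirmed"].where(started, 0)

df["Total Confirmed"] = masked.cumsum()
print(df)
```

Because the mask is built from the first match rather than from a comparison on every row, it does not require the Date column to be sorted.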

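Another variant, which sums only the rows from the first match onward:

# Position of the first matching row (assumes the default RangeIndex).
start = df["Date"].eq("2022-01-03").idxmax()

# Sum from that row on, then fill the earlier rows with 0.
df["Total Confirmed"] = df["Confirmed"].iloc[start:].cumsum().reindex(df.index, fill_value=0)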

Output:

print(df)

         Date  Confirmed  Total Confirmed
0  2022-01-01          7                0
1  2022-01-02          4                0
2  2022-01-03         12               12
3  2022-01-03          2               14
4  2022-01-04          9               23
5  2022-01-05         10               33
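Two edge cases may be worth guarding against. If the start date never appears, `idxmax()` on an all-False mask returns the first index label, so a position-based approach would silently sum from the top, while the mask-based approach would return all zeros. The sketch below wraps the mask-based approach in a hypothetical helper, `cumsum_from` (not part of pandas), that raises instead. It also works on a non-default index, since it never mixes labels with positions.

```python
import pandas as pd


def cumsum_from(df, value_col, date_col, start):
    """Running total of value_col from the first row where date_col == start.

    Rows before that point get 0. Raises ValueError if start never appears.
    Hypothetical helper for illustration; not part of pandas.
    """
    started = df[date_col].eq(start).cummax()
    if not started.any():
        raise ValueError(f"{start!r} not found in column {date_col!r}")
    return df[value_col].where(started, 0).cumsum()


df = pd.DataFrame(
    {
        "Date": ["2022-01-01", "2022-01-02", "2022-01-03",
                 "2022-01-03", "2022-01-04", "2022-01-05"],
        "Confirmed": [7, 4, 12, 2, 9, 10],
    },
    index=list("abcdef"),  # non-default index on purpose
)

df["Total Confirmed"] = cumsum_from(df, "Confirmed", "Date", "2022-01-03")
print(df)
```

Whether a missing date should raise, return zeros, or fall back to a full cumsum() depends on the use case; raising is just one reasonable choice.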

Answer 2

Score: 1

(df["Confirmed"] * (df["Date"] >= "2022-01-03")).cumsum()
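This one-liner works because multiplying by a boolean Series treats True as 1 and False as 0, so rows before the date contribute nothing to the sum. Comparing the dates as strings is only safe while they are in ISO `YYYY-MM-DD` form; a slightly sturdier sketch, assuming the column can be parsed by `pd.to_datetime`, compares real timestamps instead (the names `dates` and `on_or_after` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03",
             "2022-01-03", "2022-01-04", "2022-01-05"],
    "Confirmed": [7, 4, 12, 2, 9, 10],
})

# Parse the strings so the comparison is chronological, not lexicographic.
dates = pd.to_datetime(df["Date"])

# True for every row dated on or after the start date.
on_or_after = dates >= pd.Timestamp("2022-01-03")

# True counts as 1 and False as 0, so earlier rows add nothing.
df["Total Confirmed"] = (df["Confirmed"] * on_or_after).cumsum()
print(df)
```

Note that `>=` filters each row by its own date, which matches "from the first occurrence onward" only when the rows are in chronological order, as they are here.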

huangapple
  • Posted on June 9, 2023, 05:42:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76435873.html