Calculate the carry over from rows based on criteria in pandas
# Question
I have a `df` like this:
```text
date     time    value
2021-08   0.0    22.50
2021-08   5.0  6600.00
2021-09   0.0  1057.62
2021-09   1.0   646.35
2021-09   2.0   311.76
2021-09   3.0  3982.50
2021-09   4.0   900.00
2021-09   7.0   546.00
2021-09   9.0  1471.50
2021-09  11.0  1535.16
```
The `time` column represents how many months the `value` is paid for, starting from `date`. For example, the first row stays the same, and the second row stays the same because there is nothing to add to it, but the third row would be its `value` + `6600`, because the `6600` from the second row is paid from `2021-08` to `2022-02`.

I am unsure how to achieve this. My idea was to create a new data frame:
```python
import datetime
import pandas as pd
new_df = pd.DataFrame(pd.date_range(start='2021-08', end=datetime.datetime.now(), freq='M'), columns=['value'])
new_df['commission'] = 0
```
and fill it somehow while iterating through the main `df`, so that the end result looks like this:
```text
leased   value
2021-08  22.50 + 6600
2021-09  6600 + 1057.62 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-10  6600 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-11  6600 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
...
```
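The iterative idea described above could be sketched roughly as follows. This is only an illustration, not a recommended solution: it assumes that a row with `time = n` counts toward `n + 1` consecutive months starting at `date`, and the names `totals`, `start` and `offset` are made up for the example.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-08", "2021-08", "2021-09", "2021-09", "2021-09",
             "2021-09", "2021-09", "2021-09", "2021-09", "2021-09"],
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})

# Assumption: a row with time = n is paid in n + 1 consecutive months,
# beginning with the month given in `date`.
totals = defaultdict(float)
for row in df.itertuples(index=False):
    start = pd.Period(row.date, freq="M")
    for offset in range(int(row.time) + 1):
        totals[start + offset] += row.value

# Turn the accumulated totals into a two-column frame, ordered by month.
new_df = (
    pd.Series(dict(totals), name="value")
    .sort_index()
    .rename_axis("date")
    .reset_index()
)
print(new_df)
```

A plain Python loop like this is easy to follow but will be slow on large frames, which is why a vectorized approach is usually preferable.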
# Answer 1
**Score**: 2

You can use a monthly period, [`repeat`](https://pandas.pydata.org/docs/reference/api/pandas.Index.repeat.html) the rows, increment the periods with [`groupby.cumcount`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html), and aggregate with [`groupby.agg`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html):
```python
# convert the dates to monthly periods
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
# repeat each row time + 1 times, shift each copy by its position, then sum per month
out = (
    # `time` is float in the sample, and repeat() needs integer counts
    df.loc[df.index.repeat(df['time'].astype(int).add(1))]
      .assign(date=lambda d: d.groupby(level=0).cumcount().add(d['date']))
      .groupby('date', as_index=False)['value'].sum()
)
```
Output:
```text
       date     value
0   2021-08   6622.50
1   2021-09  17050.89
2   2021-10  15993.27
3   2021-11  15346.92
4   2021-12  15035.16
5   2022-01  11052.66
6   2022-02   3552.66
7   2022-03   3552.66
8   2022-04   3552.66
9   2022-05   3006.66
10  2022-06   3006.66
11  2022-07   1535.16
12  2022-08   1535.16
```
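To make the individual steps easier to inspect, the same repeat / `cumcount` / aggregate idea can be written out against the sample data. This is a sketch, not the answer's exact code: the intermediate names `repeats`, `expanded` and `step` are introduced here for illustration, and `time` is cast to `int` on the assumption that it always holds whole numbers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-08", "2021-08", "2021-09", "2021-09", "2021-09",
             "2021-09", "2021-09", "2021-09", "2021-09", "2021-09"],
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Step 1: each row is repeated time + 1 times (the original index is kept).
repeats = df["time"].astype(int).add(1)
expanded = df.loc[df.index.repeat(repeats)]

# Step 2: number the copies of each original row 0, 1, 2, ...
step = expanded.groupby(level=0).cumcount()

# Step 3: shift every copy forward by that many months.
expanded = expanded.assign(date=expanded["date"] + step)

# The 6600 row (index 1, time = 5) now spans six months.
print(expanded.loc[1])

# Step 4: sum the values that land in the same month.
out = expanded.groupby("date", as_index=False)["value"].sum()
print(out)
```

Keeping the original index through `repeat` is what lets `groupby(level=0)` tell the copies of one source row apart from the copies of another.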
# Answer 2
**Score**: 0

Here is another option.

Here we use `map()` and `np.arange()` to build, for each row, a list of month numbers that we then `explode()` into one row per month. We then add the months to each date and `groupby()` the date.
```python
import numpy as np
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# list 1..time+1 per row -> one row per month -> shift the date -> sum per date
(df.assign(time=df['time'].add(1).map(lambda x: np.arange(1, x + 1)))
   .explode('time')
   .assign(date=lambda x: x.apply(lambda r: r['date'] + pd.offsets.DateOffset(months=r['time'] - 1), axis=1))
   .groupby('date')['value'].sum())
```
Output:
```text
date
2021-08-01     6622.50
2021-09-01    17050.89
2021-10-01    15993.27
2021-11-01    15346.92
2021-12-01    15035.16
2022-01-01    11052.66
2022-02-01     3552.66
2022-03-01     3552.66
2022-04-01     3552.66
2022-05-01     3006.66
2022-06-01     3006.66
2022-07-01     1535.16
2022-08-01     1535.16
```
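If the row-wise `apply` with `DateOffset` turns out to be slow, one possible variation keeps the `map()` / `explode()` idea but does the date shift with monthly periods instead. This swaps in a different technique (period arithmetic) for the date step, so treat it as a sketch: the `step` column and the `exploded` and `result` names are made up for the example, and `time` is assumed to hold whole numbers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-08", "2021-08", "2021-09", "2021-09", "2021-09",
             "2021-09", "2021-09", "2021-09", "2021-09", "2021-09"],
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})

exploded = (
    df.assign(
        # monthly periods support adding whole numbers of months
        date=pd.to_datetime(df["date"]).dt.to_period("M"),
        # hypothetical helper column: month offsets 0..time for each row
        step=df["time"].astype(int).map(lambda n: list(range(n + 1))),
    )
    .explode("step")  # one row per (original row, month offset)
)

# explode() leaves an object column, so cast back to int before adding.
exploded["date"] = exploded["date"] + exploded["step"].astype(int)

result = exploded.groupby("date")["value"].sum()
print(result)
```

The result is indexed by monthly periods rather than timestamps; `result.index.to_timestamp()` converts it back if first-of-month dates are needed.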