Calculate the carry-over from rows based on criteria in pandas


# Question


I have a `df` like this:

```
date     time    value
2021-08   0.0    22.50
2021-08   5.0  6600.00
2021-09   0.0  1057.62
2021-09   1.0   646.35
2021-09   2.0   311.76
2021-09   3.0  3982.50
2021-09   4.0   900.00
2021-09   7.0   546.00
2021-09   9.0  1471.50
2021-09  11.0  1535.16
```

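The `time` column represents for how many months after the month in `date` the `value` keeps being paid (a row with `time` 0 is paid only in its own month). For example, the first row remains the same, and the second row also remains the same because there is nothing to add to it, but the third row would be its `value` + `6600`, because the `6600` from the second row is paid every month from `2021-08` through `2022-01` (`2021-08` plus 5 months).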

I am unsure how to achieve this. My idea was to create a new data frame:

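```python
import datetime
import pandas as pd
new_df = pd.DataFrame(pd.date_range(start='2021-08', end=datetime.datetime.now(), freq='MS'), columns=['leased'])  # month-start dates
new_df['value'] = 0.0  # monthly totals, to be filled in
```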

and somehow fill it while iterating through the main `df`, so that the end result looks like this:

```
leased   value
2021-08  22.50 + 6600
2021-09  6600 + 1057.62 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-10  6600 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-11  6600 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
...
```
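A minimal sketch of that iterate-and-fill idea, assuming `time` counts the additional months after `date` (so `time` 0 means the value is paid only in its own month). It swaps `date_range` for monthly `Period`s so that "`date` plus n months" is plain integer arithmetic; the `leased`/`value` names come from the desired output, and the sample frame is rebuilt by hand from the table in the question.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "date": ["2021-08"] * 2 + ["2021-09"] * 8,
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Last month in which anything is still being paid: date + time (whole months)
last = max(p + int(t) for p, t in zip(df["date"], df["time"]))

# One row per month, with totals starting at zero
new_df = pd.DataFrame({"leased": pd.period_range(df["date"].min(), last, freq="M"),
                       "value": 0.0})

# Add each row's value to every month from `date` through `date + time`
for p, t, v in zip(df["date"], df["time"], df["value"]):
    active = (new_df["leased"] >= p) & (new_df["leased"] <= p + int(t))
    new_df.loc[active, "value"] += v

print(new_df.round(2))
```

Under this rule the loop gives the same monthly totals as the answers (for example 17050.89 for 2021-09). It does rows × months work, which is fine at this size; the answers avoid the explicit loop.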



# Answer 1
**Score**: 2


You can use a monthly period, [`repeat`](https://pandas.pydata.org/docs/reference/api/pandas.Index.repeat.html) the rows, increment the periods with [`groupby.cumcount`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html), and aggregate with [`groupby.agg`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html):

```python
import pandas as pd

# make monthly periods
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')

# repeat each row (time + 1) times, shift the copies by 0, 1, 2, ... months, then sum per month
out = (
    df.loc[df.index.repeat(df['time'].astype(int).add(1))]
      .assign(date=lambda d: d.groupby(level=0).cumcount().add(d['date']))
      .groupby('date', as_index=False)['value'].sum()
)
```

Output:

```
       date     value
0   2021-08   6622.50
1   2021-09  17050.89
2   2021-10  15993.27
3   2021-11  15346.92
4   2021-12  15035.16
5   2022-01  11052.66
6   2022-02   3552.66
7   2022-03   3552.66
8   2022-04   3552.66
9   2022-05   3006.66
10  2022-06   3006.66
11  2022-07   1535.16
12  2022-08   1535.16
```
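To make the `repeat`/`cumcount` step concrete, here is a sketch that runs only that part on the first two rows of the question's data (a toy frame rebuilt by hand; `rep` is a hypothetical name for the intermediate result). The `.astype(int)` guards against `time` being stored as floats.

```python
import pandas as pd

# First two rows of the question: 22.50 paid once, 6600 paid for 5 extra months
df = pd.DataFrame({"date": ["2021-08", "2021-08"],
                   "time": [0.0, 5.0],
                   "value": [22.50, 6600.00]})
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Repeat each row (time + 1) times; the copies keep their original index label
rep = df.loc[df.index.repeat(df["time"].astype(int).add(1))]

# Number the copies of each original row 0, 1, 2, ... and add that many months to `date`
rep = rep.assign(date=lambda d: d.groupby(level=0).cumcount().add(d["date"]))
print(rep)
```

`rep` holds one row for 22.50 (2021-08) and six rows for 6600 (2021-08 through 2022-01); grouping by `date` and summing `value` is then the last line of the answer.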

# Answer 2

**Score**: 0


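Here is another option:

Here we use `map()` and `np.arange()` to build, for each row, an array of month numbers that we then `explode()` into one row per month. We then add the corresponding number of months to each date and `groupby()` the date.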

```python
import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
out = (df.assign(time=df['time'].astype(int).add(1).map(lambda x: np.arange(1, x + 1)))
         .explode('time')
         .assign(date=lambda x: x.apply(lambda r: r['date'] + pd.offsets.DateOffset(months=int(r['time']) - 1), axis=1))
         .groupby('date')['value'].sum())
```

Output:

```
date
2021-08-01     6622.50
2021-09-01    17050.89
2021-10-01    15993.27
2021-11-01    15346.92
2021-12-01    15035.16
2022-01-01    11052.66
2022-02-01     3552.66
2022-03-01     3552.66
2022-04-01     3552.66
2022-05-01     3006.66
2022-06-01     3006.66
2022-07-01     1535.16
2022-08-01     1535.16
```
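A sketch of just the `map()`/`explode()` step on the first two rows of the question (a toy frame rebuilt by hand, with `time` stored as whole months; `step` is a hypothetical name for the intermediate result), to show what each row turns into before the final `groupby()`.

```python
import numpy as np
import pandas as pd

# First two rows of the question, with `time` as whole months
df = pd.DataFrame({"date": ["2021-08", "2021-08"],
                   "time": [0, 5],
                   "value": [22.50, 6600.00]})
df["date"] = pd.to_datetime(df["date"])

# Each row gets an array of month numbers 1 .. time + 1
step = df.assign(time=df["time"].add(1).map(lambda x: np.arange(1, x + 1)))

# explode() gives every array element its own row; index labels repeat
step = step.explode("time")

# Shift each date by (month number - 1) months
step = step.assign(date=lambda x: x.apply(
    lambda r: r["date"] + pd.offsets.DateOffset(months=int(r["time"]) - 1), axis=1))
print(step)
```

The 6600 row becomes six rows dated 2021-08-01 through 2022-01-01; grouping `step` by `date` and summing `value` gives the answer's final result. The row-wise `apply` is the slow part on large frames.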

huangapple
  • Posted on March 21, 2023, 00:28:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75792901.html