Calculate the carry over from rows based on criteria in pandas

# Question

I have a `df` like this:

```
date     time  value
2021-08  0.0   22.50
2021-08  5.0   6600.00
2021-09  0.0   1057.62
2021-09  1.0   646.35
2021-09  2.0   311.76
2021-09  3.0   3982.50
2021-09  4.0   900.00
2021-09  7.0   546.00
2021-09  9.0   1471.50
2021-09  11.0  1535.16
```

The `time` column is the number of months for which `value` is paid, counting from `date`. For example, the first row stays as it is, and so does the second, because there is nothing to add to it. The third row, however, should be its `value` + `6600`, because the `6600` from the second row is paid from `2021-08` to `2022-02`.

I am not sure how to achieve this. My idea was to create a new data frame:

```python
new_df = pd.DataFrame(pd.date_range(start='2021-08', end=datetime.datetime.now(), freq='M'), columns=['value'])
new_df['commission'] = 0
```

and fill it while iterating over the main `df`, so that the end result looks like this:

```
leased   value
2021-08  22.50 + 6600
2021-09  6600 + 1057.62 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-10  6600 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-11  6600 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
...
```
# Answer 1

**Score**: 2

You can convert the dates to monthly periods, [`repeat`](https://pandas.pydata.org/docs/reference/api/pandas.Index.repeat.html) the rows, increment the periods with [`groupby.cumcount`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html), and aggregate with [`groupby.agg`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html):

```python
# make period
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')

# repeat the values, aggregate
out = (
    df.loc[df.index.repeat(df['time'].add(1))]
      .assign(date=lambda d: d.groupby(level=0).cumcount().add(d['date']))
      .groupby('date', as_index=False)['value'].sum()
)
```

Output:

```
       date     value
0   2021-08   6622.50
1   2021-09  17050.89
2   2021-10  15993.27
3   2021-11  15346.92
4   2021-12  15035.16
5   2022-01  11052.66
6   2022-02   3552.66
7   2022-03   3552.66
8   2022-04   3552.66
9   2022-05   3006.66
10  2022-06   3006.66
11  2022-07   1535.16
12  2022-08   1535.16
```
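For anyone who wants to run this end to end, here is a self-contained sketch of the same repeat / `cumcount` / `groupby` idea on the sample data from the question. The intermediate names `repeated` and `offset` are only illustrative, and the `astype(int)` cast is an assumption added here: `time` is shown as floats in the question, and repeat counts generally need to be integers.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2021-08", "2021-08"] + ["2021-09"] * 8,
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})

# Monthly periods make "add k months" a plain integer addition
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# One copy of each row per month it covers (time + 1 months);
# the int cast is an assumption, see the note above
repeated = df.loc[df.index.repeat(df["time"].astype(int).add(1))]

# The k-th copy of a row (k = 0, 1, ...) is shifted k months forward
offset = repeated.groupby(level=0).cumcount()
repeated = repeated.assign(date=repeated["date"] + offset)

# Sum everything that falls in the same month
out = repeated.groupby("date", as_index=False)["value"].sum()
print(out)
```

Splitting the chain into named steps makes it easier to inspect `repeated` and see which original rows contribute to each month.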

# Answer 2

**Score**: 0

Here is another option.

Here we use `map()` and `np.arange()` to create a list that we will `explode()`. We then add the months to each date and `groupby()` the date:

```python
df['date'] = pd.to_datetime(df['date'])
(df.assign(time=df['time'].add(1).map(lambda x: np.arange(1, x + 1)))
   .explode('time')
   .assign(date=lambda x: x.apply(
       lambda r: r['date'] + pd.offsets.DateOffset(months=r['time'] - 1), axis=1))
   .groupby('date')['value'].sum())
```

Output:

```
date
2021-08-01     6622.50
2021-09-01    17050.89
2021-10-01    15993.27
2021-11-01    15346.92
2021-12-01    15035.16
2022-01-01    11052.66
2022-02-01     3552.66
2022-03-01     3552.66
2022-04-01     3552.66
2022-05-01     3006.66
2022-06-01     3006.66
2022-07-01     1535.16
2022-08-01     1535.16
```
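As a cross-check on the totals above, the same carry-over can be written as a plain Python loop, which is close to the "iterate and fill" idea from the question. This is only a sketch: `rows` and `totals` are illustrative names, `time` is written as integers by assumption, and each row is assumed to cover `time + 1` months starting at `date`, which is what both answers do. It will be slower than the vectorized versions on large frames.

```python
from collections import defaultdict

import pandas as pd

# (date, time, value) triples copied from the question; time as ints (assumption)
rows = [
    ("2021-08", 0, 22.50), ("2021-08", 5, 6600.00),
    ("2021-09", 0, 1057.62), ("2021-09", 1, 646.35),
    ("2021-09", 2, 311.76), ("2021-09", 3, 3982.50),
    ("2021-09", 4, 900.00), ("2021-09", 7, 546.00),
    ("2021-09", 9, 1471.50), ("2021-09", 11, 1535.16),
]

totals = defaultdict(float)
for date, months, value in rows:
    start = pd.Period(date, freq="M")
    # the value is counted in the start month and the following `months` months
    for k in range(months + 1):
        totals[start + k] += value

for month, total in sorted(totals.items()):
    print(month, round(total, 2))
```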

Posted by huangapple on 2023-03-21 00:28:55
When reposting, please keep the link to this article: https://go.coder-hub.com/75792901.html