March 21, 2023 00:28:55

Calculate the carry over from rows based on criteria in pandas

Question

I have a `df` like this:

```text
date       time   value
2021-08    0.0    22.50
2021-08    5.0    6600.00
2021-09    0.0    1057.62
2021-09    1.0    646.35
2021-09    2.0    311.76
2021-09    3.0    3982.50
2021-09    4.0    900.00
2021-09    7.0    546.00
2021-09    9.0    1471.50
2021-09    11.0   1535.16
```

The `time` column represents how many months the `value` is paid for, counting from `date`. For example, the first row stays the same, and the second row stays the same because there is nothing to add, but the third row would be its `value` + `6600`, because the `6600` from the second row is paid from `2021-08` to `2022-02`.

I am unsure how to achieve this. My idea was to create a new data frame:

```python
import datetime
import pandas as pd

new_df = pd.DataFrame(pd.date_range(start='2021-08', end=datetime.datetime.now(), freq='M'), columns=['value'])
new_df['commission'] = 0
```

and fill it somehow while iterating through the main `df`, so that the end result looks like this:

```text
leased   value
2021-08  22.50 + 6600
2021-09  6600 + 1057.62 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-10  6600 + 646.35 + 311.76 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
2021-11  6600 + 3982.50 + 900.00 + 546.00 + 1471.50 + 1535.16
...
```


# Answer 1
**Score**: 2

You can use a monthly period, [`repeat`](https://pandas.pydata.org/docs/reference/api/pandas.Index.repeat.html) the rows, increment the periods with [`groupby.cumcount`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html), and aggregate with [`groupby.agg`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html):

```python
import pandas as pd

# convert the dates to monthly periods
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
# repeat each row time + 1 times, shift the periods, then aggregate
out = (
 df.loc[df.index.repeat(df['time'].add(1))]
   .assign(date=lambda d: d.groupby(level=0).cumcount().add(d['date']))
   .groupby('date', as_index=False)['value'].sum()
)
```

Output:

```text
       date     value
0   2021-08   6622.50
1   2021-09  17050.89
2   2021-10  15993.27
3   2021-11  15346.92
4   2021-12  15035.16
5   2022-01  11052.66
6   2022-02   3552.66
7   2022-03   3552.66
8   2022-04   3552.66
9   2022-05   3006.66
10  2022-06   3006.66
11  2022-07   1535.16
12  2022-08   1535.16
```
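
To make the three steps easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and runs the same repeat / cumcount / sum idea one step at a time. The intermediate names `repeats`, `expanded` and `month_offset` are illustrative only, and the cast of `time` to `int` is an added precaution, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-08", "2021-08"] + ["2021-09"] * 8,
    "time": [0.0, 5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0],
    "value": [22.50, 6600.00, 1057.62, 646.35, 311.76,
              3982.50, 900.00, 546.00, 1471.50, 1535.16],
})
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Step 1: repeat each row time + 1 times (the start month plus `time` more).
# Assumption: `time` holds whole numbers, so casting to int loses nothing.
repeats = df["time"].astype(int).add(1)
expanded = df.loc[df.index.repeat(repeats)]

# Step 2: number the copies of each original row 0, 1, 2, ... and shift the
# start period forward by that many months.
month_offset = expanded.groupby(level=0).cumcount()
expanded = expanded.assign(date=expanded["date"] + month_offset)

# The 6600.00 row (index 1, time = 5) now covers six consecutive months.
print(expanded.loc[1, "date"].astype(str).tolist())
# -> ['2021-08', '2021-09', '2021-10', '2021-11', '2021-12', '2022-01']

# Step 3: sum everything that falls in the same month.
out = expanded.groupby("date", as_index=False)["value"].sum()
print(out.head(3))
```

Looking at `expanded` before the final `groupby` is a quick way to confirm that each payment lands in the months you expect.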

# Answer 2
**Score**: 0

Here is another option:

Here we use `map()` and `np.arange()` to build, for each row, a list of month numbers that we then `explode()`. We then add the months to each date and `groupby()` the date.

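```python
import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# `time` is float in the sample data, so cast it to int before building the month numbers
(df.assign(time=df['time'].astype(int).add(1).map(lambda x: np.arange(1, x + 1)))
 .explode('time').assign(date=lambda x: x.apply(lambda r: r['date'] + pd.offsets.DateOffset(months=r['time'] - 1), axis=1))
 .groupby('date')['value'].sum())
```
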
Output:

```text
date
2021-08-01     6622.50
2021-09-01    17050.89
2021-10-01    15993.27
2021-11-01    15346.92
2021-12-01    15035.16
2022-01-01    11052.66
2022-02-01     3552.66
2022-03-01     3552.66
2022-04-01     3552.66
2022-05-01     3006.66
2022-06-01     3006.66
2022-07-01     1535.16
2022-08-01     1535.16
```
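
For comparison, the question's own idea of iterating over the rows and filling in a running total can be sketched with a plain loop. This is not either answer's method, only a slower cross-check of the totals; `rows`, `totals` and `result` are hypothetical names, and `time` is written as whole numbers on the assumption that it never holds fractions.

```python
from collections import defaultdict

import pandas as pd

# (start month, time, value) tuples copied from the question's sample data.
rows = [
    ("2021-08", 0, 22.50), ("2021-08", 5, 6600.00),
    ("2021-09", 0, 1057.62), ("2021-09", 1, 646.35),
    ("2021-09", 2, 311.76), ("2021-09", 3, 3982.50),
    ("2021-09", 4, 900.00), ("2021-09", 7, 546.00),
    ("2021-09", 9, 1471.50), ("2021-09", 11, 1535.16),
]

# Hypothetical accumulator: month -> total paid in that month.
totals = defaultdict(float)
for start, extra_months, value in rows:
    first = pd.Period(start, freq="M")
    # The value is paid in the start month and in `extra_months` further months.
    for k in range(extra_months + 1):
        totals[first + k] += value

# Collect the totals into a Series indexed by month, in chronological order.
result = pd.Series(
    list(totals.values()),
    index=pd.PeriodIndex(list(totals.keys()), freq="M"),
    name="value",
).sort_index().round(2)
print(result)
```

The loop is easy to reason about but runs in pure Python, so on large frames the vectorized versions above are likely to be much faster.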

