February 7, 2023, 00:38:32

Is it possible to fill the empty cells without using a for loop?

Question

I have a sample dataframe with banking data. I would like to know whether it is possible to fill the empty cells without using a for loop.

In this example, row 2 (counting from zero, the Pythonic way) should take the balance from the previous row, 52867.36, and add the amount from row 2, 847.00, which gives 53714.36.

This happens when there are several transactions on the same date.

It is easy with a for loop, but I would like to know whether there is a vectorized way to do it.

The dataframe

import pandas as pd

l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022', '28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
print(df)
          Date   Balance    Amount
0   26.10.2022  54267.36   -390.00
1   27.10.2022  52867.36  -1400.00
2   28.10.2022              847.00
3   28.10.2022             -900.15
4   28.10.2022              -45.00
5   28.10.2022  52744.21    -25.00
6   31.10.2022              -57.30
7   31.10.2022  52646.91    -40.00
8   01.11.2022           -12528.55
9   01.11.2022  34898.36  -5220.00
10  03.11.2022  34871.46    -26.90
11  04.11.2022  51026.46  16155.00
12  07.11.2022             -275.00
13  07.11.2022             -105.00
14  07.11.2022  50612.36    -34.10
15  08.11.2022  61468.52  10856.16
16  09.11.2022             7663.95
17  09.11.2022  69563.27    430.80
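
For reference, the loop-based version mentioned above might look like the following sketch. It uses a shortened copy of the sample data, and the rounding to two decimals is an assumption added here to keep floating-point noise out of the result.

```python
import pandas as pd

# Shortened copy of the sample data; '' marks a missing balance.
df = pd.DataFrame({
    "Date": ["27.10.2022", "28.10.2022", "28.10.2022", "28.10.2022", "28.10.2022"],
    "Balance": [52867.36, "", "", "", 52744.21],
    "Amount": [-1400, 847, -900.15, -45, -25],
})

# Convert the column to float; empty strings become NaN.
balance = pd.to_numeric(df["Balance"], errors="coerce").tolist()
amount = df["Amount"].tolist()

# Fill each gap with the previous row's balance plus the current amount.
# Rounding to cents is an assumption, not part of the original question.
for i in range(1, len(balance)):
    if pd.isna(balance[i]):
        balance[i] = round(balance[i - 1] + amount[i], 2)

df["Balance"] = balance
print(df)
```

Each filled value depends on the one before it, which is why the task looks inherently sequential and why a cumulative sum is the natural vectorized replacement.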

Answer 1

Score: 2

You can take the cumulative sum of the Amount column to get each row's offset from the first Balance value, then use fillna to fill the missing entries in the Balance column:

df['Balance'] = (pd.to_numeric(df['Balance'])
                 .fillna(df['Amount'].shift(-1).cumsum().add(df.iloc[0]['Balance']).shift(1)))

print(df)
          Date   Balance    Amount
0   26.10.2022  54267.36   -390.00
1   27.10.2022  52867.36  -1400.00
2   28.10.2022  53714.36    847.00
3   28.10.2022  52814.21   -900.15
4   28.10.2022  52769.21    -45.00
5   28.10.2022  52744.21    -25.00
6   31.10.2022  52686.91    -57.30
7   31.10.2022  52646.91    -40.00
8   01.11.2022  40118.36 -12528.55
9   01.11.2022  34898.36  -5220.00
10  03.11.2022  34871.46    -26.90
11  04.11.2022  51026.46  16155.00
12  07.11.2022  50751.46   -275.00
13  07.11.2022  50646.46   -105.00
14  07.11.2022  50612.36    -34.10
15  08.11.2022  61468.52  10856.16
16  09.11.2022  69132.47   7663.95
17  09.11.2022  69563.27    430.80
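
The cumsum approach above measures everything from the first balance, so it relies on the whole ledger being consistent from row 0. A variant that restarts from the most recent recorded balance could look like the sketch below. It assumes the first row has a balance. The names known, segment and offset are illustrative, the data is a shortened copy of the sample, and the final rounding to cents is an added assumption.

```python
import pandas as pd

# Shortened copy of the sample data; '' marks a missing balance.
df = pd.DataFrame({
    "Balance": [52867.36, "", "", "", 52744.21, "", 52646.91],
    "Amount": [-1400, 847, -900.15, -45, -25, -57.3, -40],
})

balance = pd.to_numeric(df["Balance"], errors="coerce")
known = balance.notna()

# Every recorded balance starts a new segment; the gaps after it belong to it.
segment = known.cumsum()

# Within each segment, accumulate only the amounts of rows with a missing balance.
offset = df["Amount"].where(~known, 0).groupby(segment).cumsum()

# Last recorded balance plus the amounts accumulated since then.
df["Balance"] = (balance.ffill() + offset).round(2)
print(df)
```

Because each segment is anchored on its own recorded balance, an inconsistency earlier in the ledger does not propagate into later gaps.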

Answer 2

Score: 0

I think you should go with the pandas solution @Ynjxsjmh posted above, but I went for the stdlib's itertools.

import pandas as pd
from itertools import accumulate

l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022', '28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
# Turn empty strings into missing values so the column can be cast to float.
df["Balance"] = df["Balance"].apply(lambda x: None if x == '' else x).astype(float)
# Keep row 0, then rebuild every later balance as a running total seeded with row 1.
df["Balance"] = [df.loc[0, "Balance"]] + list(accumulate(df.loc[2:, "Amount"], initial=df.loc[1, 'Balance']))
print(df)

This gives:

          Date   Balance    Amount
0   26.10.2022  54267.36   -390.00
1   27.10.2022  52867.36  -1400.00
2   28.10.2022  53714.36    847.00
3   28.10.2022  52814.21   -900.15
4   28.10.2022  52769.21    -45.00
5   28.10.2022  52744.21    -25.00
6   31.10.2022  52686.91    -57.30
7   31.10.2022  52646.91    -40.00
8   01.11.2022  40118.36 -12528.55
9   01.11.2022  34898.36  -5220.00
10  03.11.2022  34871.46    -26.90
11  04.11.2022  51026.46  16155.00
12  07.11.2022  50751.46   -275.00
13  07.11.2022  50646.46   -105.00
14  07.11.2022  50612.36    -34.10
15  08.11.2022  61468.52  10856.16
16  09.11.2022  69132.47   7663.95
17  09.11.2022  69563.27    430.80
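
Note that the accumulate line above recomputes every balance from row 2 onward, including the ones that were already recorded. A variant that keeps recorded balances and only fills the gaps might look like this sketch. The names running, computed and mismatch are illustrative, the data is a shortened copy of the sample, and the one-cent tolerance is an assumption.

```python
from itertools import accumulate

import pandas as pd

# Shortened copy of the sample data; None marks a missing balance.
df = pd.DataFrame({
    "Balance": [54267.36, 52867.36, None, None, None, 52744.21],
    "Amount": [-390, -1400, 847, -900.15, -45, -25],
})

# Running balance seeded with the first recorded balance. The `initial`
# keyword of itertools.accumulate needs Python 3.8 or later.
running = list(accumulate(df.loc[1:, "Amount"], initial=df.loc[0, "Balance"]))
computed = pd.Series(running, index=df.index).round(2)

# Flag recorded balances that disagree with the running total by a cent or more.
mismatch = df["Balance"].notna() & ((df["Balance"] - computed).abs() >= 0.01)

# Keep recorded balances and use the running total only where one is missing.
df["Balance"] = df["Balance"].fillna(computed)
print(df)
print("Inconsistent rows:", list(df.index[mismatch]))
```

The mismatch check is a cheap way to confirm that the ledger really is consistent before trusting the filled values.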
