英文:
Is it possible to fill the empty cells without using a for loop?
问题
我有一个包含银行数据的数据框示例。我想知道是否有可能在不使用for循环的情况下填充空单元格。
在这个示例中,假设在第2行(Pythonic方式),它应该取前一行的余额值52867.36
并添加第2行的金额:847.00
。
当同一日期发生多笔交易时会出现这种情况。
使用for循环很容易,但我想知道是否有一种通过矢量化来实现的方法。
数据框
import pandas as pd
l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
print(df)
Date Balance Amount
0 26.10.2022 54267.36 -390.00
1 27.10.2022 52867.36 -1400.00
2 28.10.2022 847.00
3 28.10.2022 -900.15
4 28.10.2022 -45.00
5 28.10.2022 52744.21 -25.00
6 31.10.2022 -57.30
7 31.10.2022 52646.91 -40.00
8 01.11.2022 -12528.55
9 01.11.2022 34898.36 -5220.00
10 03.11.2022 34871.46 -26.90
11 04.11.2022 51026.46 16155.00
12 07.11.2022 -275.00
13 07.11.2022 -105.00
14 07.11.2022 50612.36 -34.10
15 08.11.2022 61468.52 10856.16
16 09.11.2022 7663.95
17 09.11.2022 69563.27 430.80
英文:
I have a sample a of dataframe with banking data. I would like to know if it is possible to fill the empty cells without using a for loop.
In this example, let's say that at the row number 2 (pythonic way), it should take the value of the balance at the previous row 52867,36
and add the amount of the row number 2 : 847.00
.
This happens when there are several transactions on the same date.
It is easy with a for loop but I would like to know if there is a way to do it by a vectorisation.
The dataframe
import pandas as pd
l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36,52867.36, '','' , '',52744.21,'' ,52646.91,'',34898.36,34871.46,51026.46,'','',50612.36,61468.52,'',69563.27]
l3 = [-390,-1400,847,-900.15,-45,-25,-57.3,-40,-12528.55,-5220,-26.9,16155,-275,-105,-34.1,10856.16,7663.95,430.8]
df = pd.DataFrame(list(zip(l1,l2,l3)), columns = ['Date','Balance','Amount'])
print(df)
Date Balance Amount
0 26.10.2022 54267.36 -390.00
1 27.10.2022 52867.36 -1400.00
2 28.10.2022 847.00
3 28.10.2022 -900.15
4 28.10.2022 -45.00
5 28.10.2022 52744.21 -25.00
6 31.10.2022 -57.30
7 31.10.2022 52646.91 -40.00
8 01.11.2022 -12528.55
9 01.11.2022 34898.36 -5220.00
10 03.11.2022 34871.46 -26.90
11 04.11.2022 51026.46 16155.00
12 07.11.2022 -275.00
13 07.11.2022 -105.00
14 07.11.2022 50612.36 -34.10
15 08.11.2022 61468.52 10856.16
16 09.11.2022 7663.95
17 09.11.2022 69563.27 430.80
答案1
得分: 2
df['Balance'] = (pd.to_numeric(df['Balance'])
.fillna(df['Amount'].shift(-1).cumsum().add(df.iloc[0]['Balance']).shift(1)))
英文:
You can cumsum
on Amount
column to get difference to the first value of Balance
then fillna
value in Balance
column
df['Balance'] = (pd.to_numeric(df['Balance'])
.fillna(df['Amount'].shift(-1).cumsum().add(df.iloc[0]['Balance']).shift(1)))
print(df)
Date Balance Amount
0 26.10.2022 54267.36 -390.00
1 27.10.2022 52867.36 -1400.00
2 28.10.2022 53714.36 847.00
3 28.10.2022 52814.21 -900.15
4 28.10.2022 52769.21 -45.00
5 28.10.2022 52744.21 -25.00
6 31.10.2022 52686.91 -57.30
7 31.10.2022 52646.91 -40.00
8 01.11.2022 40118.36 -12528.55
9 01.11.2022 34898.36 -5220.00
10 03.11.2022 34871.46 -26.90
11 04.11.2022 51026.46 16155.00
12 07.11.2022 50751.46 -275.00
13 07.11.2022 50646.46 -105.00
14 07.11.2022 50612.36 -34.10
15 08.11.2022 61468.52 10856.16
16 09.11.2022 69132.47 7663.95
17 09.11.2022 69563.27 430.80
答案2
得分: 0
这是代码的翻译结果:
import pandas as pd
from itertools import accumulate
l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
df["Balance"] = df["Balance"].apply(lambda x: None if x == '' else x).astype(float)
df["Balance"] = [df.loc[0, "Balance"]] + list(accumulate(df.loc[2:, "Amount"], initial=df.loc[1, 'Balance']))
print(df)
这是输出结果:
Date Balance Amount
0 26.10.2022 54267.36 -390.00
1 27.10.2022 52867.36 -1400.00
2 28.10.2022 53714.36 847.00
3 28.10.2022 52814.21 -900.15
4 28.10.2022 52769.21 -45.00
5 28.10.2022 52744.21 -25.00
6 31.10.2022 52686.91 -57.30
7 31.10.2022 52646.91 -40.00
8 01.11.2022 40118.36 -12528.55
9 01.11.2022 34898.36 -5220.00
10 03.11.2022 34871.46 -26.90
11 04.11.2022 51026.46 16155.00
12 07.11.2022 50751.46 -275.00
13 07.11.2022 50646.46 -105.00
14 07.11.2022 50612.36 -34.10
15 08.11.2022 61468.52 10856.16
16 09.11.2022 69132.47 7663.95
17 09.11.2022 69563.27 430.80
英文:
I think you should go with the pandas solution @Ynjxsjmh posted above, but I went for the stdlib's itertools.
import pandas as pd
from itertools import accumulate
l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
l2 = [54267.36,52867.36, '','' , '',52744.21,'' ,52646.91,'',34898.36,34871.46,51026.46,'','',50612.36,61468.52,'',69563.27]
l3 = [-390,-1400,847,-900.15,-45,-25,-57.3,-40,-12528.55,-5220,-26.9,16155,-275,-105,-34.1,10856.16,7663.95,430.8]
df = pd.DataFrame(list(zip(l1,l2,l3)), columns = ['Date','Balance','Amount'])
df["Balance"] = df["Balance"].apply(lambda x: None if x == '' else x).astype(float)
df["Balance"] = [df.loc[0, "Balance"]] + list(accumulate(df.loc[2:, "Amount"], initial=df.loc[1, 'Balance']))
print(df)
This gives:
Date Balance Amount
0 26.10.2022 54267.36 -390.00
1 27.10.2022 52867.36 -1400.00
2 28.10.2022 53714.36 847.00
3 28.10.2022 52814.21 -900.15
4 28.10.2022 52769.21 -45.00
5 28.10.2022 52744.21 -25.00
6 31.10.2022 52686.91 -57.30
7 31.10.2022 52646.91 -40.00
8 01.11.2022 40118.36 -12528.55
9 01.11.2022 34898.36 -5220.00
10 03.11.2022 34871.46 -26.90
11 04.11.2022 51026.46 16155.00
12 07.11.2022 50751.46 -275.00
13 07.11.2022 50646.46 -105.00
14 07.11.2022 50612.36 -34.10
15 08.11.2022 61468.52 10856.16
16 09.11.2022 69132.47 7663.95
17 09.11.2022 69563.27 430.80
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论