能否在不使用 for 循环的情况下填充空单元格?

huangapple go评论90阅读模式
英文:

Is it possible to fill the empty cells without using a for loop?

问题

我有一个包含银行数据的数据框示例。我想知道是否有可能在不使用for循环的情况下填充空单元格。

在这个示例中,假设在第2行(Pythonic方式),它应该取前一行的余额值52867.36并添加第2行的金额:847.00

当同一日期发生多笔交易时会出现这种情况。

使用for循环很容易,但我想知道是否有一种通过矢量化来实现的方法。

数据框
  1. import pandas as pd
  2. l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
  3. l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
  4. l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
  5. df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
  6. print(df)
  1. Date Balance Amount
  2. 0 26.10.2022 54267.36 -390.00
  3. 1 27.10.2022 52867.36 -1400.00
  4. 2 28.10.2022 847.00
  5. 3 28.10.2022 -900.15
  6. 4 28.10.2022 -45.00
  7. 5 28.10.2022 52744.21 -25.00
  8. 6 31.10.2022 -57.30
  9. 7 31.10.2022 52646.91 -40.00
  10. 8 01.11.2022 -12528.55
  11. 9 01.11.2022 34898.36 -5220.00
  12. 10 03.11.2022 34871.46 -26.90
  13. 11 04.11.2022 51026.46 16155.00
  14. 12 07.11.2022 -275.00
  15. 13 07.11.2022 -105.00
  16. 14 07.11.2022 50612.36 -34.10
  17. 15 08.11.2022 61468.52 10856.16
  18. 16 09.11.2022 7663.95
  19. 17 09.11.2022 69563.27 430.80
英文:

I have a sample a of dataframe with banking data. I would like to know if it is possible to fill the empty cells without using a for loop.

In this example, let's say that at the row number 2 (pythonic way), it should take the value of the balance at the previous row 52867,36 and add the amount of the row number 2 : 847.00.

This happens when there are several transactions on the same date.

It is easy with a for loop but I would like to know if there is a way to do it by a vectorisation.

The dataframe
  1. import pandas as pd
  2. l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
  3. l2 = [54267.36,52867.36, '','' , '',52744.21,'' ,52646.91,'',34898.36,34871.46,51026.46,'','',50612.36,61468.52,'',69563.27]
  4. l3 = [-390,-1400,847,-900.15,-45,-25,-57.3,-40,-12528.55,-5220,-26.9,16155,-275,-105,-34.1,10856.16,7663.95,430.8]
  5. df = pd.DataFrame(list(zip(l1,l2,l3)), columns = ['Date','Balance','Amount'])
  6. print(df)
  7. Date Balance Amount
  8. 0 26.10.2022 54267.36 -390.00
  9. 1 27.10.2022 52867.36 -1400.00
  10. 2 28.10.2022 847.00
  11. 3 28.10.2022 -900.15
  12. 4 28.10.2022 -45.00
  13. 5 28.10.2022 52744.21 -25.00
  14. 6 31.10.2022 -57.30
  15. 7 31.10.2022 52646.91 -40.00
  16. 8 01.11.2022 -12528.55
  17. 9 01.11.2022 34898.36 -5220.00
  18. 10 03.11.2022 34871.46 -26.90
  19. 11 04.11.2022 51026.46 16155.00
  20. 12 07.11.2022 -275.00
  21. 13 07.11.2022 -105.00
  22. 14 07.11.2022 50612.36 -34.10
  23. 15 08.11.2022 61468.52 10856.16
  24. 16 09.11.2022 7663.95
  25. 17 09.11.2022 69563.27 430.80

答案1

得分: 2

  1. df['Balance'] = (pd.to_numeric(df['Balance'])
  2. .fillna(df['Amount'].shift(-1).cumsum().add(df.iloc[0]['Balance']).shift(1)))
英文:

You can cumsum on Amount column to get difference to the first value of Balance then fillna value in Balance column

  1. df['Balance'] = (pd.to_numeric(df['Balance'])
  2. .fillna(df['Amount'].shift(-1).cumsum().add(df.iloc[0]['Balance']).shift(1)))
  1. print(df)
  2. Date Balance Amount
  3. 0 26.10.2022 54267.36 -390.00
  4. 1 27.10.2022 52867.36 -1400.00
  5. 2 28.10.2022 53714.36 847.00
  6. 3 28.10.2022 52814.21 -900.15
  7. 4 28.10.2022 52769.21 -45.00
  8. 5 28.10.2022 52744.21 -25.00
  9. 6 31.10.2022 52686.91 -57.30
  10. 7 31.10.2022 52646.91 -40.00
  11. 8 01.11.2022 40118.36 -12528.55
  12. 9 01.11.2022 34898.36 -5220.00
  13. 10 03.11.2022 34871.46 -26.90
  14. 11 04.11.2022 51026.46 16155.00
  15. 12 07.11.2022 50751.46 -275.00
  16. 13 07.11.2022 50646.46 -105.00
  17. 14 07.11.2022 50612.36 -34.10
  18. 15 08.11.2022 61468.52 10856.16
  19. 16 09.11.2022 69132.47 7663.95
  20. 17 09.11.2022 69563.27 430.80

答案2

得分: 0

这是代码的翻译结果:

  1. import pandas as pd
  2. from itertools import accumulate
  3. l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
  4. l2 = [54267.36, 52867.36, '', '', '', 52744.21, '', 52646.91, '', 34898.36, 34871.46, 51026.46, '', '', 50612.36, 61468.52, '', 69563.27]
  5. l3 = [-390, -1400, 847, -900.15, -45, -25, -57.3, -40, -12528.55, -5220, -26.9, 16155, -275, -105, -34.1, 10856.16, 7663.95, 430.8]
  6. df = pd.DataFrame(list(zip(l1, l2, l3)), columns=['Date', 'Balance', 'Amount'])
  7. df["Balance"] = df["Balance"].apply(lambda x: None if x == '' else x).astype(float)
  8. df["Balance"] = [df.loc[0, "Balance"]] + list(accumulate(df.loc[2:, "Amount"], initial=df.loc[1, 'Balance']))
  9. print(df)

这是输出结果:

  1. Date Balance Amount
  2. 0 26.10.2022 54267.36 -390.00
  3. 1 27.10.2022 52867.36 -1400.00
  4. 2 28.10.2022 53714.36 847.00
  5. 3 28.10.2022 52814.21 -900.15
  6. 4 28.10.2022 52769.21 -45.00
  7. 5 28.10.2022 52744.21 -25.00
  8. 6 31.10.2022 52686.91 -57.30
  9. 7 31.10.2022 52646.91 -40.00
  10. 8 01.11.2022 40118.36 -12528.55
  11. 9 01.11.2022 34898.36 -5220.00
  12. 10 03.11.2022 34871.46 -26.90
  13. 11 04.11.2022 51026.46 16155.00
  14. 12 07.11.2022 50751.46 -275.00
  15. 13 07.11.2022 50646.46 -105.00
  16. 14 07.11.2022 50612.36 -34.10
  17. 15 08.11.2022 61468.52 10856.16
  18. 16 09.11.2022 69132.47 7663.95
  19. 17 09.11.2022 69563.27 430.80
英文:

I think you should go with the pandas solution @Ynjxsjmh posted above, but I went for the stdlib's itertools.

  1. import pandas as pd
  2. from itertools import accumulate
  3. l1 = ['26.10.2022', '27.10.2022', '28.10.2022', '28.10.2022', '28.10.2022','28.10.2022', '31.10.2022', '31.10.2022', '01.11.2022', '01.11.2022', '03.11.2022', '04.11.2022', '07.11.2022', '07.11.2022', '07.11.2022', '08.11.2022', '09.11.2022', '09.11.2022']
  4. l2 = [54267.36,52867.36, '','' , '',52744.21,'' ,52646.91,'',34898.36,34871.46,51026.46,'','',50612.36,61468.52,'',69563.27]
  5. l3 = [-390,-1400,847,-900.15,-45,-25,-57.3,-40,-12528.55,-5220,-26.9,16155,-275,-105,-34.1,10856.16,7663.95,430.8]
  6. df = pd.DataFrame(list(zip(l1,l2,l3)), columns = ['Date','Balance','Amount'])
  7. df["Balance"] = df["Balance"].apply(lambda x: None if x == '' else x).astype(float)
  8. df["Balance"] = [df.loc[0, "Balance"]] + list(accumulate(df.loc[2:, "Amount"], initial=df.loc[1, 'Balance']))
  9. print(df)

This gives:

  1. Date Balance Amount
  2. 0 26.10.2022 54267.36 -390.00
  3. 1 27.10.2022 52867.36 -1400.00
  4. 2 28.10.2022 53714.36 847.00
  5. 3 28.10.2022 52814.21 -900.15
  6. 4 28.10.2022 52769.21 -45.00
  7. 5 28.10.2022 52744.21 -25.00
  8. 6 31.10.2022 52686.91 -57.30
  9. 7 31.10.2022 52646.91 -40.00
  10. 8 01.11.2022 40118.36 -12528.55
  11. 9 01.11.2022 34898.36 -5220.00
  12. 10 03.11.2022 34871.46 -26.90
  13. 11 04.11.2022 51026.46 16155.00
  14. 12 07.11.2022 50751.46 -275.00
  15. 13 07.11.2022 50646.46 -105.00
  16. 14 07.11.2022 50612.36 -34.10
  17. 15 08.11.2022 61468.52 10856.16
  18. 16 09.11.2022 69132.47 7663.95
  19. 17 09.11.2022 69563.27 430.80

huangapple
  • 本文由 发表于 2023年2月7日 00:38:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/75364126.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定