Python 根据条件读取多行数值。

huangapple go评论94阅读模式
英文:

Python read values from multiple rows based on conditions

问题

  1. # Check the ID='16', if match read 'Date/Time', 'C_1' and 'C_10' column values.
  2. df_16 = dfa[dfa['ID'] == 16][['Date/Time', 'ID', 'C_1', 'C_10']]
  3. # Find the first previous(from ID='16') row location where ID='136' and read column values from C_2 to C_8.
  4. prev_136 = dfa[dfa['ID'] == 136][['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']].iloc[:1]
  5. # Find the first previous OR first next (from ID='16') row location where ID='24' and read the 'C_9' column value.
  6. prev_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[:1]
  7. next_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[1:]
  8. # Combine the results
  9. result = df_16.join(prev_136).join(pd.concat([prev_24, next_24]).dropna().iloc[:1])
  10. # Print the result
  11. result
英文:

I have one csv file(2 different samples)

  1. from io import StringIO
  2. import pandas as pd
  3. dfa = pd.read_csv(StringIO("""
  4. Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
  5. "18/06/2023 3:51:53" 136 101 2028 61 4 3 18 0 0 2 2
  6. "18/06/2023 3:51:54" 24 101 2029 65 0 0 0 1 1 NA 1
  7. "18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
  8. "18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
  9. "18/06/2023 3:51:57" 24 101 2026 87 0 1 8 0 0 9 2
  10. "18/06/2023 3:51:58" 136 101 2023 33 0 1 87 0 0 2 2"""), sep="\s+")
  11. dfb = pd.read_csv(StringIO("""
  12. Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
  13. "18/06/2023 3:51:53" 24 101 2029 65 0 0 0 1 1 NA 1
  14. "18/06/2023 3:51:54" 136 101 2028 61 4 3 18 0 0 2 2
  15. "18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
  16. "18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
  17. "18/06/2023 3:51:57" 136 101 2023 33 0 1 87 0 0 2 2
  18. "18/06/2023 3:51:58" 24 101 2026 87 0 1 8 0 0 9 2"""), sep="\s+")
  19. dfc = pd.read_csv(StringIO("""
  20. Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
  21. "18/06/2023 3:51:53" 136 101 2028 61 4 3 18 0 0 2 2
  22. "18/06/2023 3:51:54" 24 101 2029 65 0 0 0 1 1 NA 1
  23. "18/06/2023 3:51:54" 136 102 2045 61 2 3 4 5 6 7 8
  24. "18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
  25. "18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
  26. "18/06/2023 3:51:56" 16 102 2022 89 0 0 0 0 0 NA 11
  27. "18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
  28. "18/06/2023 3:51:57" 24 101 2026 87 0 1 8 0 0 9 2
  29. "18/06/2023 3:51:58" 24 102 2045 44 43 42 41 40 39 38 37
  30. "18/06/2023 3:51:59" 136 101 2023 33 0 1 87 0 0 2 2"""), sep="\s+")

I am trying to read the row values(few columns) using below conditions-

  • Check the ID='16', if match read 'Date/Time', 'C_1' and 'C_10' column values.
  • Find the first previous(from ID='16') row location where ID='136' and read column values from C_2 to C_8.
  • Find the first previous OR first next (from ID='16') row location where ID='24' and read the 'C_9' column value(whichever is not empty out of 2 rows OR leave empty if both are empty).

Output -

  1. Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
  2. 18/06/2023 3:51:55 16 101 2028 61 4 3 18 0 0 9 7

答案1

得分: 3

如果我理解正确的话,请切片三种可能性,使用 maskffill/bfill,然后使用 groupby.first 聚合成一行:

  1. m = df['ID'].eq('16')
  2. m1 = df['ID'].isin(['16', '136'])
  3. m2 = df['ID'].isin(['16', '24'])
  4. cols = ['Date/Time', 'ID', 'C_1', 'C_10']
  5. pd.concat([df.loc[m1].mask(m).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
  6. df.loc[m2].mask(m).ffill().loc[m, cols+['C_9']],
  7. df.loc[m2].mask(m).bfill().loc[m, cols+['C_9']]]).groupby(level=0).first()

注意:concat 中切片的顺序决定了优先级,将保留第一个非 NaN 值。此外,如果在 C_1/C_2 中有多个可能的值,你应该将 ffill/bfill 替换为 .groupby(['ID', 'C_1', 'C_10']).ffill()/bfill()

输出:

  1. Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
  2. 2 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0

根据 C_1 分组的解决方案

  1. m = df['ID'].eq('16')
  2. m1 = df['ID'].isin(['16', '136'])
  3. m2 = df['ID'].isin(['16', '24'])
  4. cols = ['Date/Time', 'ID', 'C_1', 'C_10']
  5. tmp = df.mask(m).fillna({'C_1': df['C_1']})
  6. pd.concat([tmp.loc[m1].groupby(df['C_1']).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
  7. tmp.loc(m2).groupby(df['C_1']).ffill().loc[m, cols+['C_9']],
  8. tmp.loc(m2).groupby(df['C_1']).bfill().loc[m, cols+['C_9']]]).groupby(level=0).first()

输出(使用 dfc 作为输入):

  1. Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
  2. 3 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
  3. 5 18/06/2023 3:51:54 136 102.0 8.0 2045.0 61.0 2.0 3.0 4.0 5.0 6.0 38.0
英文:

If I understood correctly, slice the three possibilities, use mask and ffill/bfill, then aggregate into a single row with groupby.first:

  1. m = df['ID'].eq('16')
  2. m1 = df['ID'].isin(['16', '136'])
  3. m2 = df['ID'].isin(['16', '24'])
  4. cols = ['Date/Time', 'ID', 'C_1', 'C_10']
  5. pd.concat([df.loc[m1].mask(m).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
  6. df.loc[m2].mask(m).ffill().loc[m, cols+['C_9']],
  7. df.loc[m2].mask(m).bfill().loc[m, cols+['C_9']],
  8. ]).groupby(level=0).first()

NB. precedence is given by the order of the slices in concat, the first non-NaN value will be kept. Also, if you have multiple possible values in C_1/C_2, you should replace ffill/bfill by .groupby(['ID', 'C_1', 'C_10']).ffill()/bfill().

Output:

  1. Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
  2. 2 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0

solution per C_1 group

  1. m = df['ID'].eq('16')
  2. m1 = df['ID'].isin(['16', '136'])
  3. m2 = df['ID'].isin(['16', '24'])
  4. cols = ['Date/Time', 'ID', 'C_1', 'C_10']
  5. tmp = df.mask(m).fillna({'C_1': df['C_1']})
  6. pd.concat([tmp.loc[m1].groupby(df['C_1']).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
  7. tmp.loc[m2].groupby(df['C_1']).ffill().loc[m, cols+['C_9']],
  8. tmp.loc[m2].groupby(df['C_1']).bfill().loc[m, cols+['C_9']],
  9. ]).groupby(level=0).first()

Output (using dfc as input):

  1. Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
  2. 3 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
  3. 5 18/06/2023 3:51:54 136 102.0 8.0 2045.0 61.0 2.0 3.0 4.0 5.0 6.0 38.0

huangapple
  • 本文由 发表于 2023年7月17日 17:21:43
  • 转载请务必保留本文链接:https://go.coder-hub.com/76703053.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定