英文:
Python read values from multiple rows based on conditions
问题
# Check the ID='16', if match read 'Date/Time', 'C_1' and 'C_10' column values.
df_16 = dfa[dfa['ID'] == 16][['Date/Time', 'ID', 'C_1', 'C_10']]
# Find the first previous(from ID='16') row location where ID='136' and read column values from C_2 to C_8.
prev_136 = dfa[dfa['ID'] == 136][['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']].iloc[:1]
# Find the first previous OR first next (from ID='16') row location where ID='24' and read the 'C_9' column value.
prev_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[:1]
next_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[1:]
# Combine the results
result = df_16.join(prev_136).join(pd.concat([prev_24, next_24]).dropna().iloc[:1])
# Print the result
result
英文:
I have one csv file(2 different samples)
from io import StringIO
import pandas as pd
dfa = pd.read_csv(StringIO("""
Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
"18/06/2023 3:51:53" 136 101 2028 61 4 3 18 0 0 2 2
"18/06/2023 3:51:54" 24 101 2029 65 0 0 0 1 1 NA 1
"18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
"18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
"18/06/2023 3:51:57" 24 101 2026 87 0 1 8 0 0 9 2
"18/06/2023 3:51:58" 136 101 2023 33 0 1 87 0 0 2 2"""), sep="\s+")
dfb = pd.read_csv(StringIO("""
Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
"18/06/2023 3:51:53" 24 101 2029 65 0 0 0 1 1 NA 1
"18/06/2023 3:51:54" 136 101 2028 61 4 3 18 0 0 2 2
"18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
"18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
"18/06/2023 3:51:57" 136 101 2023 33 0 1 87 0 0 2 2
"18/06/2023 3:51:58" 24 101 2026 87 0 1 8 0 0 9 2"""), sep="\s+")
dfc = pd.read_csv(StringIO("""
Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
"18/06/2023 3:51:53" 136 101 2028 61 4 3 18 0 0 2 2
"18/06/2023 3:51:54" 24 101 2029 65 0 0 0 1 1 NA 1
"18/06/2023 3:51:54" 136 102 2045 61 2 3 4 5 6 7 8
"18/06/2023 3:51:55" 16 101 2022 89 0 0 0 0 0 NA 7
"18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
"18/06/2023 3:51:56" 16 102 2022 89 0 0 0 0 0 NA 11
"18/06/2023 3:51:56" 136 101 2024 90 0 0 0 0 0 NA 3
"18/06/2023 3:51:57" 24 101 2026 87 0 1 8 0 0 9 2
"18/06/2023 3:51:58" 24 102 2045 44 43 42 41 40 39 38 37
"18/06/2023 3:51:59" 136 101 2023 33 0 1 87 0 0 2 2"""), sep="\s+")
I am trying to read the row values(few columns) using below conditions-
- Check the ID='16', if match read 'Date/Time', 'C_1' and 'C_10' column values.
- Find the first previous(from ID='16') row location where ID='136' and read column values from C_2 to C_8.
- Find the first previous OR first next (from ID='16') row location where ID='24' and read the 'C_9' column value(whichever is not empty out of 2 rows OR leave empty if both are empty).
Output -
Date/Time ID C_1 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9 C_10
18/06/2023 3:51:55 16 101 2028 61 4 3 18 0 0 9 7
答案1
得分: 3
如果我理解正确的话,请切片三种可能性,使用 mask
和 ffill
/bfill
,然后使用 groupby.first
聚合成一行:
m = df['ID'].eq('16')
m1 = df['ID'].isin(['16', '136'])
m2 = df['ID'].isin(['16', '24'])
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
pd.concat([df.loc[m1].mask(m).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
df.loc[m2].mask(m).ffill().loc[m, cols+['C_9']],
df.loc[m2].mask(m).bfill().loc[m, cols+['C_9']]]).groupby(level=0).first()
注意:concat
中切片的顺序决定了优先级,将保留第一个非 NaN 值。此外,如果在 C_1/C_2 中有多个可能的值,你应该将 ffill
/bfill
替换为 .groupby(['ID', 'C_1', 'C_10']).ffill()
/bfill()
。
输出:
Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
2 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
根据 C_1 分组的解决方案
m = df['ID'].eq('16')
m1 = df['ID'].isin(['16', '136'])
m2 = df['ID'].isin(['16', '24'])
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
tmp = df.mask(m).fillna({'C_1': df['C_1']})
pd.concat([tmp.loc[m1].groupby(df['C_1']).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
tmp.loc(m2).groupby(df['C_1']).ffill().loc[m, cols+['C_9']],
tmp.loc(m2).groupby(df['C_1']).bfill().loc[m, cols+['C_9']]]).groupby(level=0).first()
输出(使用 dfc
作为输入):
Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
3 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
5 18/06/2023 3:51:54 136 102.0 8.0 2045.0 61.0 2.0 3.0 4.0 5.0 6.0 38.0
英文:
If I understood correctly, slice the three possibilities, use mask
and ffill
/bfill
, then aggregate into a single row with groupby.first
:
m = df['ID'].eq('16')
m1 = df['ID'].isin(['16', '136'])
m2 = df['ID'].isin(['16', '24'])
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
pd.concat([df.loc[m1].mask(m).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
df.loc[m2].mask(m).ffill().loc[m, cols+['C_9']],
df.loc[m2].mask(m).bfill().loc[m, cols+['C_9']],
]).groupby(level=0).first()
NB. precedence is given by the order of the slices in concat
, the first non-NaN value will be kept. Also, if you have multiple possible values in C_1/C_2, you should replace ffill
/bfill
by .groupby(['ID', 'C_1', 'C_10']).ffill()
/bfill()
.
Output:
Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
2 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
solution per C_1 group
m = df['ID'].eq('16')
m1 = df['ID'].isin(['16', '136'])
m2 = df['ID'].isin(['16', '24'])
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
tmp = df.mask(m).fillna({'C_1': df['C_1']})
pd.concat([tmp.loc[m1].groupby(df['C_1']).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
tmp.loc[m2].groupby(df['C_1']).ffill().loc[m, cols+['C_9']],
tmp.loc[m2].groupby(df['C_1']).bfill().loc[m, cols+['C_9']],
]).groupby(level=0).first()
Output (using dfc
as input):
Date/Time ID C_1 C_10 C_2 C_3 C_4 C_5 C_6 C_7 C_8 C_9
3 18/06/2023 3:51:53 136 101.0 2.0 2028.0 61.0 4.0 3.0 18.0 0.0 0.0 9.0
5 18/06/2023 3:51:54 136 102.0 8.0 2045.0 61.0 2.0 3.0 4.0 5.0 6.0 38.0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论