July 17, 2023 17:21:43


Python read values from multiple rows based on conditions

Question

# Check the ID='16', if match read 'Date/Time', 'C_1' and 'C_10' column values.
df_16 = dfa[dfa['ID'] == 16][['Date/Time', 'ID', 'C_1', 'C_10']]
# Find the first previous(from ID='16') row location where ID='136' and read column values from C_2 to C_8.
prev_136 = dfa[dfa['ID'] == 136][['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']].iloc[:1]
# Find the first previous OR first next (from ID='16') row location where ID='24' and read the 'C_9' column value.
prev_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[:1]
next_24 = dfa[dfa['ID'] == 24]['C_9'].iloc[1:]
# Combine the results
result = df_16.join(prev_136).join(pd.concat([prev_24, next_24]).dropna().iloc[:1])
# Print the result
result


I have one CSV file (several different samples are shown below):

from io import StringIO
import pandas as pd
dfa = pd.read_csv(StringIO("""
      Date/Time     ID  C_1 C_2  C_3   C_4 C_5 C_6  C_7  C_8 C_9 C_10
"18/06/2023 3:51:53"  136 101 2028  61    4   3   18   0    0   2    2
"18/06/2023 3:51:54"  24  101 2029  65    0   0   0    1    1   NA   1
"18/06/2023 3:51:55"  16  101 2022  89    0   0   0    0    0   NA   7
"18/06/2023 3:51:56"  136 101 2024  90    0   0   0    0    0   NA   3
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8    0    0   9    2
"18/06/2023 3:51:58"  136 101 2023  33    0   1   87   0    0   2    2"""), sep=r"\s+")
dfb = pd.read_csv(StringIO("""
      Date/Time     ID  C_1  C_2  C_3  C_4 C_5 C_6  C_7  C_8 C_9 C_10
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0    1    1   NA   1
"18/06/2023 3:51:54"  136 101 2028  61    4   3   18   0    0   2    2
"18/06/2023 3:51:55"  16  101 2022  89    0   0   0    0    0   NA   7
"18/06/2023 3:51:56"  136 101 2024  90    0   0   0    0    0   NA   3
"18/06/2023 3:51:57"  136 101 2023  33    0   1   87   0    0   2    2
"18/06/2023 3:51:58"  24  101 2026  87    0   1   8    0    0   9    2"""), sep=r"\s+")
dfc = pd.read_csv(StringIO("""
      Date/Time     ID  C_1 C_2  C_3   C_4 C_5 C_6  C_7  C_8 C_9 C_10
"18/06/2023 3:51:53"  136 101 2028  61    4   3   18   0    0   2    2
"18/06/2023 3:51:54"  24  101 2029  65    0   0   0    1    1   NA   1
"18/06/2023 3:51:54"  136 102 2045  61    2   3   4    5    6   7    8
"18/06/2023 3:51:55"  16  101 2022  89    0   0   0    0    0   NA   7
"18/06/2023 3:51:56"  136 101 2024  90    0   0   0    0    0   NA   3
"18/06/2023 3:51:56"  16  102 2022  89    0   0   0    0    0   NA   11
"18/06/2023 3:51:56"  136 101 2024  90    0   0   0    0    0   NA   3
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8    0    0   9    2
"18/06/2023 3:51:58"  24  102 2045  44    43  42  41   40   39  38   37
"18/06/2023 3:51:59"  136 101 2023  33    0   1   87   0    0   2    2"""), sep=r"\s+")

I am trying to read a few column values from several rows, using the conditions below:

Find the row where ID = 16 and read its 'Date/Time', 'C_1' and 'C_10' values.
Find the closest previous row (relative to the ID = 16 row) where ID = 136 and read its C_2 to C_8 values.
Find the closest previous or closest next row (relative to the ID = 16 row) where ID = 24 and read its 'C_9' value (whichever of the two is not empty; leave it empty if both are empty).

Expected output:

         Date/Time  ID  C_1   C_2  C_3  C_4  C_5  C_6  C_7  C_8  C_9  C_10
18/06/2023 3:51:55  16  101  2028   61    4    3   18    0    0    9     7

Answer 1

Score: 3

If I understood correctly, slice the three possibilities, use mask and ffill/bfill, then aggregate into a single row with groupby.first:

# df is one of the sample frames (e.g. dfa). IDs are compared as strings here,
# so this assumes df['ID'] holds strings; with integer IDs use 16, 136 and 24.
m = df['ID'].eq('16')                # rows to fill (ID 16)
m1 = df['ID'].isin(['16', '136'])    # ID 16 rows plus candidate ID 136 rows
m2 = df['ID'].isin(['16', '24'])     # ID 16 rows plus candidate ID 24 rows
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
pd.concat([df.loc[m1].mask(m).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
           df.loc[m2].mask(m).ffill().loc[m, cols+['C_9']],
           df.loc[m2].mask(m).bfill().loc[m, cols+['C_9']],
          ]).groupby(level=0).first()

Note: precedence follows the order of the slices in concat; the first non-NaN value is kept. Also, if you have multiple possible values in C_1/C_2, you should replace ffill/bfill with .groupby(['ID', 'C_1', 'C_10']).ffill()/bfill().
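The precedence rule can be seen in isolation with a small sketch. The two one-row frames below are made-up stand-ins for the slices passed to concat; only the index label (2) and the column name C_9 are borrowed from the example.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate slices that share the same index label (2).
prev_fill = pd.DataFrame({"C_9": [np.nan]}, index=[2])  # nothing found in the previous row
next_fill = pd.DataFrame({"C_9": [9.0]}, index=[2])     # value found in the next row

# groupby(level=0).first() keeps, per index label and column,
# the first non-NaN value in concatenation order.
combined = pd.concat([prev_fill, next_fill]).groupby(level=0).first()
print(combined)

# When both candidates hold a value, the slice listed first wins.
both = pd.concat([
    pd.DataFrame({"C_9": [2.0]}, index=[2]),
    pd.DataFrame({"C_9": [9.0]}, index=[2]),
]).groupby(level=0).first()
print(both)
```

Swapping the order of the frames inside concat would therefore change which value is kept in the second case.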

Output:

            Date/Time   ID    C_1  C_10     C_2   C_3  C_4  C_5   C_6  C_7  C_8  C_9
2  18/06/2023 3:51:53  136  101.0   2.0  2028.0  61.0  4.0  3.0  18.0  0.0  0.0  9.0
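In this output, Date/Time, ID, C_1 and C_10 come from the ID 136 row, because mask(m) blanks every column of the ID 16 row before filling. If the ID 16 row's own values should be kept for those columns, one possible variant is to blank and fill only the borrowed columns. The sketch below is not the answerer's code: it assumes integer IDs (as read_csv would produce for the sample data), rebuilds dfa by hand, and uses a hypothetical helper named borrow.

```python
import numpy as np
import pandas as pd

columns = ["Date/Time", "ID"] + [f"C_{i}" for i in range(1, 11)]
rows = [
    ["18/06/2023 3:51:53", 136, 101, 2028, 61, 4, 3, 18, 0, 0, 2, 2],
    ["18/06/2023 3:51:54", 24, 101, 2029, 65, 0, 0, 0, 1, 1, np.nan, 1],
    ["18/06/2023 3:51:55", 16, 101, 2022, 89, 0, 0, 0, 0, 0, np.nan, 7],
    ["18/06/2023 3:51:56", 136, 101, 2024, 90, 0, 0, 0, 0, 0, np.nan, 3],
    ["18/06/2023 3:51:57", 24, 101, 2026, 87, 0, 1, 8, 0, 0, 9, 2],
    ["18/06/2023 3:51:58", 136, 101, 2023, 33, 0, 1, 87, 0, 0, 2, 2],
]
dfa = pd.DataFrame(rows, columns=columns)


def borrow(frame, source_id, cols, direction):
    """Hypothetical helper: fill `cols` of ID 16 rows from rows with `source_id`."""
    sub = frame.loc[frame["ID"].isin([16, source_id])]
    target = sub["ID"].eq(16)
    blanked = sub[cols].mask(target)  # blank only the borrowed columns
    filled = blanked.ffill() if direction == "prev" else blanked.bfill()
    return filled.loc[target]


is16 = dfa["ID"].eq(16)
own = dfa.loc[is16, ["Date/Time", "ID", "C_1", "C_10"]]  # kept from the ID 16 row
from_136 = borrow(dfa, 136, [f"C_{i}" for i in range(2, 9)], "prev")
c9 = borrow(dfa, 24, ["C_9"], "prev").combine_first(borrow(dfa, 24, ["C_9"], "next"))

result = own.join(from_136).join(c9)[dfa.columns]
print(result)
```

As with the answer's approach, ffill/bfill pick the nearest non-NaN value among the candidate rows, so an empty C_9 in the closest ID 24 row can be skipped in favour of a more distant one.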

Solution per C_1 group:

# As above, this assumes df['ID'] holds strings; with integer IDs use 16, 136 and 24.
m = df['ID'].eq('16')
m1 = df['ID'].isin(['16', '136'])
m2 = df['ID'].isin(['16', '24'])
cols = ['Date/Time', 'ID', 'C_1', 'C_10']
# Blank the ID 16 rows but restore their C_1 so they can still be grouped.
tmp = df.mask(m).fillna({'C_1': df['C_1']})
pd.concat([tmp.loc[m1].groupby(df['C_1']).ffill().loc[m, cols+['C_2', 'C_3', 'C_4', 'C_5', 'C_6', 'C_7', 'C_8']],
           tmp.loc[m2].groupby(df['C_1']).ffill().loc[m, cols+['C_9']],
           tmp.loc[m2].groupby(df['C_1']).bfill().loc[m, cols+['C_9']],
          ]).groupby(level=0).first()

Output (using dfc as input):

            Date/Time   ID    C_1  C_10     C_2   C_3  C_4  C_5   C_6  C_7  C_8   C_9
3  18/06/2023 3:51:53  136  101.0   2.0  2028.0  61.0  4.0  3.0  18.0  0.0  0.0   9.0
5  18/06/2023 3:51:54  136  102.0   8.0  2045.0  61.0  2.0  3.0   4.0  5.0  6.0  38.0
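The reason for grouping by C_1 can be illustrated with a minimal sketch on made-up data (the four-row frame below is an assumption for illustration, not one of the samples): a plain ffill takes the value from the row directly above, whereas a grouped ffill only looks at earlier rows with the same C_1.

```python
import numpy as np
import pandas as pd

# Hypothetical interleaved rows for two C_1 groups (101 and 102).
df = pd.DataFrame({
    "C_1": [101, 102, 101, 102],
    "C_2": [2028.0, 2045.0, np.nan, np.nan],
})

# Plain forward fill: row 2 (C_1 = 101) picks up 2045.0 from the C_1 = 102 row above it.
plain = df["C_2"].ffill()

# Grouped forward fill: each row is filled only from earlier rows of its own C_1 group.
grouped = df.groupby("C_1")["C_2"].ffill()

print(pd.DataFrame({"C_1": df["C_1"], "plain": plain, "grouped": grouped}))
```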
