Check N columns where the same condition is met between consecutive dates and return the number and IDs of matching columns for each group
Question
Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_N
0 2023-06-08 MAC1 79.608346 64.2030 66.181849 149.590573 12.135441
1 2023-06-09 MAC1 80.655564 64.6430 67.636943 109.034809 12.278676
2 2023-06-08 MAC2 5.323341 4.7814 57.231430 51.422297 0.494853
3 2023-06-09 MAC2 5.375561 4.7788 63.701169 88.638871 0.471265
4 2023-06-08 MAC3 3.221366 3.0242 61.071023 113.812544 0.251471
5 2023-06-09 MAC3 3.277577 3.0300 67.280632 133.127433 0.274765
6 2023-06-08 MAC4 3.029806 2.6720 63.310941 72.074345 0.393294
7 2023-06-09 MAC4 3.129871 2.6850 72.727142 100.133454 0.408941
8 2023-06-08 MAC5 18.987204 18.1676 57.660920 98.760819 1.219735
9 2023-06-09 MAC5 19.158136 18.1766 61.598102 93.546477 1.270471
10 2023-06-08 MACN 38.190817 35.2416 68.589171 160.597307 2.531353
11 2023-06-09 MACN 38.393878 34.3196 67.710298 139.854140 2.675000
For this dataframe I would like to know, for each MACHINE, how many SENSORS create a HIGH-LOW situation on the latest date.
The SENS_1 value of MAC1 for today (2023-06-09) is greater than yesterday's (2023-06-08) value; the same holds for SENS_2, SENS_3 and SENS_N.
So what I need to return is a list (DataFrame) of MACHINEs with the names and number of SENSORS that meet the condition:
| MACHINE | SENSORS | N. of SENSORS |
|:---- |:------:| -----:|
| MAC1 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MAC2 | SENS_1,SENS_3,SENS_4 | 3 |
| MAC3 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC4 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC5 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MACN | SENS_1,SENS_N | 2 |
I can use:
high_lows = df.loc[(df['MAC'] == df['MAC'].shift(-1)) & (df['SENS_5'] < df['SENS_5'].shift(-1)), 'MAC']
to identify the SENSORS with high-lows one by one.
However, there are almost 500 machines and 150 sensors.
So I think I need to iterate over the SENSOR columns that come after the MAC column:
df.iloc[:, 2:]
and check the condition for each of them.
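A minimal sketch of that per-column loop (the helper name high_low_loop is illustrative, and it assumes each machine has both yesterday's and today's rows) would look like this, although it still loops in plain Python:
import pandas as pd

def high_low_loop(df):
    # Everything after the Date and MAC columns is treated as a sensor.
    sensor_cols = df.columns[2:]
    records = []
    # Assumes each machine has (at least) two rows: yesterday and today.
    for mac, grp in df.sort_values('Date').groupby('MAC', sort=False):
        prev, last = grp.iloc[-2], grp.iloc[-1]
        rising = [c for c in sensor_cols if last[c] > prev[c]]
        records.append({'MACHINE': mac,
                        'SENSORS': ','.join(rising),
                        'N. of SENSORS': len(rising)})
    return pd.DataFrame(records)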
However, this function will run every 2000 ms (2 s), so it has to be done in the fastest way possible.
What is the most efficient way of iterating over N columns for N groups and extracting the groups that match the condition?
Example data with NaNs:
Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_5
82 2023-06-08 MAC41 79.608346 NaN 66.181849 149.590573 12.135441
83 2023-06-09 MAC41 80.655564 NaN 67.636943 109.034809 12.278676
Answer 1
Score: 2
You can use:
def count(df):
    # Sensor columns start at position 2; keep a sensor if every day-over-day
    # diff within the group is positive (NaN diffs are filled with True).
    m = df.iloc[:, 2:].diff().fillna(True).gt(0).all(axis=0)
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
Output:
>>> out
MAC Sensors Count
0 MAC1 SENS_1, SENS_2, SENS_3, SENS_N 4
1 MAC2 SENS_1, SENS_3, SENS_4 3
2 MAC3 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
3 MAC4 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
4 MAC5 SENS_1, SENS_2, SENS_3, SENS_N 4
5 MACN SENS_1, SENS_N 2
Alternative (as you only have today's and yesterday's rows per machine):
def count(df):
    # With exactly two rows per machine, compare today's values to yesterday's directly.
    m = df.iloc[1, 2:] - df.iloc[0, 2:] > 0
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
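As a quick sanity check against the NaN rows shown in the question (the frame below, named nan_rows for illustration, reproduces the two MAC41 rows), note that diff() yields NaN for the all-NaN SENS_2 column and fillna(True) then counts it as rising:
import pandas as pd

# The two MAC41 rows from the question, with SENS_2 missing on both dates.
nan_rows = pd.DataFrame({
    'Date': ['2023-06-08', '2023-06-09'],
    'MAC': ['MAC41', 'MAC41'],
    'SENS_1': [79.608346, 80.655564],
    'SENS_2': [float('nan'), float('nan')],
    'SENS_3': [66.181849, 67.636943],
    'SENS_4': [149.590573, 109.034809],
    'SENS_5': [12.135441, 12.278676],
})

# Reuses the count() helper defined above; SENS_4 decreased, so it is excluded,
# while the all-NaN SENS_2 is still counted because its NaN diff is filled with True.
print(nan_rows.groupby('MAC', as_index=False).apply(count))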
Answer 2
Score: 1
columns = df.columns[2:]  # sensor columns: everything after Date and MAC

diffs = (
    df.groupby('MAC', sort=False)[columns]
      .diff().dropna().gt(0)                              # True where today's value > yesterday's
      .where(lambda d: ~d, columns.to_series(), axis=1)   # replace `True` with the column name
      .replace(False, float('nan'))                       # non-rising sensors become NaN
)

count = diffs.count(axis=1)                               # number of rising sensors per machine
str_concat = diffs.agg(lambda cols: ','.join(cols.dropna()), axis=1)

out = df.loc[count.index, ['MAC']]
out['SENSORS'] = str_concat
out['N. OF SENSORS'] = count
MAC SENSORS N. OF SENSORS
1 MAC1 SENS_1,SENS_2,SENS_3,SENS_N 4
3 MAC2 SENS_1,SENS_3,SENS_4 3
5 MAC3 SENS_1,SENS_2,SENS_3,SENS_4,SENS_N 5
7 MAC4 SENS_1,SENS_2,SENS_3,SENS_4,SENS_N 5
9 MAC5 SENS_1,SENS_2,SENS_3,SENS_N 4
11 MACN SENS_1,SENS_N 2
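Optionally, to line the result up with the headers in the requested table (a cosmetic step using standard pandas calls, not part of the original snippet), the MAC column can be renamed and the row index reset:
out = out.rename(columns={'MAC': 'MACHINE'}).reset_index(drop=True)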
Comments