Check N columns where the same condition is met between consecutive dates and return the number and IDs of matching columns for each group
Question
Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_N
0 2023-06-08 MAC1 79.608346 64.2030 66.181849 149.590573 12.135441
1 2023-06-09 MAC1 80.655564 64.6430 67.636943 109.034809 12.278676
2 2023-06-08 MAC2 5.323341 4.7814 57.231430 51.422297 0.494853
3 2023-06-09 MAC2 5.375561 4.7788 63.701169 88.638871 0.471265
4 2023-06-08 MAC3 3.221366 3.0242 61.071023 113.812544 0.251471
5 2023-06-09 MAC3 3.277577 3.0300 67.280632 133.127433 0.274765
6 2023-06-08 MAC4 3.029806 2.6720 63.310941 72.074345 0.393294
7 2023-06-09 MAC4 3.129871 2.6850 72.727142 100.133454 0.408941
8 2023-06-08 MAC5 18.987204 18.1676 57.660920 98.760819 1.219735
9 2023-06-09 MAC5 19.158136 18.1766 61.598102 93.546477 1.270471
10 2023-06-08 MACN 38.190817 35.2416 68.589171 160.597307 2.531353
11 2023-06-09 MACN 38.393878 34.3196 67.710298 139.854140 2.675000
For this dataframe I would like to know, for each MACHINE, how many SENSORS create a HIGH-LOW situation on the latest date.
The SENS_1 value of MAC1 for today (2023-06-09) is greater than yesterday's (2023-06-08) value; the same holds for SENS_2, SENS_3 and SENS_N.
So what I need to return is a list (DataFrame) of MACHINEs with the names and number of SENSORS that meet the condition:
| MACHINE | SENSORS | N. of SENSORS |
|:---- |:------:| -----:|
| MAC1 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MAC2 | SENS_1,SENS_3,SENS_4 | 3 |
| MAC3 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC4 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC5 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MACN | SENS_1,SENS_N | 2 |
I can use:
high_lows = df.loc[(df['MAC'] == df['MAC'].shift(-1)) & (df['SENS_5'] < df['SENS_5'].shift(-1)), 'MAC']
to identify the SENSORS with high-lows one by one.
However, there are almost 500 machines and 150 sensors.
So I think I need to iterate over the SENSOR columns that come after the MAC column:
df.iloc[:, 2:]
and check the condition for each of them.
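A minimal sketch of that per-column loop (the helper name high_low_loop is illustrative, and it assumes each machine has both yesterday's and today's rows) would look like this, although it still loops in plain Python:
import pandas as pd

def high_low_loop(df):
    # Everything after the Date and MAC columns is treated as a sensor.
    sensor_cols = df.columns[2:]
    records = []
    # Assumes each machine has (at least) two rows: yesterday and today.
    for mac, grp in df.sort_values('Date').groupby('MAC', sort=False):
        prev, last = grp.iloc[-2], grp.iloc[-1]
        rising = [c for c in sensor_cols if last[c] > prev[c]]
        records.append({'MACHINE': mac,
                        'SENSORS': ','.join(rising),
                        'N. of SENSORS': len(rising)})
    return pd.DataFrame(records)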
However, this function will run every 2000 ms (2 s), so it has to be done in the fastest way possible.
What is the most efficient way of iterating over N columns for N groups and extracting the groups that match the condition?
Example data with NaNs:
Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_5
82 2023-06-08 MAC41 79.608346 NaN 66.181849 149.590573 12.135441
83 2023-06-09 MAC41 80.655564 NaN 67.636943 109.034809 12.278676
Answer 1
Score: 2
You can use:
def count(df):
    # Sensor columns start at position 2; keep a sensor if every day-over-day
    # diff within the group is positive (NaN diffs are filled with True).
    m = df.iloc[:, 2:].diff().fillna(True).gt(0).all(axis=0)
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
Output:
>>> out
MAC Sensors Count
0 MAC1 SENS_1, SENS_2, SENS_3, SENS_N 4
1 MAC2 SENS_1, SENS_3, SENS_4 3
2 MAC3 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
3 MAC4 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
4 MAC5 SENS_1, SENS_2, SENS_3, SENS_N 4
5 MACN SENS_1, SENS_N 2
Alternative (as you only have today's and yesterday's rows per machine):
def count(df):
    # With exactly two rows per machine, compare today's values to yesterday's directly.
    m = df.iloc[1, 2:] - df.iloc[0, 2:] > 0
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
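As a quick sanity check against the NaN rows shown in the question (the frame below, named nan_rows for illustration, reproduces the two MAC41 rows), note that diff() yields NaN for the all-NaN SENS_2 column and fillna(True) then counts it as rising:
import pandas as pd

# The two MAC41 rows from the question, with SENS_2 missing on both dates.
nan_rows = pd.DataFrame({
    'Date': ['2023-06-08', '2023-06-09'],
    'MAC': ['MAC41', 'MAC41'],
    'SENS_1': [79.608346, 80.655564],
    'SENS_2': [float('nan'), float('nan')],
    'SENS_3': [66.181849, 67.636943],
    'SENS_4': [149.590573, 109.034809],
    'SENS_5': [12.135441, 12.278676],
})

# Reuses the count() helper defined above; SENS_4 decreased, so it is excluded,
# while the all-NaN SENS_2 is still counted because its NaN diff is filled with True.
print(nan_rows.groupby('MAC', as_index=False).apply(count))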
Answer 2
Score: 1
columns = df.columns[2:]  # sensor columns: everything after Date and MAC

diffs = (
    df.groupby('MAC', sort=False)[columns]
      .diff().dropna().gt(0)                              # True where today's value > yesterday's
      .where(lambda d: ~d, columns.to_series(), axis=1)   # replace `True` with the column name
      .replace(False, float('nan'))                       # non-rising sensors become NaN
)

count = diffs.count(axis=1)                               # number of rising sensors per machine
str_concat = diffs.agg(lambda cols: ','.join(cols.dropna()), axis=1)

out = df.loc[count.index, ['MAC']]
out['SENSORS'] = str_concat
out['N. OF SENSORS'] = count
MAC SENSORS N. OF SENSORS
1 MAC1 SENS_1,SENS_2,SENS_3,SENS_N 4
3 MAC2 SENS_1,SENS_3,SENS_4 3
5 MAC3 SENS_1,SENS_2,SENS_3,SENS_4,SENS_N 5
7 MAC4 SENS_1,SENS_2,SENS_3,SENS_4,SENS_N 5
9 MAC5 SENS_1,SENS_2,SENS_3,SENS_N 4
11 MACN SENS_1,SENS_N 2
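Optionally, to line the result up with the headers in the requested table (a cosmetic step using standard pandas calls, not part of the original snippet), the MAC column can be renamed and the row index reset:
out = out.rename(columns={'MAC': 'MACHINE'}).reset_index(drop=True)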
Comments