英文:
Find when value changes one row based on another
问题
我有一个数据框,我想找到其中一列的下一行,该列的值发生变化,基于另一列的检查
df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]
上面的代码产生了结果,这是可以的,但是Col3中的值有时会发生变化
那么,如何使用上面代码的结果来查找Col3何时发生变化?
以下是我的数据框的摘录
| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
但是我希望能够显示
| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |
因为目标是找到两者之间的时间差异
根据第一个答案进行编辑
抱歉,我解释不正确,并且我的示例有点不对。Col2是Col1每行的变化量。请看下面,根据我的Python代码,当Col2不大于0.02时,请忽略变化。
| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |
因此,结果将是
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
和
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
英文:
I have a dataframe where I want to find the next row in one column where the value changes, based on a check on another column
df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]
The code above produces results, which is ok, but the value in Col3 change sometime later
So , how do I use the result of the code above to search when the Col3 changes?
Below is an excerpt of my dataframe
My python code returns the following
| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
But I want to be able to show
| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |
As the goal is to find the time difference between the two
Edit based on first answer
Sorry, I explained it incorreclty, and had my example a little wrong. col2 is the amount that col1 changes per row. See below, when col2 is not bigger than 0.02 as per my python code then ignore the change.
| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |
So the result would be
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
And
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
答案1
得分: 0
给定你的数据框:
                  Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25
首先提取那些满足条件 df["Col2"] > 0.02 或者 df["Col3"] 值发生变化的行,即 df["Col3"].diff() != 0:
df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]
(我添加了 .fillna(value=0) 来填充第一个元素为零。)
                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
从这个筛选后的数据框中,我们想选择那些差异非零的行以及前一行:
diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]
这将得到所需的结果:
                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25
由于你想找到包含相同 Col1 的行之间的时间差异:
delta_t = result.groupby("Col1").diff().dropna()
如果你想在这个数据框中还包含 Col1,可以执行以下操作:
delta_t["Col1"] = result["Col1"]
这样可以工作,因为两个数据框具有相同的索引。最终的结果是:
                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10
<details>
<summary>英文:</summary>
Given your dataframe:
```none
Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25
Let's first extract the rows where df["Col2"] > 0.02 or the value of df["Col3"] has changed, i.e. df["Col3"].diff() != 0:
df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]
(I added .fillna(value=0) to fill the first element with zero.)
                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
from this filtered dataframe, we want to select only those rows where the diff is nonzero, and the row prior:
diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]
which gives the desired result:
                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25
And since you want to find the difference in time between the rows containing the same Col1,
delta_t = result.groupby("Col1").diff().dropna()
If you want Col1 back in this dataframe, do:
delta_t["Col1"] = result["Col1"]
This works because both dataframes have the same indices. The final result is:
                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10.0  59.8
				通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论