Find when value changes one row based on another
Question
I have a dataframe where I want to find the next row in one column where the value changes, based on a check on another column
df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]
The code above produces results, which is OK, but the value in `Col3` changes some time later.
So, how do I use the result of the code above to find when `Col3` changes?
Below is an excerpt of my dataframe.
My Python code returns the following:
| Datetime | Col1 | Col2 | Col3 |
|----------------------------|-------|------|------|
| 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
But I want to be able to show:
| Datetime | Col1 | Col2 | Col3 |
|----------------------------|-------|------|------|
| 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
| 23-02-03 12:01:27.753000 | 49.8 | 39.8 | 15 |
as the goal is to find the time difference between the two.
Edit based on first answer
Sorry, I explained it incorrectly, and my example was a little wrong. `Col2` is the amount that `Col1` changes per row. See below: when `Col2` is not bigger than 0.02, as per my Python code, ignore the change.
| Datetime | Col1 | Col2 | Col3 |
|----------------------------|-------|------|------|
| 23-02-03 12:01:27.213000 | 10 | 0 | 0 |
| 23-02-03 12:01:27.243000 | 10 | 0 | 0 |
| 23-02-03 12:01:27.313000 | 10 | 0 | 0 |
| 23-02-03 12:01:27.353000 | 10 | 0 | 0 |
| 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
| 23-02-03 12:01:27.453000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.513000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.553000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.613000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.653000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.713000 | 49.8 | 0 | 0 |
| 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |
| 23-02-03 12:01:27.813000 | 49.8 | 0 | 15 |
| 23-02-03 12:01:27.853000 | 49.8 | 0 | 15 |
| 23-02-03 12:01:27.913000 | 49.8 | 0 | 15 |
| 23-02-03 12:01:27.953000 | 49.8 | 0 | 15 |
| 23-02-03 12:01:28.013000 | 49.81 | 0.1 | 15 |
| 23-02-03 12:01:28.053000 | 49.81 | 0 | 15 |
| 23-02-03 12:01:28.113000 | 49.82 | 0.1 | 15 |
| 23-02-03 12:01:28.153000 | 49.82 | 0 | 15 |
| 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
| 23-02-03 12:01:28.253000 | 59.8 | 0 | 15 |
| 23-02-03 12:01:28.313000 | 59.8 | 0 | 15 |
| 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |
| 23-02-03 12:01:28.423000 | 59.8 | 0 | 25 |
| 23-02-03 12:01:28.453000 | 59.8 | 0 | 25 |
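So the result would be:
| Datetime | Col1 | Col2 | Col3 |
|----------------------------|-------|------|------|
| 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
| 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |

And:
| Datetime | Col1 | Col2 | Col3 |
|----------------------------|-------|------|------|
| 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
| 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |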
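
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the excerpt and runs the filter from the question. The calendar date is an assumption (the excerpt writes `23-02-03`, which is ambiguous); only the time offsets matter here. The helper names `ms`, `col2` and `hits` are illustrative.

```python
import pandas as pd

# Millisecond offsets from 12:01:27.000, copied from the excerpt.
ms = [213, 243, 313, 353, 413, 453, 513, 553, 613, 653, 713, 753, 813,
      853, 913, 953, 1013, 1053, 1113, 1153, 1213, 1253, 1313, 1353, 1423, 1453]

# Col2 is zero everywhere except at the four rows where Col1 jumps.
col2 = [0.0] * 26
col2[4], col2[16], col2[18], col2[20] = 39.8, 0.1, 0.1, 9.98

df = pd.DataFrame({
    # Assumed date; the excerpt only shows "23-02-03".
    "Datetime": pd.Timestamp("2003-02-23 12:01:27") + pd.to_timedelta(ms, unit="ms"),
    "Col1": [10.0] * 4 + [49.8] * 12 + [49.81] * 2 + [49.82] * 2 + [59.8] * 6,
    "Col2": col2,
    "Col3": [0] * 11 + [15] * 12 + [25] * 3,
})

# The filter from the question: it finds the jump to 49.8 but not the later Col3 change.
hits = df[(df["Col1"] == 49.8) & (df["Col2"] != 0) & (df["Col2"].abs() > 0.02)]
print(hits.index.tolist())  # [4]
```

The filter returns only the row at 12:01:27.413, which is why a second step is needed to locate the row where `Col3` changes.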
Answer 1
Score: 0
Given your dataframe:
```none
                  Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25
```
Let's first extract the rows where `df["Col2"] > 0.02` or where the value of `df["Col3"]` has changed, i.e. `df["Col3"].diff() != 0`:
df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]
(I added `.fillna(value=0)` because `diff()` returns `NaN` for the first row, and `NaN != 0` would otherwise flag that row as a change.)
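| | Datetime | Col1 | Col2 | Col3 |
|----|-------------------------|-------|-------|------|
| 4 | 2003-02-23 12:01:27.413 | 49.80 | 39.80 | 0 |
| 11 | 2003-02-23 12:01:27.753 | 49.80 | 0.00 | 15 |
| 16 | 2003-02-23 12:01:28.013 | 49.81 | 0.10 | 15 |
| 18 | 2003-02-23 12:01:28.113 | 49.82 | 0.10 | 15 |
| 20 | 2003-02-23 12:01:28.213 | 59.80 | 9.98 | 15 |
| 23 | 2003-02-23 12:01:28.353 | 59.80 | 0.00 | 25 |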
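
To see why the first element needs filling, here is a small sketch on a toy series (the values and the names `col3`, `raw_mask` and `filled_mask` are illustrative, not taken from the question's data):

```python
import pandas as pd

col3 = pd.Series([0, 0, 15, 15, 25])

# diff() yields NaN for the first element, and NaN != 0 evaluates to True,
# so the first row would be reported as a change.
raw_mask = col3.diff() != 0
print(raw_mask.tolist())     # [True, False, True, False, True]

# Filling the leading NaN with 0 keeps the first row out of the mask.
filled_mask = col3.diff().fillna(0) != 0
print(filled_mask.tolist())  # [False, False, True, False, True]
```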
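From this filtered dataframe, we want to select only those rows where the diff is nonzero, and the row prior. The mask is built from `df_filt`, so it has to index `df_filt` rather than `df`, whose index it does not match; `fill_value=False` keeps the shifted mask boolean:
diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
result = df_filt[diff_nz | diff_nz.shift(-1, fill_value=False)]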
which gives the desired result:
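| | Datetime | Col1 | Col2 | Col3 |
|----|-------------------------|------|-------|------|
| 4 | 2003-02-23 12:01:27.413 | 49.8 | 39.80 | 0 |
| 11 | 2003-02-23 12:01:27.753 | 49.8 | 0.00 | 15 |
| 20 | 2003-02-23 12:01:28.213 | 59.8 | 9.98 | 15 |
| 23 | 2003-02-23 12:01:28.353 | 59.8 | 0.00 | 25 |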
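
The two filtering steps can be run end to end with the sketch below. It rebuilds the sample frame (the date follows the printout above; the construction via `ms` and `col2` is only a convenience) and applies the second mask to `df_filt`, since that is the frame whose index the mask shares.

```python
import pandas as pd

# Rebuild the sample frame: millisecond offsets from 12:01:27.000.
ms = [213, 243, 313, 353, 413, 453, 513, 553, 613, 653, 713, 753, 813,
      853, 913, 953, 1013, 1053, 1113, 1153, 1213, 1253, 1313, 1353, 1423, 1453]
col2 = [0.0] * 26
col2[4], col2[16], col2[18], col2[20] = 39.8, 0.1, 0.1, 9.98
df = pd.DataFrame({
    "Datetime": pd.Timestamp("2003-02-23 12:01:27") + pd.to_timedelta(ms, unit="ms"),
    "Col1": [10.0] * 4 + [49.8] * 12 + [49.81] * 2 + [49.82] * 2 + [59.8] * 6,
    "Col2": col2,
    "Col3": [0] * 11 + [15] * 12 + [25] * 3,
})

# Step 1: keep rows where Col1 jumped (Col2 > 0.02) or Col3 changed.
col3_changed = df["Col3"].diff().fillna(0) != 0
df_filt = df[(df["Col2"] > 0.02) | col3_changed]

# Step 2: within the filtered frame, keep each Col3 change and the row just before it.
diff_nz = df_filt["Col3"].diff().fillna(0) != 0
result = df_filt[diff_nz | diff_nz.shift(-1, fill_value=False)]

print(result.index.tolist())  # [4, 11, 20, 23]
```

Rows 16 and 18 pass the first filter but are dropped in the second step, because neither is a `Col3` change nor the row directly before one.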
And since you want to find the difference in time between the rows containing the same `Col1`:
delta_t = result.groupby("Col1").diff().dropna()
If you want `Col1` back in this dataframe, do:
delta_t["Col1"] = result["Col1"]
This works because both dataframes have the same indices. The final result is:
| | Datetime | Col2 | Col3 | Col1 |
|----|------------------------|--------|------|------|
| 11 | 0 days 00:00:00.340000 | -39.80 | 15.0 | 49.8 |
| 23 | 0 days 00:00:00.140000 | -9.98 | 10.0 | 59.8 |
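
As an alternative sketch (not the method used above), `pd.merge_asof` with `direction="forward"` can pair each jump in `Col1` with the next `Col3` change directly, which avoids grouping on the float-valued `Col1`. The names `triggers`, `changes`, `pairs` and `ChangeTime` are illustrative. Keeping only the last trigger before each change is an assumption made so that the output matches the result requested in the question.

```python
import pandas as pd

# Rebuild the sample frame: millisecond offsets from 12:01:27.000.
ms = [213, 243, 313, 353, 413, 453, 513, 553, 613, 653, 713, 753, 813,
      853, 913, 953, 1013, 1053, 1113, 1153, 1213, 1253, 1313, 1353, 1423, 1453]
col2 = [0.0] * 26
col2[4], col2[16], col2[18], col2[20] = 39.8, 0.1, 0.1, 9.98
df = pd.DataFrame({
    "Datetime": pd.Timestamp("2003-02-23 12:01:27") + pd.to_timedelta(ms, unit="ms"),
    "Col1": [10.0] * 4 + [49.8] * 12 + [49.81] * 2 + [49.82] * 2 + [59.8] * 6,
    "Col2": col2,
    "Col3": [0] * 11 + [15] * 12 + [25] * 3,
})

# Rows where Col1 jumped by more than the threshold.
triggers = df.loc[df["Col2"].abs() > 0.02, ["Datetime", "Col1", "Col2"]]

# Rows where Col3 changed, with the timestamp renamed so both survive the merge.
changes = (
    df.loc[df["Col3"].diff().fillna(0) != 0, ["Datetime", "Col3"]]
    .rename(columns={"Datetime": "ChangeTime"})
)

# For each trigger, take the first Col3 change at or after it (both frames are time-sorted).
pairs = pd.merge_asof(
    triggers, changes,
    left_on="Datetime", right_on="ChangeTime",
    direction="forward",
)

# Drop triggers with no later change, then keep the last trigger before each change.
pairs = pairs.dropna(subset=["ChangeTime"]).drop_duplicates(subset="ChangeTime", keep="last")
pairs["delta_t"] = pairs["ChangeTime"] - pairs["Datetime"]

print(pairs[["Datetime", "ChangeTime", "Col1", "delta_t"]])
```

On the sample data this yields two pairs, with `delta_t` of 340 ms for `Col1 == 49.8` and 140 ms for `Col1 == 59.8`, the same intervals as in the table above.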