找到值在一行基于另一行发生变化时。

huangapple go评论46阅读模式
英文:

Find when value changes one row based on another

问题

我有一个数据框,我想找到其中一列的下一行,该列的值发生变化,基于另一列的检查

df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]

上面的代码产生了结果,这是可以的,但是Col3中的值有时会发生变化

那么,如何使用上面代码的结果来查找Col3何时发生变化?

以下是我的数据框的摘录

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |

但是我希望能够显示

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |

因为目标是找到两者之间的时间差异

根据第一个答案进行编辑

抱歉,我解释不正确,并且我的示例有点不对。Col2是Col1每行的变化量。请看下面,根据我的Python代码,当Col2不大于0.02时,请忽略变化。

| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |

因此,结果将是

| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |

| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
英文:

I have a dataframe where I want to find the next row in one column where the value changes, based on a check on another column

df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]

The code above produces results, which is ok, but the value in Col3 change sometime later

So , how do I use the result of the code above to search when the Col3 changes?

Below is an excerpt of my dataframe


My python code returns the following

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |

But I want to be able to show

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |

As the goal is to find the time difference between the two

Edit based on first answer

Sorry, I explained it incorreclty, and had my example a little wrong. col2 is the amount that col1 changes per row. See below, when col2 is not bigger than 0.02 as per my python code then ignore the change.

| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |

So the result would be

| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |

And

| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |

答案1

得分: 0

给定你的数据框:

                  Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25

首先提取那些满足条件 df["Col2"] > 0.02 或者 df["Col3"] 值发生变化的行,即 df["Col3"].diff() != 0

df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]

(我添加了 .fillna(value=0) 来填充第一个元素为零。)

                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25

从这个筛选后的数据框中,我们想选择那些差异非零的行以及前一行:

diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]

这将得到所需的结果:

                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25

由于你想找到包含相同 Col1 的行之间的时间差异:

delta_t = result.groupby("Col1").diff().dropna()

如果你想在这个数据框中还包含 Col1,可以执行以下操作:

delta_t["Col1"] = result["Col1"]

这样可以工作,因为两个数据框具有相同的索引。最终的结果是:

                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10
<details>
<summary>英文:</summary>
Given your dataframe:
```none
Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25

Let's first extract the rows where df[&quot;Col2&quot;] &gt; 0.02 or the value of df[&quot;Col3&quot;] has changed, i.e. df[&quot;Col3&quot;].diff() != 0:

df_filt = df[(df[&quot;Col2&quot;] &gt; 0.02) | (df[&quot;Col3&quot;].diff().fillna(value=0) != 0)]

(I added .fillna(value=0) to fill the first element with zero.)

                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25

from this filtered dataframe, we want to select only those rows where the diff is nonzero, and the row prior:

diff_nz = df_filt[&quot;Col3&quot;].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]

which gives the desired result:

                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25

And since you want to find the difference in time between the rows containing the same Col1,

delta_t = result.groupby(&quot;Col1&quot;).diff().dropna()

If you want Col1 back in this dataframe, do:

delta_t[&quot;Col1&quot;] = result[&quot;Col1&quot;]

This works because both dataframes have the same indices. The final result is:

                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10.0  59.8

huangapple
  • 本文由 发表于 2023年3月3日 22:54:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/75628596.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定