2023年3月3日 22:54:08go评论91阅读模式

英文:

Find when value changes one row based on another

问题

我有一个数据框，我想找到其中一列的下一行，该列的值发生变化，基于另一列的检查

df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]

上面的代码产生了结果，这是可以的，但是Col3中的值有时会发生变化

那么，如何使用上面代码的结果来查找Col3何时发生变化？

以下是我的数据框的摘录

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |

但是我希望能够显示

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |

因为目标是找到两者之间的时间差异

根据第一个答案进行编辑

抱歉，我解释不正确，并且我的示例有点不对。Col2是Col1每行的变化量。请看下面，根据我的Python代码，当Col2不大于0.02时，请忽略变化。

| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |

因此，结果将是

| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |

和

| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |

英文:

I have a dataframe where I want to find the next row in one column where the value changes, based on a check on another column

df[(df[&#39;Col1&#39;] == 49.8) &amp; (df[&#39;Col2&#39;] != 0) &amp; (df[&#39;Col2&#39;].abs() &gt; 0.02)]

The code above produces results, which is ok, but the value in Col3 change sometime later

So , how do I use the result of the code above to search when the Col3 changes?

Below is an excerpt of my dataframe

My python code returns the following

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |

But I want to be able to show

| 23-02-03   12:01:27.413000 | 49.8 | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8 | 39.8 | 15   |

As the goal is to find the time difference between the two

Edit based on first answer

Sorry, I explained it incorreclty, and had my example a little wrong. col2 is the amount that col1 changes per row. See below, when col2 is not bigger than 0.02 as per my python code then ignore the change.

| Datetime                   | Col1  | Col2 | Col3 |
|----------------------------|-------|------|------|
|  23-02-03 12:01:27.213000  | 10    | 0    | 0    |
| 23-02-03   12:01:27.243000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.313000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.353000 | 10    | 0    | 0    |
| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.453000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.513000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.553000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.613000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.653000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.713000 | 49.8  | 0    | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.813000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.853000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.913000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:27.953000 | 49.8  | 0    | 15   |
| 23-02-03   12:01:28.013000 | 49.81 | 0.1  | 15   |
| 23-02-03   12:01:28.053000 | 49.81 | 0    | 15   |
| 23-02-03   12:01:28.113000 | 49.82 | 0.1  | 15   |
| 23-02-03   12:01:28.153000 | 49.82 | 0    | 15   |
| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.253000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.313000 | 59.8  | 0    | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.423000 | 59.8  | 0    | 25   |
| 23-02-03   12:01:28.453000 | 59.8  | 0    | 25   |

So the result would be

| 23-02-03   12:01:27.413000 | 49.8  | 39.8 | 0    |
| 23-02-03   12:01:27.753000 | 49.8  | 0    | 15   |

And

| 23-02-03   12:01:28.213000 | 59.8  | 9.98 | 15   |
| 23-02-03   12:01:28.353000 | 59.8  | 0    | 25   |

答案1

得分: 0

给定你的数据框：

                  Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25

首先提取那些满足条件 df["Col2"] > 0.02 或者 df["Col3"] 值发生变化的行，即 df["Col3"].diff() != 0：

df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]

（我添加了 .fillna(value=0) 来填充第一个元素为零。）

                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25

从这个筛选后的数据框中，我们想选择那些差异非零的行以及前一行：

diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]

这将得到所需的结果：

                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25

由于你想找到包含相同 Col1 的行之间的时间差异：

delta_t = result.groupby("Col1").diff().dropna()

如果你想在这个数据框中还包含 Col1，可以执行以下操作：

delta_t["Col1"] = result["Col1"]

这样可以工作，因为两个数据框具有相同的索引。最终的结果是：

                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10
<details>
<summary>英文:</summary>
Given your dataframe:
```none
Datetime   Col1   Col2  Col3
0  2003-02-23 12:01:27.213  10.00   0.00     0
1  2003-02-23 12:01:27.243  10.00   0.00     0
2  2003-02-23 12:01:27.313  10.00   0.00     0
3  2003-02-23 12:01:27.353  10.00   0.00     0
4  2003-02-23 12:01:27.413  49.80  39.80     0
5  2003-02-23 12:01:27.453  49.80   0.00     0
6  2003-02-23 12:01:27.513  49.80   0.00     0
7  2003-02-23 12:01:27.553  49.80   0.00     0
8  2003-02-23 12:01:27.613  49.80   0.00     0
9  2003-02-23 12:01:27.653  49.80   0.00     0
10 2003-02-23 12:01:27.713  49.80   0.00     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
12 2003-02-23 12:01:27.813  49.80   0.00    15
13 2003-02-23 12:01:27.853  49.80   0.00    15
14 2003-02-23 12:01:27.913  49.80   0.00    15
15 2003-02-23 12:01:27.953  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
17 2003-02-23 12:01:28.053  49.81   0.00    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
19 2003-02-23 12:01:28.153  49.82   0.00    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
21 2003-02-23 12:01:28.253  59.80   0.00    15
22 2003-02-23 12:01:28.313  59.80   0.00    15
23 2003-02-23 12:01:28.353  59.80   0.00    25
24 2003-02-23 12:01:28.423  59.80   0.00    25
25 2003-02-23 12:01:28.453  59.80   0.00    25

Let's first extract the rows where df["Col2"] > 0.02 or the value of df["Col3"] has changed, i.e. df["Col3"].diff() != 0:

df_filt = df[(df[&quot;Col2&quot;] &gt; 0.02) | (df[&quot;Col3&quot;].diff().fillna(value=0) != 0)]

(I added .fillna(value=0) to fill the first element with zero.)

                  Datetime   Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.80  39.80     0
11 2003-02-23 12:01:27.753  49.80   0.00    15
16 2003-02-23 12:01:28.013  49.81   0.10    15
18 2003-02-23 12:01:28.113  49.82   0.10    15
20 2003-02-23 12:01:28.213  59.80   9.98    15
23 2003-02-23 12:01:28.353  59.80   0.00    25

from this filtered dataframe, we want to select only those rows where the diff is nonzero, and the row prior:

diff_nz = df_filt[&quot;Col3&quot;].diff().fillna(value=0) != 0
result = df[diff_nz | diff_nz.shift(-1)]

which gives the desired result:

                  Datetime  Col1   Col2  Col3
4  2003-02-23 12:01:27.413  49.8  39.80     0
11 2003-02-23 12:01:27.753  49.8   0.00    15
20 2003-02-23 12:01:28.213  59.8   9.98    15
23 2003-02-23 12:01:28.353  59.8   0.00    25

And since you want to find the difference in time between the rows containing the same Col1,

delta_t = result.groupby(&quot;Col1&quot;).diff().dropna()

If you want Col1 back in this dataframe, do:

delta_t[&quot;Col1&quot;] = result[&quot;Col1&quot;]

This works because both dataframes have the same indices. The final result is:

                 Datetime   Col2  Col3  Col1
11 0 days 00:00:00.340000 -39.80  15.0  49.8
23 0 days 00:00:00.140000  -9.98  10.0  59.8

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

找到值在一行基于另一行发生变化时。

问题

答案1

移除Python中列表内元组的括号

格式化主题电子邮件

Running setup.py install for tesserocr … error

循环以筛选字典中的NaN值

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。