找到值在一行基于另一行发生变化时。

huangapple go评论91阅读模式
英文:

Find when value changes one row based on another

问题

我有一个数据框,我想找到其中一列的下一行,该列的值发生变化,基于另一列的检查

  1. df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]

上面的代码产生了结果,这是可以的,但是Col3中的值有时会发生变化

那么,如何使用上面代码的结果来查找Col3何时发生变化?

以下是我的数据框的摘录

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |

但是我希望能够显示

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  2. | 23-02-03 12:01:27.753000 | 49.8 | 39.8 | 15 |

因为目标是找到两者之间的时间差异

根据第一个答案进行编辑

抱歉,我解释不正确,并且我的示例有点不对。Col2是Col1每行的变化量。请看下面,根据我的Python代码,当Col2不大于0.02时,请忽略变化。

  1. | Datetime | Col1 | Col2 | Col3 |
  2. |----------------------------|-------|------|------|
  3. | 23-02-03 12:01:27.213000 | 10 | 0 | 0 |
  4. | 23-02-03 12:01:27.243000 | 10 | 0 | 0 |
  5. | 23-02-03 12:01:27.313000 | 10 | 0 | 0 |
  6. | 23-02-03 12:01:27.353000 | 10 | 0 | 0 |
  7. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  8. | 23-02-03 12:01:27.453000 | 49.8 | 0 | 0 |
  9. | 23-02-03 12:01:27.513000 | 49.8 | 0 | 0 |
  10. | 23-02-03 12:01:27.553000 | 49.8 | 0 | 0 |
  11. | 23-02-03 12:01:27.613000 | 49.8 | 0 | 0 |
  12. | 23-02-03 12:01:27.653000 | 49.8 | 0 | 0 |
  13. | 23-02-03 12:01:27.713000 | 49.8 | 0 | 0 |
  14. | 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |
  15. | 23-02-03 12:01:27.813000 | 49.8 | 0 | 15 |
  16. | 23-02-03 12:01:27.853000 | 49.8 | 0 | 15 |
  17. | 23-02-03 12:01:27.913000 | 49.8 | 0 | 15 |
  18. | 23-02-03 12:01:27.953000 | 49.8 | 0 | 15 |
  19. | 23-02-03 12:01:28.013000 | 49.81 | 0.1 | 15 |
  20. | 23-02-03 12:01:28.053000 | 49.81 | 0 | 15 |
  21. | 23-02-03 12:01:28.113000 | 49.82 | 0.1 | 15 |
  22. | 23-02-03 12:01:28.153000 | 49.82 | 0 | 15 |
  23. | 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
  24. | 23-02-03 12:01:28.253000 | 59.8 | 0 | 15 |
  25. | 23-02-03 12:01:28.313000 | 59.8 | 0 | 15 |
  26. | 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |
  27. | 23-02-03 12:01:28.423000 | 59.8 | 0 | 25 |
  28. | 23-02-03 12:01:28.453000 | 59.8 | 0 | 25 |

因此,结果将是

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  2. | 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |

  1. | 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
  2. | 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |
英文:

I have a dataframe where I want to find the next row in one column where the value changes, based on a check on another column

  1. df[(df['Col1'] == 49.8) & (df['Col2'] != 0) & (df['Col2'].abs() > 0.02)]

The code above produces results, which is ok, but the value in Col3 change sometime later

So , how do I use the result of the code above to search when the Col3 changes?

Below is an excerpt of my dataframe

My python code returns the following

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |

But I want to be able to show

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  2. | 23-02-03 12:01:27.753000 | 49.8 | 39.8 | 15 |

As the goal is to find the time difference between the two

Edit based on first answer

Sorry, I explained it incorreclty, and had my example a little wrong. col2 is the amount that col1 changes per row. See below, when col2 is not bigger than 0.02 as per my python code then ignore the change.

  1. | Datetime | Col1 | Col2 | Col3 |
  2. |----------------------------|-------|------|------|
  3. | 23-02-03 12:01:27.213000 | 10 | 0 | 0 |
  4. | 23-02-03 12:01:27.243000 | 10 | 0 | 0 |
  5. | 23-02-03 12:01:27.313000 | 10 | 0 | 0 |
  6. | 23-02-03 12:01:27.353000 | 10 | 0 | 0 |
  7. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  8. | 23-02-03 12:01:27.453000 | 49.8 | 0 | 0 |
  9. | 23-02-03 12:01:27.513000 | 49.8 | 0 | 0 |
  10. | 23-02-03 12:01:27.553000 | 49.8 | 0 | 0 |
  11. | 23-02-03 12:01:27.613000 | 49.8 | 0 | 0 |
  12. | 23-02-03 12:01:27.653000 | 49.8 | 0 | 0 |
  13. | 23-02-03 12:01:27.713000 | 49.8 | 0 | 0 |
  14. | 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |
  15. | 23-02-03 12:01:27.813000 | 49.8 | 0 | 15 |
  16. | 23-02-03 12:01:27.853000 | 49.8 | 0 | 15 |
  17. | 23-02-03 12:01:27.913000 | 49.8 | 0 | 15 |
  18. | 23-02-03 12:01:27.953000 | 49.8 | 0 | 15 |
  19. | 23-02-03 12:01:28.013000 | 49.81 | 0.1 | 15 |
  20. | 23-02-03 12:01:28.053000 | 49.81 | 0 | 15 |
  21. | 23-02-03 12:01:28.113000 | 49.82 | 0.1 | 15 |
  22. | 23-02-03 12:01:28.153000 | 49.82 | 0 | 15 |
  23. | 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
  24. | 23-02-03 12:01:28.253000 | 59.8 | 0 | 15 |
  25. | 23-02-03 12:01:28.313000 | 59.8 | 0 | 15 |
  26. | 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |
  27. | 23-02-03 12:01:28.423000 | 59.8 | 0 | 25 |
  28. | 23-02-03 12:01:28.453000 | 59.8 | 0 | 25 |

So the result would be

  1. | 23-02-03 12:01:27.413000 | 49.8 | 39.8 | 0 |
  2. | 23-02-03 12:01:27.753000 | 49.8 | 0 | 15 |

And

  1. | 23-02-03 12:01:28.213000 | 59.8 | 9.98 | 15 |
  2. | 23-02-03 12:01:28.353000 | 59.8 | 0 | 25 |

答案1

得分: 0

给定你的数据框:

  1. Datetime Col1 Col2 Col3
  2. 0 2003-02-23 12:01:27.213 10.00 0.00 0
  3. 1 2003-02-23 12:01:27.243 10.00 0.00 0
  4. 2 2003-02-23 12:01:27.313 10.00 0.00 0
  5. 3 2003-02-23 12:01:27.353 10.00 0.00 0
  6. 4 2003-02-23 12:01:27.413 49.80 39.80 0
  7. 5 2003-02-23 12:01:27.453 49.80 0.00 0
  8. 6 2003-02-23 12:01:27.513 49.80 0.00 0
  9. 7 2003-02-23 12:01:27.553 49.80 0.00 0
  10. 8 2003-02-23 12:01:27.613 49.80 0.00 0
  11. 9 2003-02-23 12:01:27.653 49.80 0.00 0
  12. 10 2003-02-23 12:01:27.713 49.80 0.00 0
  13. 11 2003-02-23 12:01:27.753 49.80 0.00 15
  14. 12 2003-02-23 12:01:27.813 49.80 0.00 15
  15. 13 2003-02-23 12:01:27.853 49.80 0.00 15
  16. 14 2003-02-23 12:01:27.913 49.80 0.00 15
  17. 15 2003-02-23 12:01:27.953 49.80 0.00 15
  18. 16 2003-02-23 12:01:28.013 49.81 0.10 15
  19. 17 2003-02-23 12:01:28.053 49.81 0.00 15
  20. 18 2003-02-23 12:01:28.113 49.82 0.10 15
  21. 19 2003-02-23 12:01:28.153 49.82 0.00 15
  22. 20 2003-02-23 12:01:28.213 59.80 9.98 15
  23. 21 2003-02-23 12:01:28.253 59.80 0.00 15
  24. 22 2003-02-23 12:01:28.313 59.80 0.00 15
  25. 23 2003-02-23 12:01:28.353 59.80 0.00 25
  26. 24 2003-02-23 12:01:28.423 59.80 0.00 25
  27. 25 2003-02-23 12:01:28.453 59.80 0.00 25

首先提取那些满足条件 df["Col2"] > 0.02 或者 df["Col3"] 值发生变化的行,即 df["Col3"].diff() != 0

  1. df_filt = df[(df["Col2"] > 0.02) | (df["Col3"].diff().fillna(value=0) != 0)]

(我添加了 .fillna(value=0) 来填充第一个元素为零。)

  1. Datetime Col1 Col2 Col3
  2. 4 2003-02-23 12:01:27.413 49.80 39.80 0
  3. 11 2003-02-23 12:01:27.753 49.80 0.00 15
  4. 16 2003-02-23 12:01:28.013 49.81 0.10 15
  5. 18 2003-02-23 12:01:28.113 49.82 0.10 15
  6. 20 2003-02-23 12:01:28.213 59.80 9.98 15
  7. 23 2003-02-23 12:01:28.353 59.80 0.00 25

从这个筛选后的数据框中,我们想选择那些差异非零的行以及前一行:

  1. diff_nz = df_filt["Col3"].diff().fillna(value=0) != 0
  2. result = df[diff_nz | diff_nz.shift(-1)]

这将得到所需的结果:

  1. Datetime Col1 Col2 Col3
  2. 4 2003-02-23 12:01:27.413 49.8 39.80 0
  3. 11 2003-02-23 12:01:27.753 49.8 0.00 15
  4. 20 2003-02-23 12:01:28.213 59.8 9.98 15
  5. 23 2003-02-23 12:01:28.353 59.8 0.00 25

由于你想找到包含相同 Col1 的行之间的时间差异:

  1. delta_t = result.groupby("Col1").diff().dropna()

如果你想在这个数据框中还包含 Col1,可以执行以下操作:

  1. delta_t["Col1"] = result["Col1"]

这样可以工作,因为两个数据框具有相同的索引。最终的结果是:

  1. Datetime Col2 Col3 Col1
  2. 11 0 days 00:00:00.340000 -39.80 15.0 49.8
  3. 23 0 days 00:00:00.140000 -9.98 10
  4. <details>
  5. <summary>英文:</summary>
  6. Given your dataframe:
  7. ```none
  8. Datetime Col1 Col2 Col3
  9. 0 2003-02-23 12:01:27.213 10.00 0.00 0
  10. 1 2003-02-23 12:01:27.243 10.00 0.00 0
  11. 2 2003-02-23 12:01:27.313 10.00 0.00 0
  12. 3 2003-02-23 12:01:27.353 10.00 0.00 0
  13. 4 2003-02-23 12:01:27.413 49.80 39.80 0
  14. 5 2003-02-23 12:01:27.453 49.80 0.00 0
  15. 6 2003-02-23 12:01:27.513 49.80 0.00 0
  16. 7 2003-02-23 12:01:27.553 49.80 0.00 0
  17. 8 2003-02-23 12:01:27.613 49.80 0.00 0
  18. 9 2003-02-23 12:01:27.653 49.80 0.00 0
  19. 10 2003-02-23 12:01:27.713 49.80 0.00 0
  20. 11 2003-02-23 12:01:27.753 49.80 0.00 15
  21. 12 2003-02-23 12:01:27.813 49.80 0.00 15
  22. 13 2003-02-23 12:01:27.853 49.80 0.00 15
  23. 14 2003-02-23 12:01:27.913 49.80 0.00 15
  24. 15 2003-02-23 12:01:27.953 49.80 0.00 15
  25. 16 2003-02-23 12:01:28.013 49.81 0.10 15
  26. 17 2003-02-23 12:01:28.053 49.81 0.00 15
  27. 18 2003-02-23 12:01:28.113 49.82 0.10 15
  28. 19 2003-02-23 12:01:28.153 49.82 0.00 15
  29. 20 2003-02-23 12:01:28.213 59.80 9.98 15
  30. 21 2003-02-23 12:01:28.253 59.80 0.00 15
  31. 22 2003-02-23 12:01:28.313 59.80 0.00 15
  32. 23 2003-02-23 12:01:28.353 59.80 0.00 25
  33. 24 2003-02-23 12:01:28.423 59.80 0.00 25
  34. 25 2003-02-23 12:01:28.453 59.80 0.00 25

Let's first extract the rows where df[&quot;Col2&quot;] &gt; 0.02 or the value of df[&quot;Col3&quot;] has changed, i.e. df[&quot;Col3&quot;].diff() != 0:

  1. df_filt = df[(df[&quot;Col2&quot;] &gt; 0.02) | (df[&quot;Col3&quot;].diff().fillna(value=0) != 0)]

(I added .fillna(value=0) to fill the first element with zero.)

  1. Datetime Col1 Col2 Col3
  2. 4 2003-02-23 12:01:27.413 49.80 39.80 0
  3. 11 2003-02-23 12:01:27.753 49.80 0.00 15
  4. 16 2003-02-23 12:01:28.013 49.81 0.10 15
  5. 18 2003-02-23 12:01:28.113 49.82 0.10 15
  6. 20 2003-02-23 12:01:28.213 59.80 9.98 15
  7. 23 2003-02-23 12:01:28.353 59.80 0.00 25

from this filtered dataframe, we want to select only those rows where the diff is nonzero, and the row prior:

  1. diff_nz = df_filt[&quot;Col3&quot;].diff().fillna(value=0) != 0
  2. result = df[diff_nz | diff_nz.shift(-1)]

which gives the desired result:

  1. Datetime Col1 Col2 Col3
  2. 4 2003-02-23 12:01:27.413 49.8 39.80 0
  3. 11 2003-02-23 12:01:27.753 49.8 0.00 15
  4. 20 2003-02-23 12:01:28.213 59.8 9.98 15
  5. 23 2003-02-23 12:01:28.353 59.8 0.00 25

And since you want to find the difference in time between the rows containing the same Col1,

  1. delta_t = result.groupby(&quot;Col1&quot;).diff().dropna()

If you want Col1 back in this dataframe, do:

  1. delta_t[&quot;Col1&quot;] = result[&quot;Col1&quot;]

This works because both dataframes have the same indices. The final result is:

  1. Datetime Col2 Col3 Col1
  2. 11 0 days 00:00:00.340000 -39.80 15.0 49.8
  3. 23 0 days 00:00:00.140000 -9.98 10.0 59.8

huangapple
  • 本文由 发表于 2023年3月3日 22:54:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/75628596.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定