Create new column based on the values in a rolling window

Question


I have a DataFrame with a DateTime index and a column containing integers (in this example it only contains 0 and 1):

import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2010-01-01 12:00", end="2010-01-01 12:05", freq="min"),
    "values": [1, 0, 0, 0, 1, 0]
})

                 date  values
0 2010-01-01 12:00:00       1
1 2010-01-01 12:01:00       0
2 2010-01-01 12:02:00       0
3 2010-01-01 12:03:00       0
4 2010-01-01 12:04:00       1
5 2010-01-01 12:05:00       0

I would like to return True if there is a 1 in a rolling time window of 2 minutes, otherwise False, as shown below:


                 date  values
0 2010-01-01 12:00:00    True   - because the window [1, 0] contains 1
1 2010-01-01 12:01:00   False   - because the window [0, 0] does not contain 1
2 2010-01-01 12:02:00   False
3 2010-01-01 12:03:00    True
4 2010-01-01 12:04:00    True

I tried using .groupby(), but I did not get very far.
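To pin down the intended semantics, the expected output can be sketched as a brute-force check: for each timestamp t, look at the rows whose date falls in [t, t + 2 minutes). This forward-looking reading of the window is an assumption inferred from the [1, 0] and [0, 0] examples above, and the helper name has_one_ahead is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2010-01-01 12:00", end="2010-01-01 12:05", freq="min"),
    "values": [1, 0, 0, 0, 1, 0],
})


def has_one_ahead(frame, width="2min"):
    """Hypothetical reference helper: True where any value in [t, t + width) equals 1."""
    width = pd.Timedelta(width)
    flags = []
    for t in frame["date"]:
        # Rows from t (inclusive) up to t + width (exclusive).
        in_window = (frame["date"] >= t) & (frame["date"] < t + width)
        flags.append(bool((frame.loc[in_window, "values"] == 1).any()))
    return pd.Series(flags, index=frame.index, name="values")


expected = has_one_ahead(df)
print(expected.tolist())
```

The loop is quadratic in the number of rows, so it is only meant as a reference to compare faster solutions against. Under this reading the last row (12:05) has only itself in its window.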

Answer 1

Score: 1

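You can use rolling with a datetime index. Time-based windows look backward from each row, so reverse the frame first to make the window look forward, then reverse the result back: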

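df['date'] = pd.to_datetime(df['date'])

out = (
    df.set_index('date')[::-1]   # reverse so the window looks forward in time
      .rolling('2min').max()     # 1 if any value in the window is 1
      .astype(bool)[::-1]        # to booleans, then back to the original order
      .reset_index()
)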

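Or, without setting the index:

out = (
    df[::-1]                             # reverse so the window looks forward
      .rolling('2min', on='date').max()  # roll on the 'date' column directly
      .astype({'values': bool})[::-1]    # cast only 'values', restore order
)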

Output:

                 date  values
0 2010-01-01 12:00:00    True
1 2010-01-01 12:01:00   False
2 2010-01-01 12:02:00   False
3 2010-01-01 12:03:00    True
4 2010-01-01 12:04:00    True
5 2010-01-01 12:05:00   False
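
The reversal is what turns the backward-looking window into a forward-looking one. The sketch below compares the default behaviour with the reversed call on the same data. It also shows pd.api.indexers.FixedForwardWindowIndexer as a possible alternative; that part is not from the answer above and rests on the assumption that rows are spaced exactly one minute apart, so that a 2-minute window equals 2 rows.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2010-01-01 12:00", end="2010-01-01 12:05", freq="min"),
    "values": [1, 0, 0, 0, 1, 0],
})
s = df.set_index("date")["values"]

# Default: the 2-minute window ends at each row and looks backward in time.
backward = s.rolling("2min").max().astype(bool)

# Reversed: the same call on the reversed series looks forward in time.
forward = s[::-1].rolling("2min").max().astype(bool)[::-1]

# Assumption: rows are exactly one minute apart, so 2 minutes == 2 rows.
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
forward_fixed = df["values"].rolling(indexer, min_periods=1).max().astype(bool)

print(backward.tolist())
print(forward.tolist())
print(forward_fixed.tolist())
```

On evenly spaced data the last two results agree. With irregular timestamps only the reversed time-based window respects the 2-minute span, because the fixed indexer counts rows rather than time.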

huangapple
  • Posted on March 10, 2023, 01:51:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75688326.html