英文:
Create count of categorical column 1 hour ahead and 1 hour behind current time
问题
I have a dataframe with two columns: time_of_day
and categorical_column
. I want to count the number of values that have the same value in the categorical_column
that are 1 hour ahead and 1 hour behind the current time:
Example Input:
time_of_day categorical_column
25/05/2023 11:30:00 category1
25/05/2023 11:30:00 category1
25/05/2023 11:45:00 category1
25/05/2023 12:35:00 category1
25/05/2023 13:00:00 category2
25/05/2023 13:30:00 category1
25/05/2023 13:45:00 category1
25/05/2023 14:00:00 category2
25/05/2023 14:15:00 category2
25/05/2023 14:15:00 category1
Example Output:
time_of_day categorical_column window_count
25/05/2023 11:30:00 category1 3
25/05/2023 11:30:00 category1 3
25/05/2023 11:45:00 category1 4
25/05/2023 12:35:00 category1 3
25/05/2023 13:00:00 category2 2
25/05/2023 13:30:00 category1 4
25/05/2023 13:45:00 category1 3
25/05/2023 14:00:00 category2 3
25/05/2023 14:15:00 category2 2
25/05/2023 14:15:00 category1 3
例如,time_of_day=25/05/2023 11:45:00
和 categorical_column=category1
的值为 4
,因为在范围 25/05/2023 10:45:00 - 25/05/2023 12:45:00
内包含了 4 个值,这些值都具有 category1
。
英文:
I have a dataframe with two columns: time_of_day
and categorical column
. I want to count the number of values that have the same value in the categorical_column
that are 1 hour ahead and 1 hour behind the current time:
Example Input:
time_of_day categorical_column
25/05/2023 11:30:00 category1
25/05/2023 11:30:00 category1
25/05/2023 11:45:00 category1
25/05/2023 12:35:00 category1
25/05/2023 13:00:00 category2
25/05/2023 13:30:00 category1
25/05/2023 13:45:00 category1
25/05/2023 14:00:00 category2
25/05/2023 14:15:00 category2
25/05/2023 14:15:00 category1
Example Output:
time_of_day categorical_column window_count
25/05/2023 11:30:00 category1 3
25/05/2023 11:30:00 category1 3
25/05/2023 11:45:00 category1 4
25/05/2023 12:35:00 category1 3
25/05/2023 13:00:00 category2 2
25/05/2023 13:30:00 category1 4
25/05/2023 13:45:00 category1 3
25/05/2023 14:00:00 category2 3
25/05/2023 14:15:00 category2 2
25/05/2023 14:15:00 category1 3
e.g. time_of_day=25/05/2023 11:45:00
and categorical_column=category1
has a value of 4
because there are 4 values containing category1
in range 25/05/2023 10:45:00 - 25/05/2023 12:45:00
答案1
得分: 1
我会在groupby.transform
中使用[tag:numpy]广播来按组执行操作:
df['time_of_day'] = pd.to_datetime(df['time_of_day'])
def f(s):
a = s.to_numpy()
return (abs(a[:,None]-a)<=delta).sum(axis=1)
df['window_count'] = df.groupby('categorical_column')['time_of_day'].transform(f)
输出:
time_of_day categorical_column window_count
0 2023-05-25 11:30:00 category1 3
1 2023-05-25 11:30:00 category1 3
2 2023-05-25 11:45:00 category1 4
3 2023-05-25 12:35:00 category1 3
4 2023-05-25 13:00:00 category2 2
5 2023-05-25 13:30:00 category1 4
6 2023-05-25 13:45:00 category1 3
7 2023-05-25 14:00:00 category2 3
8 2023-05-25 14:15:00 category2 2
9 2023-05-25 14:15:00 category1 3
如果还有其他翻译需求,请提出。
英文:
I would use [tag:numpy] broadcasting per group in groupby.transform
:
df['time_of_day'] = pd.to_datetime(df['time_of_day'])
def f(s):
a = s.to_numpy()
return (abs(a[:,None]-a)<=delta).sum(axis=1)
df['window_count'] = df.groupby('categorical_column')['time_of_day'].transform(f)
Output:
time_of_day categorical_column window_count
0 2023-05-25 11:30:00 category1 3
1 2023-05-25 11:30:00 category1 3
2 2023-05-25 11:45:00 category1 4
3 2023-05-25 12:35:00 category1 3
4 2023-05-25 13:00:00 category2 2
5 2023-05-25 13:30:00 category1 4
6 2023-05-25 13:45:00 category1 3
7 2023-05-25 14:00:00 category2 3
8 2023-05-25 14:15:00 category2 2
9 2023-05-25 14:15:00 category1 3
答案2
得分: 0
以下是已翻译的内容:
这是一种使用rolling()
和center=True
的方法:
r = df.assign(window_count=1).groupby('categorical_column').rolling('2H1S', on='time_of_day', center=True)['window_count'].sum()
df.join(r.loc[~r.index.duplicated()], on=['categorical_column', 'time_of_day'])
输出:
time_of_day categorical_column window_count
0 2023-05-25 11:30:00 category1 3.0
1 2023-05-25 11:30:00 category1 3.0
2 2023-05-25 11:45:00 category1 4.0
3 2023-05-25 12:35:00 category1 3.0
4 2023-05-25 13:00:00 category2 2.0
5 2023-05-25 13:30:00 category1 4.0
6 2023-05-25 13:45:00 category1 3.0
7 2023-05-25 14:00:00 category2 3.0
8 2023-05-25 14:15:00 category2 2.0
9 2023-05-25 14:15:00 category1 3.0
请注意,这是代码的翻译部分,不包括问题。
英文:
Here is a way with rolling()
with center = True
r = df.assign(window_count = 1).groupby('categorical_column').rolling('2H1S',on ='time_of_day',center=True)['window_count'].sum()
df.join(r.loc[~r.index.duplicated()],on = ['categorical_column','time_of_day'])
Output:
time_of_day categorical_column window_count
0 2023-05-25 11:30:00 category1 3.0
1 2023-05-25 11:30:00 category1 3.0
2 2023-05-25 11:45:00 category1 4.0
3 2023-05-25 12:35:00 category1 3.0
4 2023-05-25 13:00:00 category2 2.0
5 2023-05-25 13:30:00 category1 4.0
6 2023-05-25 13:45:00 category1 3.0
7 2023-05-25 14:00:00 category2 3.0
8 2023-05-25 14:15:00 category2 2.0
9 2023-05-25 14:15:00 category1 3.0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论