创建一个分类列,该列包括当前时间前1小时和后1小时的计数。

huangapple go评论72阅读模式
英文:

Create count of categorical column 1 hour ahead and 1 hour behind current time

问题

I have a dataframe with two columns: time_of_day and categorical_column. I want to count the number of values that have the same value in the categorical_column that are 1 hour ahead and 1 hour behind the current time:

Example Input:

time_of_day         categorical_column
25/05/2023 11:30:00 category1
25/05/2023 11:30:00 category1
25/05/2023 11:45:00 category1
25/05/2023 12:35:00 category1
25/05/2023 13:00:00 category2
25/05/2023 13:30:00 category1
25/05/2023 13:45:00 category1
25/05/2023 14:00:00 category2
25/05/2023 14:15:00 category2
25/05/2023 14:15:00 category1

Example Output:

time_of_day         categorical_column window_count
25/05/2023 11:30:00 category1         3
25/05/2023 11:30:00 category1         3
25/05/2023 11:45:00 category1         4
25/05/2023 12:35:00 category1         3
25/05/2023 13:00:00 category2         2
25/05/2023 13:30:00 category1         4
25/05/2023 13:45:00 category1         3
25/05/2023 14:00:00 category2         3
25/05/2023 14:15:00 category2         2
25/05/2023 14:15:00 category1         3

例如,time_of_day=25/05/2023 11:45:00categorical_column=category1 的值为 4,因为在范围 25/05/2023 10:45:00 - 25/05/2023 12:45:00 内包含了 4 个值,这些值都具有 category1

英文:

I have a dataframe with two columns: time_of_day and categorical column. I want to count the number of values that have the same value in the categorical_column that are 1 hour ahead and 1 hour behind the current time:

Example Input:

time_of_day	        categorical_column
25/05/2023 11:30:00	category1
25/05/2023 11:30:00	category1
25/05/2023 11:45:00	category1
25/05/2023 12:35:00	category1
25/05/2023 13:00:00	category2
25/05/2023 13:30:00	category1
25/05/2023 13:45:00	category1
25/05/2023 14:00:00	category2
25/05/2023 14:15:00	category2
25/05/2023 14:15:00	category1

Example Output:

time_of_day	        categorical_column	window_count
25/05/2023 11:30:00	category1	        3
25/05/2023 11:30:00	category1	        3
25/05/2023 11:45:00	category1	        4
25/05/2023 12:35:00	category1	        3
25/05/2023 13:00:00	category2	        2
25/05/2023 13:30:00	category1	        4
25/05/2023 13:45:00	category1	        3
25/05/2023 14:00:00	category2	        3
25/05/2023 14:15:00	category2	        2
25/05/2023 14:15:00	category1	        3

e.g. time_of_day=25/05/2023 11:45:00 and categorical_column=category1 has a value of 4 because there are 4 values containing category1 in range 25/05/2023 10:45:00 - 25/05/2023 12:45:00

答案1

得分: 1

我会在groupby.transform中使用[tag:numpy]广播来按组执行操作:

df['time_of_day'] = pd.to_datetime(df['time_of_day'])

def f(s):
    a = s.to_numpy()
    return (abs(a[:,None]-a)<=delta).sum(axis=1)

df['window_count'] = df.groupby('categorical_column')['time_of_day'].transform(f)

输出:

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1             3
1 2023-05-25 11:30:00          category1             3
2 2023-05-25 11:45:00          category1             4
3 2023-05-25 12:35:00          category1             3
4 2023-05-25 13:00:00          category2             2
5 2023-05-25 13:30:00          category1             4
6 2023-05-25 13:45:00          category1             3
7 2023-05-25 14:00:00          category2             3
8 2023-05-25 14:15:00          category2             2
9 2023-05-25 14:15:00          category1             3

如果还有其他翻译需求,请提出。

英文:

I would use [tag:numpy] broadcasting per group in groupby.transform:

df[&#39;time_of_day&#39;] = pd.to_datetime(df[&#39;time_of_day&#39;])

def f(s):
    a = s.to_numpy()
    return (abs(a[:,None]-a)&lt;=delta).sum(axis=1)

df[&#39;window_count&#39;] = df.groupby(&#39;categorical_column&#39;)[&#39;time_of_day&#39;].transform(f)

Output:

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1             3
1 2023-05-25 11:30:00          category1             3
2 2023-05-25 11:45:00          category1             4
3 2023-05-25 12:35:00          category1             3
4 2023-05-25 13:00:00          category2             2
5 2023-05-25 13:30:00          category1             4
6 2023-05-25 13:45:00          category1             3
7 2023-05-25 14:00:00          category2             3
8 2023-05-25 14:15:00          category2             2
9 2023-05-25 14:15:00          category1             3

答案2

得分: 0

以下是已翻译的内容:

这是一种使用rolling()center=True的方法:

r = df.assign(window_count=1).groupby('categorical_column').rolling('2H1S', on='time_of_day', center=True)['window_count'].sum()

df.join(r.loc[~r.index.duplicated()], on=['categorical_column', 'time_of_day'])

输出:

              time_of_day categorical_column  window_count
    0 2023-05-25 11:30:00          category1           3.0
    1 2023-05-25 11:30:00          category1           3.0
    2 2023-05-25 11:45:00          category1           4.0
    3 2023-05-25 12:35:00          category1           3.0
    4 2023-05-25 13:00:00          category2           2.0
    5 2023-05-25 13:30:00          category1           4.0
    6 2023-05-25 13:45:00          category1           3.0
    7 2023-05-25 14:00:00          category2           3.0
    8 2023-05-25 14:15:00          category2           2.0
    9 2023-05-25 14:15:00          category1           3.0

请注意,这是代码的翻译部分,不包括问题。

英文:

Here is a way with rolling() with center = True

r = df.assign(window_count = 1).groupby(&#39;categorical_column&#39;).rolling(&#39;2H1S&#39;,on =&#39;time_of_day&#39;,center=True)[&#39;window_count&#39;].sum()

df.join(r.loc[~r.index.duplicated()],on = [&#39;categorical_column&#39;,&#39;time_of_day&#39;])

Output:

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1           3.0
1 2023-05-25 11:30:00          category1           3.0
2 2023-05-25 11:45:00          category1           4.0
3 2023-05-25 12:35:00          category1           3.0
4 2023-05-25 13:00:00          category2           2.0
5 2023-05-25 13:30:00          category1           4.0
6 2023-05-25 13:45:00          category1           3.0
7 2023-05-25 14:00:00          category2           3.0
8 2023-05-25 14:15:00          category2           2.0
9 2023-05-25 14:15:00          category1           3.0

huangapple
  • 本文由 发表于 2023年5月25日 18:46:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/76331447.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定