2023年5月25日 18:46:22go评论72阅读模式

英文:

Create count of categorical column 1 hour ahead and 1 hour behind current time

问题

I have a dataframe with two columns: time_of_day and categorical_column. I want to count the number of values that have the same value in the categorical_column that are 1 hour ahead and 1 hour behind the current time:

Example Input:

time_of_day         categorical_column
25/05/2023 11:30:00 category1
25/05/2023 11:30:00 category1
25/05/2023 11:45:00 category1
25/05/2023 12:35:00 category1
25/05/2023 13:00:00 category2
25/05/2023 13:30:00 category1
25/05/2023 13:45:00 category1
25/05/2023 14:00:00 category2
25/05/2023 14:15:00 category2
25/05/2023 14:15:00 category1

Example Output:

time_of_day         categorical_column window_count
25/05/2023 11:30:00 category1         3
25/05/2023 11:30:00 category1         3
25/05/2023 11:45:00 category1         4
25/05/2023 12:35:00 category1         3
25/05/2023 13:00:00 category2         2
25/05/2023 13:30:00 category1         4
25/05/2023 13:45:00 category1         3
25/05/2023 14:00:00 category2         3
25/05/2023 14:15:00 category2         2
25/05/2023 14:15:00 category1         3

例如，time_of_day=25/05/2023 11:45:00 和 categorical_column=category1 的值为 4，因为在范围 25/05/2023 10:45:00 - 25/05/2023 12:45:00 内包含了 4 个值，这些值都具有 category1。

英文:

I have a dataframe with two columns: time_of_day and categorical column. I want to count the number of values that have the same value in the categorical_column that are 1 hour ahead and 1 hour behind the current time:

Example Input:

time_of_day	        categorical_column
25/05/2023 11:30:00	category1
25/05/2023 11:30:00	category1
25/05/2023 11:45:00	category1
25/05/2023 12:35:00	category1
25/05/2023 13:00:00	category2
25/05/2023 13:30:00	category1
25/05/2023 13:45:00	category1
25/05/2023 14:00:00	category2
25/05/2023 14:15:00	category2
25/05/2023 14:15:00	category1

Example Output:

time_of_day	        categorical_column	window_count
25/05/2023 11:30:00	category1	        3
25/05/2023 11:30:00	category1	        3
25/05/2023 11:45:00	category1	        4
25/05/2023 12:35:00	category1	        3
25/05/2023 13:00:00	category2	        2
25/05/2023 13:30:00	category1	        4
25/05/2023 13:45:00	category1	        3
25/05/2023 14:00:00	category2	        3
25/05/2023 14:15:00	category2	        2
25/05/2023 14:15:00	category1	        3

e.g. time_of_day=25/05/2023 11:45:00 and categorical_column=category1 has a value of 4 because there are 4 values containing category1 in range 25/05/2023 10:45:00 - 25/05/2023 12:45:00

答案1

得分: 1

我会在groupby.transform中使用[tag:numpy]广播来按组执行操作：

df['time_of_day'] = pd.to_datetime(df['time_of_day'])

def f(s):
    a = s.to_numpy()
    return (abs(a[:,None]-a)<=delta).sum(axis=1)

df['window_count'] = df.groupby('categorical_column')['time_of_day'].transform(f)

输出：

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1             3
1 2023-05-25 11:30:00          category1             3
2 2023-05-25 11:45:00          category1             4
3 2023-05-25 12:35:00          category1             3
4 2023-05-25 13:00:00          category2             2
5 2023-05-25 13:30:00          category1             4
6 2023-05-25 13:45:00          category1             3
7 2023-05-25 14:00:00          category2             3
8 2023-05-25 14:15:00          category2             2
9 2023-05-25 14:15:00          category1             3

如果还有其他翻译需求，请提出。

英文:

I would use [tag:numpy] broadcasting per group in groupby.transform:

df[&#39;time_of_day&#39;] = pd.to_datetime(df[&#39;time_of_day&#39;])

def f(s):
    a = s.to_numpy()
    return (abs(a[:,None]-a)&lt;=delta).sum(axis=1)

df[&#39;window_count&#39;] = df.groupby(&#39;categorical_column&#39;)[&#39;time_of_day&#39;].transform(f)

Output:

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1             3
1 2023-05-25 11:30:00          category1             3
2 2023-05-25 11:45:00          category1             4
3 2023-05-25 12:35:00          category1             3
4 2023-05-25 13:00:00          category2             2
5 2023-05-25 13:30:00          category1             4
6 2023-05-25 13:45:00          category1             3
7 2023-05-25 14:00:00          category2             3
8 2023-05-25 14:15:00          category2             2
9 2023-05-25 14:15:00          category1             3

答案2

得分: 0

以下是已翻译的内容：

这是一种使用rolling()和center=True的方法：

r = df.assign(window_count=1).groupby('categorical_column').rolling('2H1S', on='time_of_day', center=True)['window_count'].sum()

df.join(r.loc[~r.index.duplicated()], on=['categorical_column', 'time_of_day'])

输出：

              time_of_day categorical_column  window_count
    0 2023-05-25 11:30:00          category1           3.0
    1 2023-05-25 11:30:00          category1           3.0
    2 2023-05-25 11:45:00          category1           4.0
    3 2023-05-25 12:35:00          category1           3.0
    4 2023-05-25 13:00:00          category2           2.0
    5 2023-05-25 13:30:00          category1           4.0
    6 2023-05-25 13:45:00          category1           3.0
    7 2023-05-25 14:00:00          category2           3.0
    8 2023-05-25 14:15:00          category2           2.0
    9 2023-05-25 14:15:00          category1           3.0

请注意，这是代码的翻译部分，不包括问题。

英文:

Here is a way with rolling() with center = True

r = df.assign(window_count = 1).groupby(&#39;categorical_column&#39;).rolling(&#39;2H1S&#39;,on =&#39;time_of_day&#39;,center=True)[&#39;window_count&#39;].sum()

df.join(r.loc[~r.index.duplicated()],on = [&#39;categorical_column&#39;,&#39;time_of_day&#39;])

Output:

          time_of_day categorical_column  window_count
0 2023-05-25 11:30:00          category1           3.0
1 2023-05-25 11:30:00          category1           3.0
2 2023-05-25 11:45:00          category1           4.0
3 2023-05-25 12:35:00          category1           3.0
4 2023-05-25 13:00:00          category2           2.0
5 2023-05-25 13:30:00          category1           4.0
6 2023-05-25 13:45:00          category1           3.0
7 2023-05-25 14:00:00          category2           3.0
8 2023-05-25 14:15:00          category2           2.0
9 2023-05-25 14:15:00          category1           3.0

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

创建一个分类列，该列包括当前时间前1小时和后1小时的计数。

问题

答案1

答案2

Polars arr.to_struct() throws "pyo3_runtime.PanicException: not implemented for dtype Unknown" exception

如何使用Python的Colorama？

使用Scipy.ndimage迭代遍历高程数组。

Handling drop down with selenium , X path method not working?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论