groupby mean of datetime64[ns] column
Question
I have a DataFrame:
- user_id object
- local time datetime64[ns]
- value int32
  user_id          local time  value
0   user1 2023-01-01 00:00:00      3
1   user1 2023-01-01 00:00:00      3
2   user1 2023-01-01 01:00:00      7
3   user1 2023-01-01 01:00:00      2
4   user2 2023-01-01 02:00:00      4
5   user2 2023-01-01 02:00:00     10
6   user2 2023-01-01 03:00:00      7
7   user2 2023-01-01 03:00:00      2
I want to:
- group by user_id
- take the mean of the columns "local time" (time of day only, HH:MM:SS, not the full datetime) and "value"
import pandas as pd
import numpy as np
# Set the random seed for reproducibility
np.random.seed(123)
# Define the number of users and values
num_users = 2
num_values = 4
# Generate the user IDs
user_ids = ['user{}'.format(i+1) for i in range(num_users)]
# Generate the local time values
local_time = pd.date_range(start='2023-01-01 00:00:00', periods=num_values, freq='H')
# Generate the random values
values = np.random.randint(1, 11, size=(num_values*num_users))
# Create the DataFrame
df = pd.DataFrame({
    'user_id': np.repeat(user_ids, num_values),
    'local time': np.repeat(local_time, num_users),
    'value': values})
# Goal: the mean of the time of day in "local time", not of the full datetime.
print(df)
print("expected_output")
'''
local time value
user1 00:30:00 3.75
user2 02:30:00 5.75
'''
df.groupby('user_id').mean()
Expected output:
I want the mean of the time (hours, minutes and seconds, not the date) and the mean of value, grouped by user:
       local time  value
user1    00:30:00   3.75
user2    02:30:00   5.75
Answer 1
Score: 4
mean computes only numeric columns by default in older pandas versions (the default changed in pandas 2.0). Pass numeric_only=False to also compute the mean of the local time column:
>>> df.groupby('user_id', as_index=False).mean(numeric_only=False)
  user_id          local time  value
0   user1 2023-01-01 00:30:00   3.75
1   user2 2023-01-01 02:30:00   5.75
Update
> I want the mean of the time, not the datetime (so just hours, minutes, seconds)
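# Convert each time of day to a Timedelta, average, then drop the
# leading "0 days " (7 characters) from the string form.
mean_time = lambda x: str(pd.to_timedelta(x.dt.time.astype(str)).mean())[7:]
out = (df.groupby('user_id', as_index=False)
         .agg({'local time': mean_time, 'value': 'mean'}))
  user_id local time  value
0   user1   00:30:00   3.75
1   user2   02:30:00   5.75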
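The same result can be reached without slicing a string, which may be less fragile when the mean has fractional seconds. The following is only a sketch of one possible alternative, not the answer author's method: it rebuilds the question's sample data, measures each timestamp as seconds since midnight, averages those per user, and converts back to a time. The helper column name "seconds" is an assumption made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 4,
    "local time": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:00:00",
        "2023-01-01 01:00:00", "2023-01-01 01:00:00",
        "2023-01-01 02:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 03:00:00",
    ]),
    "value": [3, 3, 7, 2, 4, 10, 7, 2],
})

# Seconds elapsed since midnight for each row ("seconds" is a hypothetical helper column).
t = df["local time"]
df["seconds"] = (t - t.dt.normalize()).dt.total_seconds()

# Average the seconds and the value per user (named aggregation).
out = df.groupby("user_id").agg(
    **{"local time": ("seconds", "mean"), "value": ("value", "mean")}
)

# Round to whole seconds and turn the result back into datetime.time objects.
mean_delta = pd.to_timedelta(out["local time"].round(), unit="s")
out["local time"] = (pd.Timestamp(0) + mean_delta).dt.time

print(out)
```

With the sample data this gives 00:30:00 / 3.75 for user1 and 02:30:00 / 5.75 for user2, matching the expected output. Note that the "local time" column then holds datetime.time objects, so its dtype is object.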
Answer 2
Score: 1
Example
import pandas as pd

data = {'user_id': {0: 'user1', 1: 'user1', 2: 'user1', 3: 'user1',
                    4: 'user2', 5: 'user2', 6: 'user2', 7: 'user2'},
        'local time': {0: '2023-01-01 00:00:00', 1: '2023-01-01 00:00:00',
                       2: '2023-01-01 01:00:00', 3: '2023-01-01 01:00:00',
                       4: '2023-01-01 02:00:00', 5: '2023-01-01 02:00:00',
                       6: '2023-01-01 03:00:00', 7: '2023-01-01 03:00:00'},
        'value': {0: 3, 1: 3, 2: 7, 3: 2, 4: 4, 5: 10, 6: 7, 7: 2}}
df = pd.DataFrame(data)
df['local time'] = pd.to_datetime(df['local time'])
Code
out = (df
       .groupby('user_id').mean(numeric_only=False)
       .assign(**{'local time': lambda x: x['local time'].dt.time}))
out:
        local time  value
user_id
user1     00:30:00   3.75
user2     02:30:00   5.75
out.dtypes:
local time     object
value         float64
dtype: object
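One caveat may be worth keeping in mind: averaging the full datetimes and then taking .dt.time matches the mean time of day only when all rows in a group fall on the same date, as they do in the question's sample. The sketch below uses made-up data spanning two days (an assumption for illustration, not part of the question) to show how the two computations can diverge.

```python
import pandas as pd

# Hypothetical data: one user, two timestamps on different dates.
df = pd.DataFrame({
    "user_id": ["user1", "user1"],
    "local time": pd.to_datetime(["2023-01-01 00:00:00", "2023-01-02 01:00:00"]),
})

# Approach from the answer: mean of the full datetimes, then keep the time part.
via_datetime = (df.groupby("user_id")
                  .mean(numeric_only=False)["local time"]
                  .dt.time)

# Alternative: mean of the seconds since midnight, converted back to a time.
secs = (df["local time"] - df["local time"].dt.normalize()).dt.total_seconds()
mean_secs = secs.groupby(df["user_id"]).mean()
via_time_of_day = (pd.Timestamp(0) + pd.to_timedelta(mean_secs, unit="s")).dt.time

print(via_datetime["user1"])     # 12:30:00 (midpoint between the two datetimes)
print(via_time_of_day["user1"])  # 00:30:00 (mean of 00:00 and 01:00)
```

Which of the two is wanted depends on the data; for single-day groups like those in the question they agree.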