groupby mean of datetime64[ns] column

Question

I have a DataFrame:

  • user_id object
  • local time datetime64[ns]
  • value int32
  user_id          local time  value
0   user1 2023-01-01 00:00:00      3
1   user1 2023-01-01 00:00:00      3
2   user1 2023-01-01 01:00:00      7
3   user1 2023-01-01 01:00:00      2
4   user2 2023-01-01 02:00:00      4
5   user2 2023-01-01 02:00:00     10
6   user2 2023-01-01 03:00:00      7
7   user2 2023-01-01 03:00:00      2

I want to:

  • groupby user_id
  • mean of cols: "local time" (only time HH:MM:SS, not datetime) and "value"
import pandas as pd
import numpy as np

# Set the random seed for reproducibility
np.random.seed(123)
# Define the number of users and values
num_users = 2
num_values = 4

# Generate the user IDs
user_ids = ['user{}'.format(i+1) for i in range(num_users)]

# Generate the local time values
local_time = pd.date_range(start='2023-01-01 00:00:00', periods=num_values, freq='H')

# Generate the random values
values = np.random.randint(1, 11, size=(num_values*num_users))

# Create the DataFrame
df = pd.DataFrame({ 
    'user_id': np.repeat(user_ids, num_values),
    'local time': np.repeat(local_time, num_users),
    'value': values})


# Calculate the mean of the time of day in 'local time', not of the full datetime.
print(df)

print("expected_output")


'''
      local time  value 
user1 00:30:00    3.75
user2 02:30:00   5.75
'''
df.groupby('user_id').mean()

Expected Output:

I want the mean of the time (hours, minutes and seconds, not the date) and the mean of value, grouped by user

      local time  value 
user1 00:30:00    3.75
user2 02:30:00   5.75

Answer 1

Score: 4

By default, mean computes only numeric columns (this was the default before pandas 2.0; from 2.0 on, numeric_only defaults to False). Pass numeric_only=False explicitly to also compute the mean of local time:

>>> df.groupby('user_id', as_index=False).mean(numeric_only=False)

  user_id          local time  value
0   user1 2023-01-01 09:36:00    3.8
1   user2 2023-01-03 00:00:00    4.4
2   user3 2023-01-04 14:24:00    5.2
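
The output above appears to come from a different sample frame than the one in the question (it includes a user3). As a minimal sketch, assuming the eight-row frame printed in the question and a pandas version whose groupby mean accepts datetime columns with numeric_only=False, the same call returns a full datetime per user, date included. The variable name full is only for illustration:

```python
import pandas as pd

# Rebuild the eight-row frame printed in the question
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 4,
    "local time": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:00:00",
        "2023-01-01 01:00:00", "2023-01-01 01:00:00",
        "2023-01-01 02:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 03:00:00",
    ]),
    "value": [3, 3, 7, 2, 4, 10, 7, 2],
})

# numeric_only=False keeps the datetime column in the result
full = df.groupby("user_id", as_index=False).mean(numeric_only=False)
print(full)
```

The local time column of the result is still datetime64, so the date part is kept, which is why the update below is needed.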

Update

> I want the mean of Time, not datetime (so just hour, minute, seconds)

mean_time = lambda x: str(pd.to_timedelta(x.dt.time.astype(str)).mean())[7:]

out = (df.groupby('user_id', as_index=False)
         .agg({'local time': mean_time, 'value': 'mean'}))

  user_id local time  value
0   user1   00:30:00   3.75
1   user2   02:30:00   5.75
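
The [7:] slice relies on the string form of the mean Timedelta starting with "0 days ". As an alternative sketch (not the answerer's method), the time of day can be computed as an offset from midnight and formatted explicitly. The helper name mean_time_of_day and the variable out2 are hypothetical, and the frame is rebuilt from the question's printed data:

```python
import pandas as pd

# Rebuild the eight-row frame printed in the question
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 4,
    "local time": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:00:00",
        "2023-01-01 01:00:00", "2023-01-01 01:00:00",
        "2023-01-01 02:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 03:00:00",
    ]),
    "value": [3, 3, 7, 2, 4, 10, 7, 2],
})


def mean_time_of_day(s: pd.Series) -> str:
    """Hypothetical helper: mean time of day of a datetime Series as HH:MM:SS."""
    # Time elapsed since midnight on each row's own date
    offsets = s - s.dt.normalize()
    # Add the mean offset to an arbitrary midnight and format only the time part
    return (pd.Timestamp(0) + offsets.mean()).strftime("%H:%M:%S")


out2 = (df.groupby("user_id", as_index=False)
          .agg({"local time": mean_time_of_day, "value": "mean"}))
print(out2)
```

Like the lambda above, this returns strings. Any fractional seconds in the mean are dropped by the format string.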

Answer 2

Score: 1

Example

data = {'user_id': {0: 'user1', 1: 'user1', 2: 'user1', 3: 'user1', 
                    4: 'user2', 5: 'user2', 6: 'user2', 7: 'user2'}, 
        'local time': {0: '2023-01-01 00:00:00', 1: '2023-01-01 00:00:00', 
                       2: '2023-01-01 01:00:00', 3: '2023-01-01 01:00:00', 
                       4: '2023-01-01 02:00:00', 5: '2023-01-01 02:00:00', 
                       6: '2023-01-01 03:00:00', 7: '2023-01-01 03:00:00'}, 
        'value': {0: 3, 1: 3, 2: 7, 3: 2, 4: 4, 5: 10, 6: 7, 7: 2}}
df = pd.DataFrame(data)
df['local time'] = pd.to_datetime(df['local time'])

Code

out = (df
       .groupby('user_id').mean(numeric_only=False)
       .assign(**{'local time' : lambda x: x['local time'].dt.time}))

out:

        local time  value
user_id
user1     00:30:00   3.75
user2     02:30:00   5.75

out.dtypes

local time     object
value         float64
dtype: object
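
One caveat worth checking before relying on this approach: taking the mean of the full datetimes and then keeping the time part matches the mean time of day only when all of a user's rows fall on the same date, as they do in the sample data. A small sketch with two made-up timestamps on different dates (not from the question) shows the difference:

```python
import pandas as pd

# Two illustrative timestamps on different dates (made up for this sketch)
s = pd.Series(pd.to_datetime(["2023-01-01 00:00:00", "2023-01-02 01:00:00"]))

# Mean of the full datetimes, then keep only the time part
via_datetime = s.mean().time()

# Mean of the time elapsed since midnight on each row's own date
offsets = s - s.dt.normalize()
via_offsets = (pd.Timestamp(0) + offsets.mean()).time()

print(via_datetime)  # 12:30:00
print(via_offsets)   # 00:30:00
```

If the data spans several dates, averaging the offsets from midnight is the variant that answers the question as asked.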

huangapple
  • Posted on May 11, 2023, 16:01:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225350.html