Put numbers in a class column, separating a DataFrame by date

Question

I have a DataFrame with model and date columns, like below:

import pandas as pd

df = pd.DataFrame({'model':['A','B','C','D', 'E','F','G','I','J','K'],
           'date':['2022-10-28  12:10:28 AM','2022-12-07  12:12:07 AM','2022-12-07  12:12:07 AM','2022-12-07  12:12:07 AM',
                   '2022-12-08  12:12:08 AM','2022-12-10  12:12:10 AM','2023-02-22  12:02:22 AM','2023-02-22  12:02:22 AM',
                   '2023-02-24  12:02:24 AM','2023-03-04  12:03:04 AM']})

I want to distinguish between the 1st through 15th and the 16th through the end (30th or 31st) of each month, and put a number for each such period in a class column.

Is it possible?

Answer 1

Score: 3

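You can use pd.cut, after converting the date strings to datetimes:

import pandas as pd

# Parse the date strings; .date() and pd.cut below need real datetimes
df['date'] = pd.to_datetime(df['date'])

# Find begin and end dates that enclose your dates
start = df['date'].min().date() - pd.offsets.MonthBegin(1)
# Go to the start of the next month so the second half of the last month gets a bin too
end = df['date'].max().date() + pd.offsets.MonthBegin(1)

# Create the range and bin values: edges on the 1st and 16th of every month
bins = pd.date_range(start, end, freq='MS')
bins = sorted(bins.tolist() + list(bins + pd.DateOffset(days=15)))
# right=False makes each bin [1st, 16th) or [16th, next 1st), so midnight on the 16th counts as second half
df['class'] = pd.factorize(pd.cut(df['date'], bins=bins, right=False, labels=False))[0]
print(df)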

# Output
  model                date  class
0     A 2022-10-28 00:10:28      0
1     B 2022-12-07 00:12:07      1
2     C 2022-12-07 00:12:07      1
3     D 2022-12-07 00:12:07      1
4     E 2022-12-08 00:12:08      1
5     F 2022-12-10 00:12:10      1
6     G 2023-02-22 00:02:22      2
7     I 2023-02-22 00:02:22      2
8     J 2023-02-24 00:02:24      2
9     K 2023-03-04 00:03:04      3
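
For comparison, here is a minimal sketch of another way to get the same numbering without building bin edges. It is not part of the answer above. It assumes the date strings always follow the 12-hour format in the question, and the names second_half and half_key are made up for illustration. Each row gets an integer key that identifies its half-month, and pd.factorize turns the keys that actually occur into consecutive class numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'I', 'J', 'K'],
    'date': ['2022-10-28  12:10:28 AM', '2022-12-07  12:12:07 AM',
             '2022-12-07  12:12:07 AM', '2022-12-07  12:12:07 AM',
             '2022-12-08  12:12:08 AM', '2022-12-10  12:12:10 AM',
             '2023-02-22  12:02:22 AM', '2023-02-22  12:02:22 AM',
             '2023-02-24  12:02:24 AM', '2023-03-04  12:03:04 AM'],
})

# Assumption: every string matches this 12-hour format
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d  %I:%M:%S %p')

# 0 for days 1-15, 1 for day 16 through the end of the month
second_half = (df['date'].dt.day > 15).astype(int)

# Integer key that grows chronologically, one value per half-month
half_key = (df['date'].dt.year * 12 + df['date'].dt.month) * 2 + second_half

# sort=True numbers the occupied half-months in chronological order,
# even if the rows themselves are not sorted by date
df['class'] = pd.factorize(half_key, sort=True)[0]
print(df)
```

As in the answer, half-months with no rows do not consume a class number, so the classes stay consecutive.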

huangapple
  • Posted on May 11, 2023, 15:17:30
  • When reposting, please keep this link: https://go.coder-hub.com/76225001.html