How to put numbers in a class column by splitting a DataFrame's dates into half-month periods
Question
I have a DataFrame with a model column and a date column, like this:
import pandas as pd

df = pd.DataFrame({'model': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'I', 'J', 'K'],
                   'date': ['2022-10-28 12:10:28 AM', '2022-12-07 12:12:07 AM', '2022-12-07 12:12:07 AM', '2022-12-07 12:12:07 AM',
                            '2022-12-08 12:12:08 AM', '2022-12-10 12:12:10 AM', '2023-02-22 12:02:22 AM', '2023-02-22 12:02:22 AM',
                            '2023-02-24 12:02:24 AM', '2023-03-04 12:03:04 AM']})
I want to split each month into two periods, the 1st through the 15th and the 16th through the end of the month (the 30th or 31st), and put a number for each period in a class column.
Is that possible?
Answer 1
Score: 3
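You can use pd.cut to assign each date to a half-month bin, then pd.factorize to renumber the bins that actually occur as 0, 1, 2, ... in order of first appearance:
# Parse the date strings into datetimes first
df['date'] = pd.to_datetime(df['date'])
# Find begin and end dates that enclose your dates:
# the 1st of the earliest month and the 1st of the month after the latest one
start = df['date'].min().to_period('M').start_time
end = (df['date'].max().to_period('M') + 1).start_time
# Create the bin edges: the 1st and the 16th of every month
bins = pd.date_range(start, end, freq='MS')
bins = bins.union(bins[:-1] + pd.Timedelta(days=15))
# right=False makes each bin include its left edge: [1st, 16th) and [16th, next 1st)
df['class'] = pd.factorize(pd.cut(df['date'], bins=bins, labels=False, right=False))[0]
print(df)
# Output
  model                date  class
0     A 2022-10-28 00:10:28      0
1     B 2022-12-07 00:12:07      1
2     C 2022-12-07 00:12:07      1
3     D 2022-12-07 00:12:07      1
4     E 2022-12-08 00:12:08      1
5     F 2022-12-10 00:12:10      1
6     G 2023-02-22 00:02:22      2
7     I 2023-02-22 00:02:22      2
8     J 2023-02-24 00:02:24      2
9     K 2023-03-04 00:03:04      3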
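If building bin edges feels heavy, the same numbering can probably be reached with plain arithmetic on the date parts. The following is only a sketch, not part of the answer above: the helper names `half` and `key` are made up for illustration, the explicit `format` string is an assumption that every date looks like the sample data, and `sort=True` is a deliberate choice so that classes follow chronological order even when the rows are not sorted by date.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'I', 'J', 'K'],
    'date': ['2022-10-28 12:10:28 AM', '2022-12-07 12:12:07 AM',
             '2022-12-07 12:12:07 AM', '2022-12-07 12:12:07 AM',
             '2022-12-08 12:12:08 AM', '2022-12-10 12:12:10 AM',
             '2023-02-22 12:02:22 AM', '2023-02-22 12:02:22 AM',
             '2023-02-24 12:02:24 AM', '2023-03-04 12:03:04 AM'],
})

# Assumed format: 12-hour clock with an AM/PM marker, as in the sample data
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %I:%M:%S %p')

# 0 for days 1-15, 1 for day 16 through the end of the month
half = (df['date'].dt.day > 15).astype(int)

# One integer per half-month: two slots for every calendar month
key = (df['date'].dt.year * 12 + df['date'].dt.month) * 2 + half

# Renumber the keys that actually occur as 0, 1, 2, ... in chronological order
df['class'] = pd.factorize(key, sort=True)[0]
print(df)
```

On the sample data this should give the same class column as the pd.cut approach. Without `sort=True`, pd.factorize numbers the periods in order of first appearance instead, which only matches chronological order when the rows are already sorted by date.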