Pandas DataFrame: categorical dtype to datetime
Question
I have a DataFrame with a column "time_gap" whose dtype is a CategoricalDtype:
CategoricalDtype(categories=['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
'0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
'0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
'0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
'0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
'0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
'0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00'],
ordered=True)
The values are in hours:minutes:seconds format.
I would like to convert the column to a datetime dtype (and ideally get rid of the "0 days").
When I try df["time_gap"] = pd.to_datetime(df["time_gap"]), I get the following error:
TypeError: <class 'pandas._libs.tslibs.timedeltas.Timedelta'> is not convertible to datetime, at position 0
Is there an easy way to convert this categorical column to datetime?
Thank you in advance for your feedback.
Answer 1
Score: 1
Here are two ways to extract the hours from the categorical column:
import pandas as pd
# dummy data -->
df = pd.DataFrame({"time_gap":['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
'0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
'0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
'0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
'0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
'0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
'0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00']
})
df["time_gap"] = pd.to_timedelta(df["time_gap"]).astype("category")
# <-- dummy data
# via timedelta:
df["hour"] = df["time_gap"].astype("timedelta64[ns]").dt.total_seconds()/3600
# via datetime:
df["hour_"] = (pd.Timestamp("2022-01-01") + df["time_gap"].astype("timedelta64[ns]")).dt.hour
print(df)
          time_gap  hour  hour_
0  0 days 00:00:00   0.0      0
1  0 days 01:00:00   1.0      1
2  0 days 02:00:00   2.0      2
3  0 days 03:00:00   3.0      3
4  0 days 04:00:00   4.0      4
5  0 days 05:00:00   5.0      5
...
# Note that .dt.hour returns an integer, while total_seconds()/3600 returns a float:
print(df.dtypes)
time_gap    category
hour         float64
hour_          int32
dtype: object
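If the goal is a time-of-day value without the "0 days" prefix rather than just the hour number, the same cast to timedelta64[ns] can be extended. The following is only a minimal sketch under a few assumptions: the dummy data stands in for the asker's column, the anchor date 1970-01-01 is an arbitrary placeholder, and the names td, anchored, as_time, as_text and hours are made up for illustration.

```python
import pandas as pd

# Dummy data shaped like the question's column: hourly Timedelta values
# stored as an ordered categorical (assumed setup, not the asker's real data).
df = pd.DataFrame({"time_gap": pd.to_timedelta(list(range(24)), unit="h")})
df["time_gap"] = df["time_gap"].astype("category").cat.as_ordered()

# Step 1: leave the categorical dtype and get plain timedelta64[ns] values.
td = df["time_gap"].astype("timedelta64[ns]")

# Step 2: anchor the durations to an arbitrary date to obtain datetimes.
# The date itself is a placeholder; only the time-of-day part is used below.
anchored = pd.Timestamp("1970-01-01") + td

# Time-of-day as datetime.time objects (object dtype, no "0 days" prefix).
as_time = anchored.dt.time

# Time-of-day as "HH:MM:SS" strings.
as_text = anchored.dt.strftime("%H:%M:%S")

# Integer hours taken directly from the timedelta components.
hours = td.dt.components["hours"]
```

This only makes sense while every duration is shorter than one day, as in the question; a gap of 24 hours or more would wrap around to the next day once it is anchored to a date.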