Pandas DataFrame: categorical dtype to datetime

Question

I have a DataFrame with a column "time_gap" that has a CategoricalDtype:

CategoricalDtype(categories=['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                  '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                  '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                  '0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
                  '0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
                  '0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
                  '0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
                  '0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00'],
, ordered=True)

--> hours:minutes:seconds

I would like to convert it to a datetime dtype (and ideally get rid of the "0 days").

When I try df["time_gap"] = pd.to_datetime(df["time_gap"]), I get the following error:

TypeError: <class 'pandas._libs.tslibs.timedeltas.Timedelta'> is not convertible to datetime, at position 0

Is there an easy way to convert this CategoricalDtype to datetime?

Thank you in advance for your feedback.
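
The error message suggests that the categories are Timedelta objects (durations) rather than strings or timestamps, which is why pd.to_datetime refuses them. A minimal sketch for checking this, assuming a made-up three-row stand-in frame (the data and the variable names `categories` and `as_timedelta` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical stand-in for the real data: three hourly gaps stored as a category.
gaps = pd.to_timedelta(["0 days 00:00:00", "0 days 01:00:00", "0 days 02:00:00"])
df = pd.DataFrame({"time_gap": gaps})
df["time_gap"] = df["time_gap"].astype("category")

# The categories form a TimedeltaIndex, i.e. durations, not points in time.
categories = df["time_gap"].cat.categories
print(type(categories).__name__)
print(categories.dtype)

# Casting the column back to timedelta recovers a regular timedelta column,
# which is the starting point for any further conversion.
as_timedelta = df["time_gap"].astype("timedelta64[ns]")
print(as_timedelta.dtype)
```

If the first print shows a TimedeltaIndex on the real data as well, the column holds durations, and a reference date is needed before it can become a datetime.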

Answer 1

Score: 1

Here are two ways to extract the hours from the categorical column:

import pandas as pd

# dummy data -->
df = pd.DataFrame({"time_gap": ['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                  '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                  '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                  '0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
                  '0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
                  '0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
                  '0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
                  '0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00']
})

df["time_gap"] = pd.to_timedelta(df["time_gap"]).astype("category")
# <-- dummy data

# via timedelta: cast back to timedelta, then convert seconds to hours (float)
df["hour"] = df["time_gap"].astype("timedelta64[ns]").dt.total_seconds() / 3600

# via datetime: add the timedelta to an arbitrary reference date, then read the hour
df["hour_"] = (pd.Timestamp("2022-01-01") + df["time_gap"].astype("timedelta64[ns]")).dt.hour

print(df)
          time_gap  hour  hour_
0  0 days 00:00:00   0.0      0
1  0 days 01:00:00   1.0      1
2  0 days 02:00:00   2.0      2
3  0 days 03:00:00   3.0      3
4  0 days 04:00:00   4.0      4
5  0 days 05:00:00   5.0      5
...

# note that .dt.hour gives you an integer:
print(df.dtypes)
time_gap    category
hour         float64
hour_          int32
dtype: object
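
If the goal is to display the values as hours:minutes:seconds without the "0 days" prefix, the same reference-date trick can be extended. The following is only a sketch under stated assumptions: the reference date 2022-01-01 is arbitrary, every gap is assumed to be shorter than one day, and the column names `time_of_day` and `hhmmss` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data, built the same way as the dummy data in the answer.
gaps = pd.to_timedelta([f"0 days {h:02d}:00:00" for h in range(24)])
df = pd.DataFrame({"time_gap": gaps})
df["time_gap"] = df["time_gap"].astype("category")

# Anchor the durations to an arbitrary reference date (an assumption; any date
# works as long as every gap is shorter than one day) to get datetime64 values.
reference = pd.Timestamp("2022-01-01")
as_datetime = reference + df["time_gap"].astype("timedelta64[ns]")

# Option A: datetime.time objects (stored with object dtype), no date part.
df["time_of_day"] = as_datetime.dt.time

# Option B: plain "HH:MM:SS" strings, which drops the "0 days" prefix.
df["hhmmss"] = as_datetime.dt.strftime("%H:%M:%S")

print(df.head())
```

A datetime64 column always carries a date, so a time-only result ends up either as Python time objects or as strings; which one fits better depends on whether the values are needed for further arithmetic or only for display.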

huangapple
  • Posted on May 7, 2023, 23:38:18
  • Please keep this link when reposting: https://go.coder-hub.com/76194854.html