Pandas DataFrame: categorical dtype to datetime

Question

I have a DataFrame with a column "time_gap" that has a CategoricalDtype:

    CategoricalDtype(categories=['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                                 '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                                 '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                                 '0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
                                 '0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
                                 '0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
                                 '0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
                                 '0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00'],
                     ordered=True)

The values represent hours:minutes:seconds.

I would like to convert the column to a datetime dtype (and ideally get rid of the "0 days").

When I try df["time_gap"] = pd.to_datetime(df["time_gap"]), I get the following error:

    TypeError: <class 'pandas._libs.tslibs.timedeltas.Timedelta'> is not convertible to datetime, at position 0

Is there an easy way to convert this CategoricalDtype to datetime?

Thank you in advance for your feedback.
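
A note on the error: the message suggests that the categories are Timedelta objects (durations) rather than strings or timestamps, which is why pd.to_datetime refuses them. The sketch below is one way to check this on a small stand-in frame. The frame is rebuilt here as an assumption about what the original data looks like, and the variable name `categories` is only illustrative.

```python
import pandas as pd

# Stand-in data (assumption): 24 hourly durations stored as an ordered-free categorical column.
labels = [f"0 days {h:02d}:00:00" for h in range(24)]
df = pd.DataFrame({"time_gap": pd.to_timedelta(labels)})
df["time_gap"] = df["time_gap"].astype("category")

# The categories of a categorical column keep their own dtype.
categories = df["time_gap"].cat.categories
print(type(categories).__name__)  # name of the index class holding the categories
print(categories.dtype)           # dtype of the categories themselves
```

If the categories turn out to be timedeltas, the column holds durations, so it has to be cast to a timedelta dtype first and can only become a datetime after being added to some reference date.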

Answer 1

Score: 1

Here are two ways to extract the hours from the categorical column:

    import pandas as pd

    # dummy data -->
    df = pd.DataFrame({"time_gap": ['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                                    '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                                    '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                                    '0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
                                    '0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
                                    '0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
                                    '0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
                                    '0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00']
                       })
    df["time_gap"] = pd.to_timedelta(df["time_gap"]).astype("category")
    # <-- dummy data

    # via timedelta: cast back to timedelta, then convert seconds to hours
    df["hour"] = df["time_gap"].astype("timedelta64[ns]").dt.total_seconds() / 3600

    # via datetime: add the timedelta to an arbitrary date, then read the hour
    df["hour_"] = (pd.Timestamp("2022-01-01") + df["time_gap"].astype("timedelta64[ns]")).dt.hour

    print(df)
              time_gap  hour  hour_
    0  0 days 00:00:00   0.0      0
    1  0 days 01:00:00   1.0      1
    2  0 days 02:00:00   2.0      2
    3  0 days 03:00:00   3.0      3
    4  0 days 04:00:00   4.0      4
    5  0 days 05:00:00   5.0      5
    ...

    # note that .dt.hour gives you an integer:
    print(df.dtypes)
    time_gap    category
    hour         float64
    hour_          int32
    dtype: object
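
The question also asks how to get rid of the "0 days" prefix. The following is a minimal sketch of one possible approach, not part of the original answer: it reuses the datetime route, then formats the result as an HH:MM:SS string or extracts a datetime.time object. The column names `time_str` and `time_obj` and the anchor date are arbitrary choices made for illustration.

```python
import datetime as dt

import pandas as pd

# Stand-in data (assumption): same shape as the dummy data in the answer.
labels = [f"0 days {h:02d}:00:00" for h in range(24)]
df = pd.DataFrame({"time_gap": pd.to_timedelta(labels)})
df["time_gap"] = df["time_gap"].astype("category")

# Adding the durations to any midnight timestamp yields a datetime column.
anchor = pd.Timestamp("2022-01-01")  # arbitrary date, only the time part is used
as_datetime = anchor + df["time_gap"].astype("timedelta64[ns]")

# Option A: plain strings such as "01:00:00" (object dtype, no "0 days").
df["time_str"] = as_datetime.dt.strftime("%H:%M:%S")

# Option B: datetime.time objects (also object dtype).
df["time_obj"] = as_datetime.dt.time

print(df.head())
```

Both options only make sense while every duration is shorter than 24 hours, since longer gaps would wrap around to the next day and lose the day count.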

huangapple
  • Published on May 7, 2023, 23:38:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76194854.html