How to convert a string column to a timestamp in PySpark?

Question

I am struggling to convert a string column into a timestamp. The data looks like this:

    +--------------------+
    |              mydate|
    +--------------------+
    |26/Feb/2023:13:58:40|
    |26/Feb/2023:13:30:33|
    |26/Feb/2023:13:52:50|
    |26/Feb/2023:13:47:09|
    |26/Feb/2023:13:30:33|
    |26/Feb/2023:13:14:28|
    |26/Feb/2023:13:11:42|
    |26/Feb/2023:13:34:03|
    |26/Feb/2023:13:50:43|
    |26/Feb/2023:13:10:47|
    |26/Feb/2023:13:28:09|
    |26/Feb/2023:13:30:16|
    |26/Feb/2023:13:19:07|
    |26/Feb/2023:13:30:24|
    |26/Feb/2023:13:30:16|
    |26/Feb/2023:13:05:37|
    |26/Feb/2023:13:09:24|
    |26/Feb/2023:13:24:18|
    |26/Feb/2023:13:49:13|
    |26/Feb/2023:13:56:40|
    +--------------------+

The column holds strings like those above, and I found some code that is supposed to convert them to timestamps. My PySpark code is below.

    wt.select('mydate').show()
    wt.select(to_timestamp(lit('mydate'),"dd/MMM/yyyy:HH:mm:ss")).show()

But every result comes back as null, no matter how many times I try.

    +----------------------------------------------+
    |to_timestamp('mydate', 'dd/MMM/yyyy:HH:mm:ss')|
    +----------------------------------------------+
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    |                                          null|
    +----------------------------------------------+

Any help will be appreciated.
Thanks.

Answer 1

Score: 1

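Your code is almost correct. The one problem is lit('mydate'): it produces the literal string 'mydate' for every row instead of reading the mydate column, so nothing matches the pattern and the result is null. Reference the column with col instead.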

Suppose you have a DataFrame with timestamps stored as strings:

    +--------------------+
    |             strDate|
    +--------------------+
    |26/Feb/2023:13:30:16|
    |26/Feb/2023:13:05:37|
    +--------------------+

You can convert the 'strDate' column to a timestamp by parsing it with the given format:

    from pyspark.sql import functions as F

    # Keep the DataFrame in res; show() returns None, so it must not be chained here.
    res = df.select(F.to_timestamp(F.col('strDate'), "dd/MMM/yyyy:HH:mm:ss"))
    res.show()

This yields:

    +-------------------------------------------+
    |to_timestamp(strDate, dd/MMM/yyyy:HH:mm:ss)|
    +-------------------------------------------+
    |                        2023-02-26 13:30:16|
    |                        2023-02-26 13:05:37|
    +-------------------------------------------+

We can verify the data type with res.dtypes:

    res.dtypes
    Out[28]: [('to_timestamp(strDate, dd/MMM/yyyy:HH:mm:ss)', 'timestamp')]
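
The null results in the question can be illustrated without Spark. With lit('mydate'), the parser receives the text 'mydate' on every row; with col('mydate'), it receives the stored values. The sketch below is only a plain-Python analogy, not Spark code. The names PY_FORMAT and parse_or_none are hypothetical, the strptime format is an assumed equivalent of the Spark pattern "dd/MMM/yyyy:HH:mm:ss", and it assumes an English locale so that "Feb" is recognised.

```python
from datetime import datetime

# Assumed strptime equivalent of the Spark pattern "dd/MMM/yyyy:HH:mm:ss".
# %b matches abbreviated month names such as "Feb" in an English locale.
PY_FORMAT = "%d/%b/%Y:%H:%M:%S"


def parse_or_none(text):
    """Parse text with PY_FORMAT, returning None when it does not match.

    Returning None stands in for the null that to_timestamp gave
    in the question for input it could not parse.
    """
    try:
        return datetime.strptime(text, PY_FORMAT)
    except ValueError:
        return None


# Analogue of lit('mydate'): the column *name* is parsed as if it were data.
from_literal = parse_or_none("mydate")

# Analogue of col('mydate'): the value stored in a row is parsed.
from_value = parse_or_none("26/Feb/2023:13:58:40")

print(from_literal)  # None
print(from_value)    # 2023-02-26 13:58:40
```

The format string was never the issue in the question. Only the input handed to the parser was wrong, which is why swapping lit for col is enough.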

Posted by huangapple on March 9, 2023 at 16:59:02
Permalink: https://go.coder-hub.com/75682348.html
