How to convert string type to timestamp in PySpark?

Question

I am struggling to convert a string-typed date column into a timestamp. The column looks like this:

+--------------------+
|              mydate|
+--------------------+
|26/Feb/2023:13:58:40|
|26/Feb/2023:13:30:33|
|26/Feb/2023:13:52:50|
|26/Feb/2023:13:47:09|
|26/Feb/2023:13:30:33|
|26/Feb/2023:13:14:28|
|26/Feb/2023:13:11:42|
|26/Feb/2023:13:34:03|
|26/Feb/2023:13:50:43|
|26/Feb/2023:13:10:47|
|26/Feb/2023:13:28:09|
|26/Feb/2023:13:30:16|
|26/Feb/2023:13:19:07|
|26/Feb/2023:13:30:24|
|26/Feb/2023:13:30:16|
|26/Feb/2023:13:05:37|
|26/Feb/2023:13:09:24|
|26/Feb/2023:13:24:18|
|26/Feb/2023:13:49:13|
|26/Feb/2023:13:56:40|
+--------------------+

The column is of string type, as shown above, and I found some code that is supposed to convert it to a timestamp. My PySpark code is below.

wt.select('mydate').show()
wt.select(to_timestamp(lit('mydate'),"dd/MMM/yyyy:HH:mm:ss")).show()

But every row in the result is null, no matter how many times I try.

+----------------------------------------------+
|to_timestamp('mydate', 'dd/MMM/yyyy:HH:mm:ss')|
+----------------------------------------------+
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
|                                          null|
+----------------------------------------------+

Any help will be appreciated.
Thanks.

Answer 1

Score: 1

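The code you have is almost correct. The problem is lit('mydate'): it builds a constant column containing the literal text 'mydate' instead of referring to the mydate column, and that text cannot be parsed with the given format, so every row comes back null. Reference the column with col('mydate') instead.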

Suppose you have a DataFrame with timestamps stored as strings:

+--------------------+
|             strDate|
+--------------------+
|26/Feb/2023:13:30:16|
|26/Feb/2023:13:05:37|
+--------------------+

You can convert the 'strDate' column to a timestamp by passing the matching format:

from pyspark.sql import functions as F

# Keep the DataFrame in res; .show() returns None, so it must not be assigned.
res = df.select(F.to_timestamp(F.col('strDate'), "dd/MMM/yyyy:HH:mm:ss"))
res.show()

This yields:

+-------------------------------------------+
|to_timestamp(strDate, dd/MMM/yyyy:HH:mm:ss)|
+-------------------------------------------+
|                        2023-02-26 13:30:16|
|                        2023-02-26 13:05:37|
+-------------------------------------------+

We can verify the data type with:

res.dtypes

Out[28]: [('to_timestamp(strDate, dd/MMM/yyyy:HH:mm:ss)', 'timestamp')]

huangapple
  • Posted on March 9, 2023, 16:59:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75682348.html