将pyspark的日期时间格式转换为不同的日期时间格式。

huangapple go评论61阅读模式
英文:

converting pyspark datetime format into different datetime format

问题

我正在使用Databricks加载一些数据。
数据中包含一个日期时间字段。
日期时间字段的格式是:2022-07-06 16:43:18.696 +0100
我想将其转换成这个格式:2022-07-06T16:43:18.8000000+00:0
最终,我将数据转换成JSON以供输入API(这部分已完成),所以该字段将被转换为字符串。
我只是在尝试使用Pyspark将日期时间转换成所需的格式时遇到困难。API调用代码已完成,但出现错误,因为它期望日期时间字段处于所需的格式中。

英文:

I'm using databricks to load some data. <br>
Within the data is a datetime. <br> The datetime field is in this format: 2022-07-06 16:43:18.696 +0100 <br>and I want it to be in this format: 2022-07-06T16:43:18.8000000+00:0. <br> I'm ultimately converting the data into JSON to be fed into an API (which I've done), so the field is going to be converted into a string. I'm just having difficulty getting the datetime into the desired format using pyspark. The API call code is completed but errors out because it is expecting the datetime field to be in the desired format.

答案1

得分: 1

You would need to use the to_utc_timestamp function.

from pyspark.sql.functions import to_utc_timestamp, date_format

df = [
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0100", "Indictor": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0200", "Indictor": 0},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0300", "Indictor": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0400", "Indictor": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0500", "Indictor": 1},
]

df = spark.createDataFrame(df)

# Convert the datetime field to the desired format
converted_data = df.withColumn(
    "date",
    date_format(
        to_utc_timestamp(df["date"], "GMT"), "yyyy-MM-dd'T'HH:mm:ss.SSSSSSS'+00:0'"
    ),
)

# Show the converted data
converted_data.show(truncate=False)

Result

+--------+--------+--------------------------------+
|Category|Indictor|date                            |
+--------+--------+--------------------------------+
|A       |1       |2022-07-06T15:43:18.6960000+00:0|
|A       |0       |2022-07-06T14:43:18.6960000+00:0|
|A       |1       |2022-07-06T13:43:18.6960000+00:0|
|A       |1       |2022-07-06T12:43:18.6960000+00:0|
|A       |1       |2022-07-06T11:43:18.6960000+00:0|
+--------+--------+--------------------------------+

Also, note that the +0100 in the timestamp implies the timezone offset, which is 1 hour ahead of UTC time. Your API requires UTC timezone in the desired format.

英文:

You would need to use the to_utc_timestamp function.

from pyspark.sql.functions import to_utc_timestamp, date_format

df = [
    {&quot;Category&quot;: &quot;A&quot;, &quot;date&quot;: &quot;2022-07-06 16:43:18.696 +0100&quot;, &quot;Indictor&quot;: 1},
    {&quot;Category&quot;: &quot;A&quot;, &quot;date&quot;: &quot;2022-07-06 16:43:18.696 +0200&quot;, &quot;Indictor&quot;: 0},
    {&quot;Category&quot;: &quot;A&quot;, &quot;date&quot;: &quot;2022-07-06 16:43:18.696 +0300&quot;, &quot;Indictor&quot;: 1},
    {&quot;Category&quot;: &quot;A&quot;, &quot;date&quot;: &quot;2022-07-06 16:43:18.696 +0400&quot;, &quot;Indictor&quot;: 1},
    {&quot;Category&quot;: &quot;A&quot;, &quot;date&quot;: &quot;2022-07-06 16:43:18.696 +0500&quot;, &quot;Indictor&quot;: 1},
]

df = spark.createDataFrame(df)

# Convert the datetime field to the desired format
converted_data = df.withColumn(
    &quot;date&quot;,
    date_format(
        to_utc_timestamp(df[&quot;date&quot;], &quot;GMT&quot;), &quot;yyyy-MM-dd&#39;T&#39;HH:mm:ss.SSSSSSS&#39;+00:0&#39;&quot;
    ),
)

# Show the converted data
converted_data.show(truncate=False)

Result

+--------+--------+--------------------------------+
|Category|Indictor|date                            |
+--------+--------+--------------------------------+
|A       |1       |2022-07-06T15:43:18.6960000+00:0|
|A       |0       |2022-07-06T14:43:18.6960000+00:0|
|A       |1       |2022-07-06T13:43:18.6960000+00:0|
|A       |1       |2022-07-06T12:43:18.6960000+00:0|
|A       |1       |2022-07-06T11:43:18.6960000+00:0|
+--------+--------+--------------------------------+

Also a note, the +0100 in the timestamp implies the timezone offset. So its 1 hour ahead of the UTC time. Which from your required format your API needs a UTC timezone.

huangapple
  • 本文由 发表于 2023年5月11日 18:34:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/76226683.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定