converting pyspark datetime format into different datetime format

Question
I'm using Databricks to load some data. The data contains a datetime field in this format: 2022-07-06 16:43:18.696 +0100, and I want to convert it to this format: 2022-07-06T16:43:18.8000000+00:0.

Ultimately I'm converting the data into JSON to be fed into an API (that part is done), so the field will end up as a string. I'm just having difficulty getting the datetime into the desired format using PySpark. The API call code is complete but errors out, because the API expects the datetime field to be in the desired format.
Answer 1
Score: 1
You would need to use the to_utc_timestamp function.
from pyspark.sql.functions import to_utc_timestamp, date_format

data = [
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0100", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0200", "Indicator": 0},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0300", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0400", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0500", "Indicator": 1},
]
df = spark.createDataFrame(data)

# Convert the datetime field to the desired format
converted_data = df.withColumn(
    "date",
    date_format(
        to_utc_timestamp(df["date"], "GMT"),
        "yyyy-MM-dd'T'HH:mm:ss.SSSSSSS'+00:0'",
    ),
)

# Show the converted data
converted_data.show(truncate=False)

Result

+--------+---------+--------------------------------+
|Category|Indicator|date                            |
+--------+---------+--------------------------------+
|A       |1        |2022-07-06T15:43:18.6960000+00:0|
|A       |0        |2022-07-06T14:43:18.6960000+00:0|
|A       |1        |2022-07-06T13:43:18.6960000+00:0|
|A       |1        |2022-07-06T12:43:18.6960000+00:0|
|A       |1        |2022-07-06T11:43:18.6960000+00:0|
+--------+---------+--------------------------------+
Also, note that the +0100 in the timestamp is the timezone offset, meaning the time is 1 hour ahead of UTC. Your desired format requires the timestamp in the UTC timezone.
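The offset arithmetic in that note can be checked with plain Python (a small sketch using only the standard library, independent of Spark):

```python
from datetime import datetime, timezone, timedelta

# 16:43 at an offset of +01:00 is 15:43 in UTC.
local = datetime(2022, 7, 6, 16, 43, 18, 696000,
                 tzinfo=timezone(timedelta(hours=1)))
print(local.astimezone(timezone.utc).isoformat())
# 2022-07-06T15:43:18.696000+00:00
```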