converting pyspark datetime format into different datetime format

Question
I'm using Databricks to load some data. The data contains a datetime field in this format: 2022-07-06 16:43:18.696 +0100, and I want to convert it to this format: 2022-07-06T16:43:18.8000000+00:0.

Ultimately I'm converting the data into JSON to be fed into an API (that part is done), so the field will end up as a string. I'm just having difficulty getting the datetime into the desired format using PySpark. The API call code is complete but errors out, because the API expects the datetime field to be in the desired format.
Answer 1
Score: 1
You would need to use the to_utc_timestamp function.
from pyspark.sql.functions import to_utc_timestamp, date_format

data = [
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0100", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0200", "Indicator": 0},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0300", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0400", "Indicator": 1},
    {"Category": "A", "date": "2022-07-06 16:43:18.696 +0500", "Indicator": 1},
]
df = spark.createDataFrame(data)

# Convert the datetime field to the desired format
converted_data = df.withColumn(
    "date",
    date_format(
        to_utc_timestamp(df["date"], "GMT"),
        "yyyy-MM-dd'T'HH:mm:ss.SSSSSSS'+00:0'",
    ),
)

# Show the converted data
converted_data.show(truncate=False)

Result

+--------+---------+--------------------------------+
|Category|Indicator|date                            |
+--------+---------+--------------------------------+
|A       |1        |2022-07-06T15:43:18.6960000+00:0|
|A       |0        |2022-07-06T14:43:18.6960000+00:0|
|A       |1        |2022-07-06T13:43:18.6960000+00:0|
|A       |1        |2022-07-06T12:43:18.6960000+00:0|
|A       |1        |2022-07-06T11:43:18.6960000+00:0|
+--------+---------+--------------------------------+
Also, note that the +0100 in the timestamp is the timezone offset, meaning the time is 1 hour ahead of UTC. Your desired format requires the timestamp in the UTC timezone.
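The offset arithmetic in that note can be checked with plain Python (a small sketch using only the standard library, independent of Spark):

```python
from datetime import datetime, timezone, timedelta

# 16:43 at an offset of +01:00 is 15:43 in UTC.
local = datetime(2022, 7, 6, 16, 43, 18, 696000,
                 tzinfo=timezone(timedelta(hours=1)))
print(local.astimezone(timezone.utc).isoformat())
# 2022-07-06T15:43:18.696000+00:00
```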