2023-06-26 01:07:50

Pyspark Table Name with Timestamp

Question

I'm working in Databricks, coding in PySpark. I'm having trouble renaming an existing table by adding a timestamp. The table is located at mydatabase.tableOne, and I want to save it as another table called mydatabase.tableOne_20230625. This will allow the new process to run and create a new version of mydatabase.tableOne.

df_my_loc = 'mydatabase.tableOne'
timestamp_suffix = date_format(current_timestamp(), 'yyyyMMdd')
df_my_loc_new = df_my_loc + '_' + timestamp_suffix
df_arch = spark.table(df_my_loc)
df_arch.write.format("delta").mode("ignore").saveAsTable(df_my_loc_new)

I get an error that says the column is not iterable. It seems that adding the timestamp as a suffix is what causes the error. The goal is to archive the previous table with a timestamp before running a new process, and then run a notebook for the new output in which the table names do not change.

Answer 1

Score: 0

date_format and current_timestamp are Spark SQL functions. When called from Python, as in your example, they return a Column (an expression that Spark evaluates later), not a Python string.

This:

df_my_loc_new = df_my_loc + '_' + timestamp_suffix

is a Python expression that is meant to concatenate plain strings. But timestamp_suffix is a Column, and combining a Python str with a Column, even one whose SQL type is string, does not give you a Python string, so the result cannot be passed to saveAsTable as a table name. You can either:

concatenate Python strings using the + operator
or concatenate SQL string columns using the concat function

Since building a table name is pure Python, you just need to do:

from datetime import datetime
timestamp_suffix = datetime.now().strftime('%Y%m%d')

This yields timestamp_suffix as a plain string in YYYYMMDD format (for example 20230625, matching the table name in your question), and the rest of your code will work.
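Putting the pieces together, the archive step could look roughly like the sketch below. The helper names build_archive_name and archive_table are made up for illustration and are not part of PySpark; the sketch assumes spark is an active SparkSession, as Databricks notebooks provide, and reuses the write call from the question unchanged.

```python
from datetime import date
from typing import Optional


def build_archive_name(table_name: str, on: Optional[date] = None) -> str:
    """Return table_name with a _YYYYMMDD suffix, as a plain Python str.

    Hypothetical helper: `on` defaults to today's date on the driver.
    """
    on = on or date.today()
    return f"{table_name}_{on.strftime('%Y%m%d')}"


def archive_table(spark, table_name: str, on: Optional[date] = None) -> str:
    """Copy table_name to its dated archive name and return that name.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    """
    archive_name = build_archive_name(table_name, on)
    # Same write as in the question; mode("ignore") skips the write
    # if the archive table already exists.
    spark.table(table_name).write.format("delta").mode("ignore").saveAsTable(archive_name)
    return archive_name


print(build_archive_name("mydatabase.tableOne", date(2023, 6, 25)))
# prints mydatabase.tableOne_20230625
```

In a notebook this would be called as archive_table(spark, 'mydatabase.tableOne') before the new run. Because of mode("ignore"), a second run on the same day leaves the existing archive untouched rather than overwriting it, which may or may not be what you want.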
