Pyspark Table Name with Timestamp
Question
I'm working in Databricks, coding in PySpark. I'm having trouble renaming an existing table and adding a timestamp. The table is located at mydatabase.tableOne, and I want to save it as another table called mydatabase.tableOne_20230625. This will allow the new process to run and create a new version of mydatabase.tableOne.
df_my_loc = 'mydatabase.tableOne'
timestamp_suffix = date_format(current_timestamp(), 'yyyyMMdd')
df_my_loc_new = df_my_loc + '_' + timestamp_suffix
df_arch = spark.table(df_my_loc)
df_arch.write.format("delta").mode("ignore").saveAsTable(df_my_loc_new)
I get an error that says the column is not iterable. It seems that adding the timestamp as a suffix causes the error. The goal is to archive the previous table with a timestamp before running a new process, and then run a notebook for the new output in which the table names do not change.
Answer 1
Score: 0
date_format and current_timestamp are Spark SQL functions. If used in Python, like in your example, their return type is Column. This:
df_my_loc_new = df_my_loc + '_' + timestamp_suffix
is a Python expression: a concatenation of plain strings. You cannot concatenate a Python string with a Column, even a Column of type String. You can either:
- concatenate Python strings using the + operator, or
- concatenate SQL strings using the concat function
Since building a table name is pure Python you just need to do:
from datetime import datetime
timestamp_suffix = datetime.now().strftime('%Y%m%d')
This will yield timestamp_suffix as a plain string in YYYYMMDD format (for example 20230625, matching the table name in the question), and your code will work.
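Putting the pieces together, the archive step might look like the sketch below. It assumes the day-level suffix from the question (yyyyMMdd, i.e. '%Y%m%d' in strftime). The helper names archived_table_name and archive_table are hypothetical, introduced only for illustration, and spark is assumed to be the SparkSession that a Databricks notebook provides.

```python
from datetime import datetime


def archived_table_name(base_name, when=None):
    """Return base_name with a _YYYYMMDD suffix as a plain Python string."""
    when = when or datetime.now()
    return f"{base_name}_{when.strftime('%Y%m%d')}"


def archive_table(spark, base_name, when=None):
    """Copy base_name into its dated archive table and return the new name.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    target = archived_table_name(base_name, when)
    # mode("ignore") leaves an already existing archive table untouched
    spark.table(base_name).write.format("delta").mode("ignore").saveAsTable(target)
    return target


# The name is built in pure Python, so no Column objects are involved
print(archived_table_name("mydatabase.tableOne", datetime(2023, 6, 25)))
```

In a notebook this would be called as archive_table(spark, 'mydatabase.tableOne') before the new process runs. Note that with mode("ignore"), a second run on the same day does not overwrite the archive created by the first.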