Creating a timestamp column using PySpark
Question
I'd like to create a new timestamp column in a DataFrame from a date column and a time column stored as a string:

| Date | Times (String) | desired column |
|---|---|---|
| 2020-11-03 | 15:34:02 | 2020-11-03 15:34:02 |
I'm trying something like the following in a select statement, but I get an error. Can anyone help?
```python
F.to_timestamp(F.concat_ws('', F.col("Date"), F.col("Time"), 'yyyy-MM-dd HH:mm:ss')).alias("desired_column")
```
Answer 1
Score: 2
You can do this with PySpark's built-in functions:
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()

# Sample data frame used for this example
df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])
df.show()

# Join date and time with a space; note that concat returns a string column
df = df.withColumn('desired column', sf.concat(sf.col('Date'), sf.lit(' '), sf.col('Times (Sting)')))
df.show()
```
Answer 2
Score: 0
Use the `to_timestamp()` function, since it returns a `timestamp` type rather than a string.
Example:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit, to_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])

# Concatenate "Date" + " " + "Times (Sting)" and parse it as a timestamp
# (without a format argument, Spark's default timestamp casting rules are used)
df = df.withColumn('desired column', to_timestamp(concat(col('Date'), lit(' '), col('Times (Sting)'))))
df.show(10, False)
df.printSchema()
#+----------+-------------+-------------------+
#|Date      |Times (Sting)|desired column     |
#+----------+-------------+-------------------+
#|2020-11-03|15:34:02     |2020-11-03 15:34:02|
#+----------+-------------+-------------------+
#
#root
# |-- Date: string (nullable = true)
# |-- Times (Sting): string (nullable = true)
# |-- desired column: timestamp (nullable = true)
```