Creating a timestamp column using PySpark

Question

I'd like to create a new timestamp column on a DataFrame from a date column and a string column that holds the time:

Date        Times (Sting)  desired column
2020-11-03  15:34:02       2020-11-03 15:34:02

I'm trying something like this in a select statement, but I get an error. Can anyone help?

F.to_timestamp(F.concat_ws('', F.col("Date"), F.col("Time"), 'yyyy-MM-dd HH:mm:ss')).alias("desired_column")

Answer 1

Score: 2

You can simply do something like this using PySpark functions:

from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()

# Sample data used to create the DataFrame: one row of (date, time)
df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])

df.show()

# Join the date and time with a single space between them
df = df.withColumn('desired column', sf.concat(sf.col('Date'), sf.lit(' '), sf.col('Times (Sting)')))

df.show()

Output: (original screenshot not available)

Answer 2

Score: 0

Wrap the concatenation in the to_timestamp() function, which returns a value of timestamp type (concat on its own returns a string).

Example:

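from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit, to_timestamp

spark = SparkSession.builder.getOrCreate()

# One row of (date, time); note the tuple, so both values land in the same row
df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])
df.withColumn('desired column', to_timestamp(concat(col('Date'), lit(' '), col('Times (Sting)')))).show(10, False)
df.withColumn('desired column', to_timestamp(concat(col('Date'), lit(' '), col('Times (Sting)')))).printSchema()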
+----------+-------------+-------------------+
|Date      |Times (Sting)|desired column     |
+----------+-------------+-------------------+
|2020-11-03|15:34:02     |2020-11-03 15:34:02|
+----------+-------------+-------------------+

root
 |-- Date: string (nullable = true)
 |-- Times (Sting): string (nullable = true)
 |-- desired column: timestamp (nullable = true)

huangapple
  • This article was published on 2023-04-19 17:46:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053061.html