Creating a timestamp column using PySpark

Question

I'd like to create a new timestamp column on a DataFrame from a date column and a string column.

Date        Times (Sting)  desired column
2020-11-03  15:34:02       2020-11-03 15:34:02

I'm trying something like the following in the select statement, but I get an error. Can anyone help?

    F.to_timestamp(F.concat_ws('', F.col("Date"), F.col("Time"), 'yyyy-MM-dd HH:mm:ss')).alias("desired_column")

Answer 1

Score: 2

You can do this with the functions in pyspark.sql:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as sf

    spark = SparkSession.builder.getOrCreate()

    # Sample data frame, created here only for the example
    df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])
    df.show()

    # Join the date and time strings with a single space between them.
    # Note: concat() returns a string column, not a timestamp.
    df = df.withColumn('desired column', sf.concat(sf.col('Date'), sf.lit(' '), sf.col('Times (Sting)')))
    df.show()

Answer 2

Score: 0

Use the to_timestamp() function, since it returns a timestamp type.

Example:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, concat, lit, to_timestamp

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])

    # Join date and time with a space, then parse the result as a timestamp
    result = df.withColumn('desired column', to_timestamp(concat(col('Date'), lit(' '), col('Times (Sting)'))))
    result.show(10, False)
    result.printSchema()
    #+----------+-------------+-------------------+
    #|Date      |Times (Sting)|desired column     |
    #+----------+-------------+-------------------+
    #|2020-11-03|15:34:02     |2020-11-03 15:34:02|
    #+----------+-------------+-------------------+
    #
    #root
    # |-- Date: string (nullable = true)
    # |-- Times (Sting): string (nullable = true)
    # |-- desired column: timestamp (nullable = true)

huangapple
  • Posted on April 19, 2023, 17:46:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053061.html