Spark DataFrame casting string to date results in null values
Question
I get null when I attempt to cast a string date column in a Spark DataFrame to the date type.
# Create a list of data
data = [(1, "20230517"), (2, "20230518"), (3, "20230519"), (4, "null")]
# Create a DataFrame from the list of data
df = spark.createDataFrame(data, ("id", "date"))
df.show()
df.printSchema()
root
|-- id: long (nullable = true)
|-- date: string (nullable = true)
# Cast the date column from string to date type
df1 = df.withColumn("date", df.date.cast('date'))
df1.select('date').show()
+--------+
| date|
+--------+
| null|
| null|
| null|
| null|
Answer 1
Score: 1
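cast('date') returns null here because it only recognizes ISO-style strings such as 2023-05-17, and 20230517 does not match that layout. For this operation, use F.to_date() and specify the format to parse (yyyyMMdd in your case):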
F.to_date('date', format='yyyyMMdd')
Full code I used:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
spark = SparkSession.builder.appName('spark_session').getOrCreate()
# Create a list of data
data = [(1, "20230517"), (2, "20230518"), (3, "20230519"), (4, "null")]
# Create a DataFrame from the list of data
df = spark.createDataFrame(data, ("id", "date"))
# Parse the date column from string to date type using the yyyyMMdd pattern
df1 = df.withColumn("date", F.to_date('date', format='yyyyMMdd'))
df1.select('date').show()
+----------+
| date|
+----------+
|2023-05-17|
|2023-05-18|
|2023-05-19|
| null|
+----------+
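The same effect can be reproduced without a Spark session, which may help when checking a pattern quickly. The sketch below is only an analogy using Python's standard library, not how Spark parses dates internally. parse_or_none is a hypothetical helper introduced for illustration, and %Y%m%d is the strptime counterpart of the Spark pattern yyyyMMdd.

```python
from datetime import date, datetime
from typing import Optional


def parse_or_none(value: str, fmt: str) -> Optional[date]:
    """Hypothetical helper: return a date, or None when value does not match fmt."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        # Mirrors the null that appears in the DataFrame for unparseable strings.
        return None


values = ["20230517", "20230518", "20230519", "null"]

# A dashed pattern does not match compact strings such as "20230517",
# so every value fails to parse.
dashed = [parse_or_none(v, "%Y-%m-%d") for v in values]

# With the matching pattern, only the literal string "null" fails.
compact = [parse_or_none(v, "%Y%m%d") for v in values]

print(dashed)
print(compact)
```

The first list contains only None, while the second contains three dates followed by None, matching the two DataFrame outputs shown in the question and the answer.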