Convert Java Timestamp Datatype to Scala TimestampType
Question
Is it possible to cast/convert a Java Timestamp datatype to the Scala TimestampType and vice versa?
I tried it this way:
val t = <Java Timestamp variable>.asInstanceOf[TimestampType]
But I got this error:
java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.spark.sql.types.TimestampType
Answer 1
Score: 2
In Spark, org.apache.spark.sql.types.TimestampType is a subclass of the abstract class DataType. All such subclasses are just meta-information types for DataFrame columns: they don't hold an actual value, while java.sql.Timestamp does. The two classes are unrelated (neither extends the other), which is why you can't convert between them with asInstanceOf.
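For instance, TimestampType only ever appears when describing a schema. A minimal sketch (the field names here are illustrative):
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

// TimestampType is a schema descriptor (a DataType object), not a value holder:
val schema = StructType(Seq(
  StructField("key", StringType, nullable = true),
  StructField("time", TimestampType, nullable = true)
))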
To feel the difference in practice: when you store data in a DataFrame, Spark converts it to the timestamp column type on its own:
import java.sql.Timestamp
import org.apache.spark.sql.DataFrame
// outside the Spark shell you also need: import spark.implicits._ (for .toDF)

val t = new Timestamp(System.currentTimeMillis())
val dfA: DataFrame = Seq(
  ("a", t),
  ("b", t),
  ("c", t)
).toDF("key", "time")
But if you want to read the data back and get a java.sql.Timestamp, you can do it like this:
dfA.collect().foreach { row =>
  println(row.getAs[Timestamp](1))
}
// prints:
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825
If you look at the DataFrame schema:
dfA.printSchema()
dfA.schema.fields.foreach(println)
it prints:
root
|-- key: string (nullable = true)
|-- time: timestamp (nullable = true)
StructField(key,StringType,true)
StructField(time,TimestampType,true)
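Since TimestampType is a singleton DataType object, a column's declared type can be compared against it directly; a small sketch using the dfA defined above:
// check the declared type of the "time" column against the TimestampType singleton
import org.apache.spark.sql.types.TimestampType
assert(dfA.schema("time").dataType == TimestampType)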
But if you try to cast a java.sql.Timestamp using asInstanceOf, you get the error from the question:
println(t.asInstanceOf[TimestampType])
/*
java.lang.ClassCastException: java.sql.Timestamp incompatible with org.apache.spark.sql.types.TimestampType
*/
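If the underlying goal is to build a DataFrame that pairs java.sql.Timestamp values with an explicit TimestampType schema, pass the values in rows and the type in the schema instead of casting. A minimal sketch, assuming an active SparkSession named spark:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

// the Timestamp carries the value; TimestampType only describes the column
val rows = spark.sparkContext.parallelize(Seq(Row("a", t), Row("b", t)))
val tsSchema = StructType(Seq(
  StructField("key", StringType),
  StructField("time", TimestampType)
))
val dfB = spark.createDataFrame(rows, tsSchema)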