Convert Java Timestamp Datatype to Scala TimestampType
Question
Is it possible to cast/convert a Java Timestamp datatype to a Scala TimestampType and vice versa?
I tried doing it this way:
val t = <Java Timestamp variable>.asInstanceOf[TimestampType]
But I got this error:
java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.spark.sql.types.TimestampType
Answer 1
Score: 2
In Spark, org.apache.spark.sql.types.TimestampType is a subclass of the abstract class DataType. All such subclasses are just meta-information describing the types of DataFrame columns; they don't hold an actual value, whereas java.sql.Timestamp does. The two classes are not related by inheritance, which is why you can't convert one to the other with asInstanceOf.
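For instance, TimestampType only ever appears as schema metadata, e.g. when declaring a schema by hand. A minimal sketch (my addition, not part of the original answer):

import org.apache.spark.sql.types.{StructField, StructType, StringType, TimestampType}

// TimestampType is a singleton object describing a column's type;
// it never wraps an actual point in time
val schema = StructType(Seq(
  StructField("key", StringType, nullable = true),
  StructField("time", TimestampType, nullable = true)
))
println(schema.treeString)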
Here is a small example to get a feel for the difference. When you store data in a DataFrame, Spark converts the values to its internal timestamp type by itself:
import java.sql.Timestamp
import org.apache.spark.sql.DataFrame
import spark.implicits._ // assumes a SparkSession named spark is in scope

val t = new Timestamp(System.currentTimeMillis())
val dfA: DataFrame = Seq(
  ("a", t),
  ("b", t),
  ("c", t)
).toDF("key", "time")
But if you want to read the data back and get a java.sql.Timestamp, you can do it like this:
dfA.collect().foreach { row =>
  println(row.getAs[Timestamp](1))
}
// prints:
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825
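As a side note (not in the original answer), Row also has a typed accessor that returns java.sql.Timestamp directly, so the getAs call above could be written as:

dfA.collect().foreach(row => println(row.getTimestamp(1)))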
If you look at the DataFrame schema:
dfA.printSchema()
dfA.schema.fields.foreach(println)
it prints:
root
|-- key: string (nullable = true)
|-- time: timestamp (nullable = true)
StructField(key,StringType,true)
StructField(time,TimestampType,true)
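The schema is also where the two types meet programmatically: a field's dataType can be compared against the TimestampType singleton. A small sketch, reusing the dfA defined above:

import org.apache.spark.sql.types.TimestampType

// schema("time") returns the StructField; its dataType is the
// TimestampType singleton, not a java.sql.Timestamp value
println(dfA.schema("time").dataType == TimestampType) // true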
But if you try to cast a java.sql.Timestamp using asInstanceOf, you get this error:
println(t.asInstanceOf[TimestampType])
/*
java.lang.ClassCastException: java.sql.Timestamp incompatible with
org.apache.spark.sql.types.TimestampType
*/
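For the "vice versa" direction from the question, one way to attach TimestampType to a plain java.sql.Timestamp value is to wrap it in a literal Column. A sketch (my addition, reusing the t from above):

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.TimestampType

// lit(t) builds a Column whose schema-level data type is TimestampType,
// while the underlying value stays a java.sql.Timestamp
val timeCol = lit(t)
println(timeCol.expr.dataType == TimestampType) // true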