Convert Java Timestamp Datatype to Scala TimestampType

Question

Is it possible to cast or convert a Java Timestamp to a Scala TimestampType, and vice versa?

I tried it this way:

val t = <Java Timestamp variable>.asInstanceOf[TimestampType]

But I got this error:

java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.spark.sql.types.TimestampType

Answer 1

Score: 2

In Spark, org.apache.spark.sql.types.TimestampType is a subclass of the abstract class DataType. All such subclasses are just meta-information describing the types of DataFrame columns. They don't hold a value, whereas a java.sql.Timestamp does. Neither class is a subclass of the other, which is why you can't cast between them with asInstanceOf.
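
To make the type-versus-value distinction concrete, here is a minimal sketch. It assumes only that spark-sql is on the classpath (no SparkSession is needed); the object name TypeVersusValueSketch and the field name "time" are made up for illustration.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.types.{DataType, StructField, TimestampType}

// Hypothetical object name, used only for this illustration.
object TypeVersusValueSketch {
  // A value: it carries an actual point in time.
  val value: Timestamp = Timestamp.valueOf("2020-07-31 00:45:48.825")

  // A type descriptor: it only says "this column holds timestamps".
  val descriptor: DataType = TimestampType

  // The descriptor is what goes into a schema field; no value is involved.
  val field: StructField = StructField("time", descriptor, nullable = true)

  // Widen to Any so the runtime check compiles cleanly; the two classes are unrelated.
  val valueIsDataType: Boolean = (value: Any).isInstanceOf[DataType]

  def main(args: Array[String]): Unit = {
    println(s"value = $value")
    println(s"descriptor typeName = ${descriptor.typeName}")
    println(s"field = $field")
    println(s"value is a DataType? $valueIsDataType")
  }
}
```

Since the value is not a DataType at runtime, asInstanceOf has nothing to convert and can only throw.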

Here is a small example that shows the difference.

When you store data in a DataFrame, Spark converts it to its own TimestampType column automatically:

import java.sql.Timestamp
import org.apache.spark.sql.DataFrame
import spark.implicits._ // for toDF; assumes a SparkSession named spark, as in spark-shell

val t = new Timestamp(System.currentTimeMillis())
val dfA: DataFrame = Seq(
  ("a", t),
  ("b", t),
  ("c", t)
).toDF("key", "time")

If you want to read the data back as java.sql.Timestamp, you can do it like this:

dfA.collect().foreach { row =>
  // getAs[Timestamp] returns the column value as a java.sql.Timestamp
  println(row.getAs[Timestamp](1))
}
// prints, for example:
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825
// 2020-07-31 00:45:48.825

If you look at the DataFrame schema:

dfA.printSchema()
dfA.schema.fields.foreach(println)

it prints:

root
 |-- key: string (nullable = true)
 |-- time: timestamp (nullable = true)

StructField(key,StringType,true)
StructField(time,TimestampType,true)

But if you try to cast a java.sql.Timestamp with asInstanceOf, you get the expected error:

import org.apache.spark.sql.types.TimestampType

println(t.asInstanceOf[TimestampType])
/*
java.sql.Timestamp incompatible with
    org.apache.spark.sql.types.TimestampType java.lang.ClassCastException: java.sql.Timestamp incompatible with org.apache.spark.sql.types.TimestampType
*/
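
As for converting in both directions, there is nothing to cast: TimestampType belongs in the schema and java.sql.Timestamp belongs in the rows. The sketch below shows one way to do that round trip with an explicit schema. It assumes a throwaway local SparkSession; the object name TimestampRoundTripSketch and the helper roundTrip are hypothetical names of mine, not part of Spark.

```scala
import java.sql.Timestamp
import java.util.Arrays
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

// Hypothetical object name, used only for this illustration.
object TimestampRoundTripSketch {
  // TimestampType appears only here, as the declared type of the "time" column.
  val schema: StructType = StructType(Seq(
    StructField("key", StringType, nullable = true),
    StructField("time", TimestampType, nullable = true)
  ))

  // Builds a one-row DataFrame from a java.sql.Timestamp value, then reads it back.
  def roundTrip(spark: SparkSession, ts: Timestamp): Timestamp = {
    val df: DataFrame = spark.createDataFrame(Arrays.asList(Row("a", ts)), schema)
    df.head().getAs[Timestamp]("time")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for the demo; in spark-shell, reuse the existing spark instead.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("timestamp-round-trip")
      .getOrCreate()
    try {
      val ts = new Timestamp(System.currentTimeMillis())
      println(s"written: $ts, read back: ${roundTrip(spark, ts)}")
    } finally {
      spark.stop()
    }
  }
}
```

Spark does the conversion between the Java value and its internal representation when the row is written and read, so your code never needs a cast between the two classes.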

huangapple
  • Posted on July 31, 2020, 05:02:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63181356.html