
How does import spark.sqlContext.implicits._ work in Scala?

Question


I'm new to Scala.

Here's what I'm trying to understand.

This code snippet gives me an RDD[Int], with no option to call toDF:

var input = spark.sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9))

But when I add import spark.sqlContext.implicits._, it gives me the option to use toDF:

import spark.sqlContext.implicits._
var input = spark.sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9)).toDF

So I looked into the source code; implicits is present in the SQLContext class as an object. I'm not able to understand how an RDD instance is able to call toDF after the import.

Can anyone help me understand?

Update

I found the code snippet below in the SQLContext class:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

  object implicits extends SQLImplicits with Serializable {
    protected override def _sqlContext: SQLContext = self
  }

Answer 1

Score: 3


toDF is an extension method. With the import you bring the necessary implicits into scope.

For example, Int doesn't have a foo method:

1.foo() // doesn't compile

But if you define an extension method and import the implicit:

object implicits {
  implicit class IntOps(i: Int) {
    def foo() = println("foo")
  }
}

import implicits._
1.foo() // compiles

The compiler transforms 1.foo() into new IntOps(1).foo().
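
You can also write that rewritten call by hand to confirm it; qualified through the enclosing object, it looks like this:

// hand-written equivalent of the call the compiler inserts
new implicits.IntOps(1).foo() // prints "foo"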

Similarly,

object implicits extends SQLImplicits ...

abstract class SQLImplicits ... {
  ...

  implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = {
    DatasetHolder(_sqlContext.createDataset(rdd))
  }

  implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
    DatasetHolder(_sqlContext.createDataset(s))
  }
}

case class DatasetHolder[T] private[sql](private val ds: Dataset[T]) {

  def toDS(): Dataset[T] = ds

  def toDF(): DataFrame = ds.toDF()

  def toDF(colNames: String*): DataFrame = ds.toDF(colNames : _*)
}

So after import spark.sqlContext.implicits._, the compiler transforms spark.sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9)).toDF into rddToDatasetHolder(spark.sparkContext.parallelize...).toDF, i.e. DatasetHolder(_sqlContext.createDataset(spark.sparkContext.parallelize...)).toDF.
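
For a self-contained picture, here is a minimal sketch (the object name ToDfDemo and the app name are arbitrary, and a local SparkSession is assumed) that performs the conversion both implicitly and by calling rddToDatasetHolder by hand:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDfDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("toDF-demo")
      .master("local[*]")
      .getOrCreate()

    // brings rddToDatasetHolder and an implicit Encoder[Int] into scope
    import spark.sqlContext.implicits._

    val rdd = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))

    // implicit route: the compiler inserts rddToDatasetHolder(rdd) for us
    val df1: DataFrame = rdd.toDF("value")

    // explicit route: what the implicit conversion expands to
    val df2: DataFrame =
      spark.sqlContext.implicits.rddToDatasetHolder(rdd).toDF("value")

    df1.show()
    df2.show()
    spark.stop()
  }
}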

You can read more about implicits and extension methods in Scala:

https://stackoverflow.com/questions/10375633/understanding-implicit-in-scala

https://stackoverflow.com/questions/5598085/where-does-scala-look-for-implicits

https://stackoverflow.com/questions/65844327/understand-scala-implicit-classes

https://docs.scala-lang.org/overviews/core/implicit-classes.html

https://docs.scala-lang.org/scala3/book/ca-extension-methods.html

https://docs.scala-lang.org/scala3/reference/contextual/extension-methods.html

https://stackoverflow.com/questions/76033008/how-extend-a-class-is-diff-from-implicit-class


About spark.implicits._

https://stackoverflow.com/questions/39151189/importing-spark-implicits-in-scala

https://stackoverflow.com/questions/50878224/what-is-imported-with-spark-implicits

https://stackoverflow.com/questions/50984326/import-implicit-conversions-without-instance-of-sparksession

https://stackoverflow.com/questions/45724290/workaround-for-importing-spark-implicits-everywhere

https://stackoverflow.com/questions/55197905/why-is-spark-implicits-is-embedded-just-before-converting-any-rdd-to-ds-and-no
