How to iterate through Scala DataFrame rows and store the column name in variables which can be used for some operations inside a for loop?


Question

I need to understand how to iterate through a Scala DataFrame using a for loop and perform some operations inside the loop. I can iterate using the code below, but I cannot do anything else with the rows, such as storing a column value in a variable or calling another function. Can you help with storing the column value in a variable?

    import spark.implicits._
    import org.apache.spark.sql._

    case class cls_Employee(name: String, sector: String, age: Int)
    val df = Seq(cls_Employee("Andy", "aaa", 20), cls_Employee("Berta", "bbb", 30), cls_Employee("Joe", "ccc", 40)).toDF()

    // The lambda evaluates t.name for each row but discards the result
    df.as[cls_Employee].take(df.count.toInt).foreach(t => {
      t.name
    })

Answer 1

Score: 1


You mean something like this?

    import spark.implicits._
    import org.apache.spark.sql._

    case class Employee(name: String, sector: String, age: Int)
    val df = Seq(Employee("Andy", "aaa", 20), Employee("Berta", "bbb", 30), Employee("Joe", "ccc", 40)).toDF()

    // collect() brings every row to the driver as a typed Employee
    for (row <- df.as[Employee].collect()) {
      val name = row.name  // the column value is now an ordinary local variable
      println(name)
    }
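The loop above reads fields from typed Employee objects. If you prefer to stay with untyped rows, here is a minimal sketch of the same idea using Row.getAs to read each column by name and pass the values to another function. The object name IterateRows, the helper describe, and the local SparkSession setup are assumptions made for illustration; in spark-shell a session named spark already exists.

```scala
import org.apache.spark.sql.SparkSession

// Same shape as the Employee case class in the answer.
case class Employee(name: String, sector: String, age: Int)

object IterateRows {
  // Hypothetical helper, standing in for "another function" from the question.
  def describe(name: String, sector: String, age: Int): String =
    s"$name works in $sector and is $age years old"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("iterate-rows")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Employee("Andy", "aaa", 20),
      Employee("Berta", "bbb", 30),
      Employee("Joe", "ccc", 40)
    ).toDF()

    // collect() returns Array[Row]; each field is read by column name.
    for (row <- df.collect()) {
      val name = row.getAs[String]("name")
      val sector = row.getAs[String]("sector")
      val age = row.getAs[Int]("age")
      println(describe(name, sector, age))
    }

    spark.stop()
  }
}
```

The typed version in the answer is usually easier to read, since the compiler checks the field names; getAs is mainly useful when no case class matches the schema.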

Be careful: collect() brings every row to the driver, so with a large amount of data the driver can run out of memory (OutOfMemoryError).
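One way to reduce the memory pressure on the driver, offered here as a sketch and not as part of the original answer, is Dataset.toLocalIterator(). It still runs the loop on the driver but fetches rows partition by partition, so the driver needs to hold roughly one partition at a time instead of the whole result. The object name LocalIteration, the helper label, and the local SparkSession setup are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Same shape as the Employee case class in the answer.
case class Employee(name: String, sector: String, age: Int)

object LocalIteration {
  // Hypothetical per-row operation.
  def label(emp: Employee): String = s"${emp.name} (${emp.sector})"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-iteration")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Employee("Andy", "aaa", 20),
      Employee("Berta", "bbb", 30),
      Employee("Joe", "ccc", 40)
    ).toDS()

    // toLocalIterator() returns a java.util.Iterator, so a plain
    // hasNext/next loop avoids any Java-to-Scala collection converters.
    val it = ds.toLocalIterator()
    while (it.hasNext) {
      val emp = it.next()
      println(label(emp))
    }

    spark.stop()
  }
}
```

The trade-off is speed: partitions are fetched one after another, so this is typically slower than collect() on small data and only worth it when the full result may not fit in driver memory.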

huangapple
  • Posted on March 31, 2023 at 18:03:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75897229.html