How to iterate through Scala DataFrame rows and store the column values in variables that can be used for operations inside a for loop?
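Question
I need to understand how to iterate over a Scala DataFrame with a for loop and perform some operations inside the loop. I can iterate using the code below, but I cannot do anything else inside it, such as storing a column value in a variable or calling another function. Can you help me store the column value in a variable?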
import spark.implicits._
import org.apache.spark.sql._
case class cls_Employee(name:String, sector:String, age:Int);
val df = Seq(cls_Employee("Andy","aaa", 20), cls_Employee("Berta","bbb", 30), cls_Employee("Joe","ccc", 40)).toDF()
df.as[cls_Employee].take(df.count.toInt).foreach { t =>
  t.name
}
Answer 1
Score: 1
Do you mean something like this?
import spark.implicits._
import org.apache.spark.sql._
case class Employee(name:String, sector:String, age:Int)
val df = Seq(Employee("Andy","aaa", 20), Employee("Berta","bbb", 30), Employee("Joe","ccc", 40)).toDF()
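// collect() brings every row to the driver as an Array[Employee]
for (row <- df.as[Employee].collect()) {
  val name = row.name // store the column value in a local variable
  println(name)
}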
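Be careful: this code calls collect(), which pulls every row to the driver. With a large amount of data, the driver can run out of memory (OutOfMemoryError).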
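If the data is too large to collect at once, one possible alternative is `toLocalIterator()`, which streams rows to the driver one partition at a time instead of materializing the whole array. The sketch below is only an illustration under a few assumptions: a local SparkSession is created for the example (in spark-shell, `spark` already exists), the `describe` helper and the object name are hypothetical, and `scala.collection.JavaConverters` is used for the Java-to-Scala iterator conversion (it is deprecated in Scala 2.13 in favour of `scala.jdk.CollectionConverters`).

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

// Defined at top level so that Spark can derive an Encoder for it.
case class Employee(name: String, sector: String, age: Int)

object IterateWithoutCollect {
  // Hypothetical helper, standing in for "calling another function" inside the loop.
  def describe(e: Employee): String = s"${e.name} (${e.sector}), age ${e.age}"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("iterate-rows")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Employee("Andy", "aaa", 20),
      Employee("Berta", "bbb", 30),
      Employee("Joe", "ccc", 40)
    ).toDS()

    // toLocalIterator() returns a java.util.Iterator; asScala lets it be used in a for loop.
    for (row <- ds.toLocalIterator().asScala) {
      val name = row.name // column values stored in local variables
      val age = row.age
      println(s"$name is $age: ${describe(row)}")
    }

    spark.stop()
  }
}
```

The driver still needs enough memory to hold the largest single partition, so this reduces the memory pressure rather than removing it. If the per-row work does not have to run on the driver, `ds.foreach(...)` or `ds.map(...)` keeps the processing on the executors instead.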