Java Spark withColumn - custom function
Question
Please give any solutions in Java (not Scala or Python).
I have a DataFrame with the following data:
colA, colB
23,44
24,64
What I want is a DataFrame like this:
colA, colB, colC
23,44, result of myFunction(23,44)
24,64, result of myFunction(24,64)
Basically, I would like to add a column to the DataFrame in Java, where the value of the new column is found by passing the values of colA and colB through a complex function that returns a string.
Here is what I've tried, but the parameter passed to complexFunction seems to be only the name 'colA' rather than the value in colA:
myDataFrame.withColumn("ststs", (complexFunction(myDataFrame.col("colA")))).show();
Answer 1
Score: 1
As suggested in the comments, you should use a User Defined Function.
Let's suppose that you have a myFunction function (written here in Scala) that does the complex processing:
val myFunction: (Int, Int) => String = (colA, colB) => {...}
Then all you need to do is turn your function into a UDF and apply it to columns colA and colB:
import org.apache.spark.sql.functions.{udf, col}
val myFunctionUdf = udf(myFunction)
myDataFrame.withColumn("colC", myFunctionUdf(col("colA"), col("colB")))
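Since the question asks for Java, here is a rough Java sketch of the same idea. The original attempt fails because myDataFrame.col("colA") returns a Column expression, not the row's value, so a plain Java method never sees the numbers; wrapping the method in a UDF lets Spark call it once per row. The class name, the local SparkSession setup, and the body of myFunction are placeholders I made up for illustration (the real complex function is not shown in the question), and I am assuming Spark 2.3 or later, where functions.udf accepts a UDF2 plus a return type.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.udf;

public class ComplexFunctionUdf {

    // Hypothetical stand-in for the asker's complex function:
    // takes the two column values and returns a string.
    public static String myFunction(int a, int b) {
        return a + "-" + b;
    }

    public static void main(String[] args) {
        // Local session for demonstration only (assumption).
        SparkSession spark = SparkSession.builder()
                .appName("withColumn-udf")
                .master("local[*]")
                .getOrCreate();

        // Rebuild the two-column DataFrame from the question.
        StructType schema = new StructType()
                .add("colA", DataTypes.IntegerType)
                .add("colB", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(23, 44),
                RowFactory.create(24, 64));
        Dataset<Row> myDataFrame = spark.createDataFrame(rows, schema);

        // Wrap the plain Java method in a UDF. The cast tells the compiler
        // which UDF interface the lambda implements. Null column values
        // would fail on unboxing here, so guard against them if needed.
        UserDefinedFunction myFunctionUdf = udf(
                (UDF2<Integer, Integer, String>) (a, b) -> myFunction(a, b),
                DataTypes.StringType);

        // Spark evaluates the UDF per row with the values of colA and colB.
        myDataFrame
                .withColumn("colC", myFunctionUdf.apply(col("colA"), col("colB")))
                .show();

        spark.stop();
    }
}
```

With this placeholder body, colC for the first row would be the string 23-44. If you would rather call the function by name (for example from SQL), spark.udf().register("myFunction", ..., DataTypes.StringType) is an alternative way to expose the same lambda.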
I hope it helps.