Find the least value in between the columns using Spark DataFrame

Question
I have a dataframe like the one below and need to find the least value excluding zeros, adding it in a new column called `Least`.

| Column1 | Column2 | Column3 |
|---|---|---|
| 100.0 | 120.0 | 150.0 |
| 200.0 | 0.0 | 0.0 |
| 0.0 | 20.0 | 100.0 |

I tried the `least()` function, but I didn't get the expected output.

The expected output should look like this:

| Column1 | Column2 | Column3 | Least |
|---|---|---|---|
| 100.0 | 120.0 | 150.0 | 100.0 |
| 200.0 | 0.0 | 0.0 | 200.0 |
| 0.0 | 20.0 | 100.0 | 20.0 |
Answer 1
Score: 0
You can do something like this to get the least values:

```scala
import org.apache.spark.sql.functions.{col, least}
import sparkSession.implicits._

val df = List(
  (100.0, 120.0, 150.0),
  (200.0, 0.0, 0.0),
  (0.0, 20.0, 100.0)
).toDF("column1", "column2", "column3")

val columns = df.columns.toSeq
val leastRow = least(columns.map(col): _*).alias("min")

df.select($"*", leastRow).show
```
Try to improve the `leastRow` expression so it ignores zero values. Think about replacing the zeros with the maximum value possible in your use case, typically `Double.PositiveInfinity`.
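That hinted substitution can be sketched in plain Scala (not Spark) to show the idea; `leastNonZero` is a hypothetical helper name, not part of any API:

```scala
// Sketch of the hinted trick: map zeros to positive infinity so they can
// never be the minimum, then take the minimum as usual.
def leastNonZero(values: Seq[Double]): Double =
  values.map(v => if (v == 0.0) Double.PositiveInfinity else v).min

// Rows from the question:
println(leastNonZero(Seq(100.0, 120.0, 150.0))) // 100.0
println(leastNonZero(Seq(200.0, 0.0, 0.0)))     // 200.0
println(leastNonZero(Seq(0.0, 20.0, 100.0)))    // 20.0
```

In Spark, the same substitution would be done per column inside the `least(...)` call, e.g. with `when`/`otherwise`.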
Do not hesitate to post your work, and be sure that you'll get help!

Good luck.
Answer 2
Score: 0
I found two solutions:

- You can swap 0 to infinity when calculating `least`, to skip zeroes
- You can use a UDF which will skip zeroes

Option 1:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

val data = Seq((100.0, 120.0, 150.0), (200.0, 0.0, 0.0), (0.0, 20.0, 100.0))
val columns = Seq("Column1", "Column2", "Column3")
val df: DataFrame = spark.createDataFrame(data).toDF(columns: _*)

// Calculate the least value, swapping all zeroes to infinity
val leastValueExpr = least(
  df.columns.map(colName =>
    when(col(colName) =!= 0.0, col(colName)).otherwise(Double.PositiveInfinity)
  ): _*
)

val resultDF = df.withColumn("Least", leastValueExpr)
resultDF.show()
```
Option 2:

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction

// Define a UDF to calculate the least value, excluding zero values
val leastNonZero: UserDefinedFunction =
  udf((cols: Seq[Double]) => cols.filter(_ != 0.0).min)

// Apply the UDF over an array of all the columns
val dfWithLeast: DataFrame =
  df.withColumn("Least", leastNonZero(array(df.columns.map(col): _*)))

// Show the result
dfWithLeast.show()
```
Output:

```
+-------+-------+-------+-----+
|Column1|Column2|Column3|Least|
+-------+-------+-------+-----+
|  100.0|  120.0|  150.0|100.0|
|  200.0|    0.0|    0.0|200.0|
|    0.0|   20.0|  100.0| 20.0|
+-------+-------+-------+-----+
```
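One caveat with the UDF approach, sketched here in plain Scala (not Spark): `cols.filter(_ != 0.0).min` throws on a row whose values are all zero, because `min` of an empty sequence is undefined. A guarded variant could look like this (the `0.0` fallback is an assumption; pick whatever default fits your data):

```scala
// Guarded variant of the UDF body: fall back to 0.0 (an assumed default)
// when every value in the row is zero, instead of throwing.
def safeLeastNonZero(cols: Seq[Double]): Double = {
  val nonZero = cols.filter(_ != 0.0)
  if (nonZero.isEmpty) 0.0 else nonZero.min
}

println(safeLeastNonZero(Seq(0.0, 0.0, 0.0)))    // 0.0 instead of an exception
println(safeLeastNonZero(Seq(0.0, 20.0, 100.0))) // 20.0
```

The sample data in the question has no all-zero row, so this only matters for real data where one might appear.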