Pyspark UDF evaluation
Question
So I have a simple function which takes two strings, converts them to floats (assume the conversion always succeeds), and returns the larger of the two.

```python
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))
```
When I evaluate the function on the following arguments, I get the output as expected:
```python
val_estimate("2000000", "90125900")
```

Output:

```
90125900.0
```
Now, when I register the function above as a UDF and apply it to a Spark DataFrame with the same arguments, I get the following result.

```python
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

val_estimate_udf = F.udf(val_estimate, returnType=FloatType())

df = spark.createDataFrame([["2000000", "90125900"]], ['sale_amt', 'total_value'])
df = df.withColumn("check", val_estimate_udf(F.col("sale_amt"), F.col("total_value")))
display(df)
```
Output:
| sale_amt | total_value | check    |
|----------|-------------|----------|
| 2000000  | 90125900    | 90125904 |
Why am I getting this result? Please ignore the lack of error handling, etc., and the fact that I could use a native Spark function to do the same; I just can't understand this result.
Answer 1

Score: 1
Spark's `FloatType` is a single-precision (32-bit) float, which cannot represent 90125900 exactly, so the value is rounded to the nearest representable float32. Declare the UDF's return type as `DoubleType` (64-bit) instead:

```python
import pyspark.sql.functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

@udf(returnType=DoubleType())
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))

df = spark.createDataFrame([('2000000', '90125900')], ['sale_amt', 'total_value'])
df2 = df.withColumn("check", val_estimate(F.col("sale_amt"), F.col("total_value")))
df2.show()
```
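You can reproduce the rounding without Spark at all by round-tripping the value through a 32-bit float with the standard-library `struct` module (a small illustrative sketch, not part of the original answer):

```python
import struct

# Round-trip through an IEEE-754 single-precision (32-bit) float,
# which is what Spark's FloatType stores. 90125900 lies between
# 2**26 and 2**27, where float32 values are spaced 8 apart, so it
# rounds to the nearest multiple of 8: 90125904.
as_float32 = struct.unpack('f', struct.pack('f', 90125900.0))[0]
print(as_float32)  # 90125904.0

# A 64-bit double (Spark's DoubleType, and Python's own float)
# represents the value exactly.
as_float64 = struct.unpack('d', struct.pack('d', 90125900.0))[0]
print(as_float64)  # 90125900.0
```

This is also why the plain Python call worked: Python's `float` is a C double, so no precision was lost until the result was squeezed into a `FloatType` column.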