Pyspark UDF evaluation

Question


I have a simple function that takes two strings, converts them to floats (assume this is always possible), and returns the larger of the two.

```python
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))
```

When I evaluate the function on the following arguments, I get the output as expected:

```python
val_estimate("2000000", "90125900")
```

Output:

```
90125900.0
```

Now, when I register the same function as a UDF and apply it to a Spark DataFrame with the same arguments, I get a different result.

```python
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

val_estimate_udf = F.udf(val_estimate, returnType=FloatType())

df = spark.createDataFrame([["2000000", "90125900"]], ['sale_amt', 'total_value'])
df = df.withColumn("check", val_estimate_udf(F.col("sale_amt"), F.col("total_value")))
display(df)
```

Output:

| sale_amt | total_value | check    |
|----------|-------------|----------|
| 2000000  | 90125900    | 90125904 |


Why am I getting this result? Please ignore the lack of error handling and the fact that I could use a native Spark function instead; I just can't understand this result.

Answer 1

Score: 1

The UDF's declared return type, `FloatType`, is a 32-bit single-precision float. Near 9×10^7, consecutive single-precision values are 8 apart, so 90125900.0 cannot be represented exactly and rounds to 90125904.0. Declaring the return type as `DoubleType` (64-bit, matching Python's `float`) fixes it:

```python
import pyspark.sql.functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

@udf(returnType=DoubleType())
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))

df = spark.createDataFrame([('2000000', '90125900')], ['sale_amt', 'total_value'])
df2 = df.withColumn("check", val_estimate(F.col("sale_amt"), F.col("total_value")))
df2.show()
```
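The root cause can be reproduced without Spark. Here is a minimal sketch, using only the Python standard library, that round-trips the value through an IEEE 754 32-bit float with `struct` — the same narrowing that storing the UDF's return value in a `FloatType` column performs:

```python
import struct

def to_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through IEEE 754 single
    precision, mimicking what a Spark FloatType column does."""
    return struct.unpack('f', struct.pack('f', x))[0]

# 32-bit floats between 2**26 and 2**27 are spaced 8 apart, so
# 90125900.0 is not representable and rounds to 90125904.0.
print(to_float32(90125900.0))                 # 90125904.0
print(to_float32(90125900.0) == 90125900.0)   # False
```

This matches the `check` column in the question; a 64-bit double (`DoubleType`) represents the value exactly, so it survives unchanged.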

huangapple
  • Posted on 2023-02-27 19:08:16
  • Please keep the original link when reposting: https://go.coder-hub.com/75579686.html