PySpark UDF evaluation

Question


So I have a simple function that takes two strings, converts them to float (assume the conversion always succeeds) and returns the larger of the two.

```python
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))
```

When I call the function directly with the following arguments, I get the expected output:

```python
val_estimate("2000000", "90125900")
```

Output:

```
90125900.0
```

Now, when I register the function above as a UDF and apply it to a Spark DataFrame with the same arguments, I get a different result:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType

val_estimate_udf = F.udf(val_estimate, returnType=FloatType())

df = spark.createDataFrame([["2000000", "90125900"]], ["sale_amt", "total_value"])

df = df.withColumn("check", val_estimate_udf(F.col("sale_amt"), F.col("total_value")))
display(df)  # Databricks notebook helper; use df.show() in plain PySpark
```

Output:

| sale_amt | total_value | check    |
|----------|-------------|----------|
| 2000000  | 90125900    | 90125904 |

Why am I getting this result? Please ignore the lack of error handling and the fact that I could use native Spark functions to do the same thing; I just can't understand this result.

Answer 1

Score: 1
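A worked check of where 90125904 comes from, assuming `FloatType` is IEEE 754 single precision (24 significant bits, 23 of them stored) and that the conversion rounds to nearest, ties to even. First locate the value between two powers of two:

$$
2^{26} = 67\,108\,864 \;\le\; 90\,125\,900 \;<\; 2^{27} = 134\,217\,728
$$

In this range adjacent single-precision values are $2^{26-23} = 8$ apart, and

$$
90\,125\,900 = 8 \times 11\,265\,737 + 4,
$$

so the value sits exactly halfway between $8 \times 11\,265\,737 = 90\,125\,896$ and $8 \times 11\,265\,738 = 90\,125\,904$. Round-half-to-even picks the even multiplier, which gives $90\,125\,904$, the number shown in the `check` column.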

```python
import pyspark.sql.functions as F
from pyspark.sql.types import DoubleType

# DoubleType (8-byte) matches Python's float; FloatType (4-byte) cannot hold 90125900 exactly
@F.udf(returnType=DoubleType())
def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))

# assumes an existing SparkSession named `spark`
df = spark.createDataFrame([("2000000", "90125900")], ["sale_amt", "total_value"])
df2 = df.withColumn("check", val_estimate(F.col("sale_amt"), F.col("total_value")))
df2.show()
```
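The rounding can be reproduced without Spark. This sketch uses Python's standard `struct` module to pack the value into a 4-byte single-precision float and unpack it again; that round trip is assumed here to mimic what happens when the UDF's double result is stored in a `FloatType` column (a double-to-float narrowing with round-to-nearest, which is consistent with the observed 90125904). `to_float32` is a hypothetical helper name, not a PySpark API.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to IEEE 754 single precision and return it as a Python float.

    The "f" struct format is a 4-byte float, so pack + unpack simulates
    storing the value in a 32-bit column (assumed equivalent to FloatType).
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def val_estimate(amount_1: str, amount_2: str) -> float:
    return max(float(amount_1), float(amount_2))


result = val_estimate("2000000", "90125900")

print(result)              # 90125900.0 (64-bit double, exact)
print(to_float32(result))  # 90125904.0 (nearest 32-bit float)
```

Python's `float` is a 64-bit double with a 53-bit significand, so every integer up to 2^53 is represented exactly. That is why both the direct call and the `DoubleType` version keep 90125900.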
