How to include multiple expressions in a CASE WHEN statement with Databricks PySpark?

Question

The error occurs because withColumn accepts exactly two arguments: the column name and a single column expression. Passing a second expr call adds a third argument, which PySpark rejects. To test several conditions, write them as multiple WHEN branches inside one expr call. Here's the corrected code:

%python
from pyspark.sql.functions import expr
df = sql("select * from retailrpt.vw_fund_managers")
transformWithCol = df.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 when last_name = 'Jones' then 5 else 0 end"))

This code combines both case statements into a single expression within the expr function, allowing you to achieve your desired result.
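The same branching can also be written with the Column API (when / otherwise) instead of a SQL string. The following is only a sketch, not part of the original post: case_sql and add_test_column are hypothetical helper names, and the column names are the ones from the question. The pyspark import sits inside the function so the snippet still loads where Spark is not installed.

```python
def case_sql(branches, default):
    """Build one SQL CASE expression from ordered (condition, value) pairs."""
    whens = " ".join(f"when {cond} then {val}" for cond, val in branches)
    return f"case {whens} else {default} end"


def add_test_column(df):
    """Add MyTestName with chained when() calls (same logic as the CASE string)."""
    # Imported lazily; calling this function requires pyspark.
    from pyspark.sql import functions as F

    return df.withColumn(
        "MyTestName",
        F.when(F.col("first_name") == "Peter", 1)
        .when(F.col("last_name") == "Jones", 5)
        .otherwise(0),
    )


# The string that expr() would receive for the same two branches.
combined = case_sql([("first_name = 'Peter'", 1), ("last_name = 'Jones'", 5)], 0)
print(combined)
```

Either form produces one column expression, which is what withColumn expects as its second argument.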

The following PySpark code works fine with a single CASE WHEN expression:

%python
from pyspark.sql.functions import expr
df = sql("select * from xxxxxxx.xxxxxxx")
transfromWithCol = (df.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end")))

However, I would like to add a second CASE WHEN expression to the same withColumn call, as follows:

%python
from pyspark.sql.functions import expr
df = sql("""select * from retailrpt.vw_fund_managers""")
transfromWithCol = (df
                    .withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end"), expr("case when last_name = 'Jones' then 5 else 4 end")))

I get the error:

TypeError: withColumn() takes 3 positional arguments but 4 were given

Is it not possible to add multiple case statements to a withColumn?

Answer 1

Score: 1

  • To supply multiple conditions, you can use expr in the following way. This is my DataFrame:
data = [[1,'Ana','Anathan'],[2,'Topson','Topias'],[3,'Ceb','Seb']]
df = spark.createDataFrame(data=data,schema=['id','gname','aname'])
df.show()

  • I got the same error when I tried to use similar code on my dataframe:
from pyspark.sql.functions import expr

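# Passing two expr() calls raises:
# TypeError: withColumn() takes 3 positional arguments but 4 were given
df1 = (df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 else 0 end"), expr("case when aname = 'Seb' then 5 else 4 end")))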
df1.show()

  • Now use the following code to test multiple conditions. A single CASE expression can contain several WHEN branches, but only one ELSE:
from pyspark.sql.functions import expr

df1 = df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 when aname = 'Seb' then 2 else 0 end"))
df1.show()
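A CASE expression returns the value of the first WHEN branch whose condition is true and falls back to the single ELSE only when none match. The sketch below mirrors that rule in plain Python on the sample rows from this answer, so it runs without Spark. It also shows one possible way to keep two independent CASE expressions, each with its own ELSE, by writing them to two columns. The names case_when, add_two_columns, GnameFlag and AnameFlag are hypothetical and chosen only for illustration.

```python
# Sample rows from the answer: (id, gname, aname)
rows = [(1, "Ana", "Anathan"), (2, "Topson", "Topias"), (3, "Ceb", "Seb")]


def case_when(gname, aname):
    """Plain-Python mirror of:
    case when gname = 'Ana' then 1 when aname = 'Seb' then 2 else 0 end
    """
    if gname == "Ana":   # first matching branch wins
        return 1
    if aname == "Seb":   # checked only if the first branch did not match
        return 2
    return 0             # the single ELSE


results = [case_when(gname, aname) for _, gname, aname in rows]
print(results)


def add_two_columns(df):
    """Keep both original CASE expressions by giving each its own column."""
    # Imported lazily; calling this function requires pyspark.
    from pyspark.sql.functions import expr

    return (
        df.withColumn("GnameFlag", expr("case when gname = 'Ana' then 1 else 0 end"))
        .withColumn("AnameFlag", expr("case when aname = 'Seb' then 5 else 4 end"))
    )
```

Chaining withColumn works because each call still receives exactly two arguments. Whether one combined column or two separate columns is the right choice depends on what the result is meant to represent.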

huangapple
  • Posted on June 8, 2023 at 19:17:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76431287.html