How to include multiple expressions in a case when statement with Databricks PySpark
Question
The error you're encountering occurs because the withColumn function in PySpark only accepts two arguments: the column name and the expression. To apply multiple case conditions, use the expr function to build a single expression that combines both cases. Here's the corrected code:
%python
from pyspark.sql.functions import expr
df = sql("select * from retailrpt.vw_fund_managers")
transformWithCol = df.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 when last_name = 'Jones' then 5 else 0 end"))
This code combines both case statements into a single expression inside the expr function, allowing you to achieve your desired result.
The following PySpark case when code works fine when adding a single case when expr:
%python
from pyspark.sql.functions import expr
df = sql("select * from xxxxxxx.xxxxxxx")
transformWithCol = (df.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end")))
However, I would like to add another case when statement to the same withColumn as follows:
%python
from pyspark.sql.functions import expr
df = sql("""select * from retailrpt.vw_fund_managers""")
transformWithCol = (df
.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end"), expr("case when last_name = 'Jones' then 5 else 4 end")))
I get the error:
TypeError: withColumn() takes 3 positional arguments but 4 were given
Is it not possible to add multiple case statements to a withColumn?
Answer 1
Score: 1
- To give multiple conditions, you can use expr in the following way. The following is my dataframe:
data = [[1,'Ana','Anathan'],[2,'Topson','Topias'],[3,'Ceb','Seb']]
df = spark.createDataFrame(data=data,schema=['id','gname','aname'])
df.show()
- I got the same error when I tried to use similar code on my dataframe:
from pyspark.sql.functions import expr
df1 = (df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 else 0 end"), expr("case when aname = 'Seb' then 5 else 4 end")))
df1.show()
- Now, use the following code to combine multiple conditions. You can write multiple when conditions, but only a single else branch:
from pyspark.sql.functions import expr
df1 = df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 when aname='Seb' then 2 else 0 end"))
df1.show()
Comments