英文:
Spark Scala Dataframe case when like function
问题
我正在使用Spark Scala的DataFrame API,尝试转换以下SQL逻辑:
CASE
WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN
CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX'
CASE WHEN col_2 like '%YYY' THEN 'BBC' END
END as new_col
如何在Spark Scala DataFrame API中构建具有多个like
和not like
条件的CASE WHEN
语句?
英文:
I am using spark scala with DataFrame API, trying to convert the below sql logic
CASE
WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN
CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX'
CASE WHEN col_2 like '%YYY' THEN BBC' END
END as new_col
How to construct CASE WHEN with multiple like and not like conditions with spark scala dataframe api?
答案1
得分: 1
使用expr函数,并将整个CASE语句传递给它,如下所示:
import org.apache.spark.sql.functions._
val df = Seq(
("A", "01/01/2022", 1), ("AXYZ", "02/01/2022", 1), ("AZYX", "03/01/2022", 1), ("AXYZ", "04/01/2022", 0), ("AZYX", "05/01/2022", 0), ("AB", "06/01/2022", 1), ("A", "07/01/2022", 0)
).toDF("Category", "date", "Indictor")
df.select(col("*"), expr("""
CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN
CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
WHEN Category not like '%XYZ' OR Category not like '%ZYX' THEN
CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
END
""").alias("new_col")).show()
+--------+----------+--------+-------+
|Category| date|Indictor|new_col|
+--------+----------+--------+-------+
| A|01/01/2022| 1| BBC|
| AXYZ|02/01/2022| 1| ABC|
| AZYX|03/01/2022| 1| ABC|
| AXYZ|04/01/2022| 0| BBA|
| AZYX|05/01/2022| 0| BBA|
| AB|06/01/2022| 1| BBC|
| A|07/01/2022| 0| BBD|
+--------+----------+--------+-------+
英文:
Use the expr function and pass the whole case statement in it as below.
import org.apache.spark.sql.functions._
val df=Seq(
("A","01/01/2022",1), ("AXYZ","02/01/2022",1), ("AZYX","03/01/2022",1),("AXYZ","04/01/2022",0), ("AZYX","05/01/2022",0),("AB","06/01/2022",1), ("A","07/01/2022",0) ).toDF("Category", "date", "Indictor")
df.select(col("*"),expr("""CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN
CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
WHEN Category not like '%XYZ' OR Category not like '%ZYX' then
CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
END""").alias("new_col")).show()
+--------+----------+--------+-------+
|Category| date|Indictor|new_col|
+--------+----------+--------+-------+
| A|01/01/2022| 1| BBC|
| AXYZ|02/01/2022| 1| ABC|
| AZYX|03/01/2022| 1| ABC|
| AXYZ|04/01/2022| 0| BBA|
| AZYX|05/01/2022| 0| BBA|
| AB|06/01/2022| 1| BBC|
| A|07/01/2022| 0| BBD|
+--------+----------+--------+-------+
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论