Spark Scala DataFrame `case when`-like function


Question

I am using the Spark Scala DataFrame API and am trying to convert the following SQL logic:

   CASE 
     WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN 
       CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
     WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX' THEN
       CASE WHEN col_2 like '%YYY' THEN 'BBC' END
   END as new_col 

How can I construct a CASE WHEN with multiple `like` and `not like` conditions using the Spark Scala DataFrame API?
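
For reference, the individual pieces map onto column functions in the DataFrame API (`when`/`otherwise` in `org.apache.spark.sql.functions`, `like` and negation on `Column`); below is a minimal sketch of one branch only, assuming the column names `col_1`/`col_2` from the SQL above:

import org.apache.spark.sql.functions.{col, when}

// One like condition and its negation as Column expressions
// (col_1 / col_2 are the column names used in the SQL above).
val likeXyz    = col("col_1").like("%XYZ")
val notLikeXyz = !col("col_1").like("%XYZ")

// Conditions combine with || and nest inside when(...).otherwise(...).
val firstBranch = when(likeXyz || col("col_1").like("%ZYX"),
  when(col("col_2").like("%TTT"), "ABC").otherwise("BBA"))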


Answer 1

Score: 1

Use the `expr` function and pass the whole CASE statement to it, as shown below:

import org.apache.spark.sql.functions._
import spark.implicits._  // needed for .toDF on a local Seq (pre-imported in spark-shell)

// Sample data with columns Category, date, Indictor
val df = Seq(
  ("A", "01/01/2022", 1), ("AXYZ", "02/01/2022", 1), ("AZYX", "03/01/2022", 1), ("AXYZ", "04/01/2022", 0), ("AZYX", "05/01/2022", 0), ("AB", "06/01/2022", 1), ("A", "07/01/2022", 0)
).toDF("Category", "date", "Indictor")

df.select(col("*"), expr("""
CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN 
   CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
WHEN Category not like '%XYZ' OR Category not like '%ZYX' THEN
   CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
END
""").alias("new_col")).show()
+--------+----------+--------+-------+
|Category|      date|Indictor|new_col|
+--------+----------+--------+-------+
|       A|01/01/2022|       1|    BBC|
|    AXYZ|02/01/2022|       1|    ABC|
|    AZYX|03/01/2022|       1|    ABC|
|    AXYZ|04/01/2022|       0|    BBA|
|    AZYX|05/01/2022|       0|    BBA|
|      AB|06/01/2022|       1|    BBC|
|       A|07/01/2022|       0|    BBD|
+--------+----------+--------+-------+
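
The same logic can also be written without a SQL string, using the `when`/`otherwise` and `like` column functions; below is a minimal sketch against the same example DataFrame (as with a SQL CASE without ELSE, a `when` chain without a final `otherwise` returns null for rows that match no branch):

import org.apache.spark.sql.functions.{col, when, not}

// Equivalent of the expr-based CASE above, built from column functions.
val newCol =
  when(col("Category").like("%XYZ") || col("Category").like("%ZYX"),
       when(col("Indictor") === 1, "ABC").otherwise("BBA"))
    .when(not(col("Category").like("%XYZ")) || not(col("Category").like("%ZYX")),
          when(col("Indictor") === 1, "BBC").otherwise("BBD"))

df.select(col("*"), newCol.alias("new_col")).show()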
