Spark Scala Dataframe中的`case when`类似函数

huangapple go评论85阅读模式
英文:

Spark Scala Dataframe case when like function

问题

我正在使用Spark Scala的DataFrame API,尝试转换以下SQL逻辑:

  1. CASE
  2. WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN
  3. CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
  4. WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX'
  5. CASE WHEN col_2 like '%YYY' THEN 'BBC' END
  6. END as new_col

如何在Spark Scala DataFrame API中构建具有多个likenot like条件的CASE WHEN语句?

英文:

I am using spark scala with DataFrame API, trying to convert the below sql logic

  1. CASE
  2. WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN
  3. CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
  4. WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX'
  5. CASE WHEN col_2 like '%YYY' THEN BBC' END
  6. END as new_col

How to construct CASE WHEN with multiple like and not like conditions with spark scala dataframe api?

答案1

得分: 1

使用expr函数,并将整个CASE语句传递给它,如下所示:

  1. import org.apache.spark.sql.functions._
  2. val df = Seq(
  3. ("A", "01/01/2022", 1), ("AXYZ", "02/01/2022", 1), ("AZYX", "03/01/2022", 1), ("AXYZ", "04/01/2022", 0), ("AZYX", "05/01/2022", 0), ("AB", "06/01/2022", 1), ("A", "07/01/2022", 0)
  4. ).toDF("Category", "date", "Indictor")
  5. df.select(col("*"), expr("""
  6. CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN
  7. CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
  8. WHEN Category not like '%XYZ' OR Category not like '%ZYX' THEN
  9. CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
  10. END
  11. """).alias("new_col")).show()
  1. +--------+----------+--------+-------+
  2. |Category| date|Indictor|new_col|
  3. +--------+----------+--------+-------+
  4. | A|01/01/2022| 1| BBC|
  5. | AXYZ|02/01/2022| 1| ABC|
  6. | AZYX|03/01/2022| 1| ABC|
  7. | AXYZ|04/01/2022| 0| BBA|
  8. | AZYX|05/01/2022| 0| BBA|
  9. | AB|06/01/2022| 1| BBC|
  10. | A|07/01/2022| 0| BBD|
  11. +--------+----------+--------+-------+
英文:

Use the expr function and pass the whole case statement in it as below.

  1. import org.apache.spark.sql.functions._
  2. val df=Seq(
  3. ("A","01/01/2022",1), ("AXYZ","02/01/2022",1), ("AZYX","03/01/2022",1),("AXYZ","04/01/2022",0), ("AZYX","05/01/2022",0),("AB","06/01/2022",1), ("A","07/01/2022",0) ).toDF("Category", "date", "Indictor")
  4. df.select(col("*"),expr("""CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN
  5. CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
  6. WHEN Category not like '%XYZ' OR Category not like '%ZYX' then
  7. CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
  8. END""").alias("new_col")).show()
  1. +--------+----------+--------+-------+
  2. |Category| date|Indictor|new_col|
  3. +--------+----------+--------+-------+
  4. | A|01/01/2022| 1| BBC|
  5. | AXYZ|02/01/2022| 1| ABC|
  6. | AZYX|03/01/2022| 1| ABC|
  7. | AXYZ|04/01/2022| 0| BBA|
  8. | AZYX|05/01/2022| 0| BBA|
  9. | AB|06/01/2022| 1| BBC|
  10. | A|07/01/2022| 0| BBD|
  11. +--------+----------+--------+-------+

huangapple
  • 本文由 发表于 2023年2月16日 03:04:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/75464385.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定