How to use when and otherwise on a Spark DataFrame based on boolean columns?

Question


I have a dataset with three columns: country (string), threshold_1 (boolean), and threshold_2 (boolean).

I am trying to create a new column with the logic below, but I get an error.

I am using a Palantir Code Workbook for this. Can anyone tell me what I am missing here?

df = df.withColumn("Threshold_Filter", 
        when(df["country"]=="INDIA" & df["threshold_1"]==True | df["threshold_2 "]==True, "Ind_country"
     ).otherwise("Dif_country"))

Answer 1

Score: 2


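You just need to put each comparison in parentheses. In Python, & and | bind more tightly than ==, so without parentheses the condition is not grouped the way it reads.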
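To see how Python groups the condition with and without parentheses, here is a small sketch that uses only the standard `ast` module, so it runs without Spark. The names `country`, `t1`, and `t2` are placeholders standing in for the three `df[...]` columns, and `group` and `show` are helper names made up for this illustration.

```python
import ast

# Symbols for the three operators that appear in the question's condition.
OPS = {ast.Eq: "==", ast.BitAnd: "&", ast.BitOr: "|"}

def group(node):
    """Render an expression with every operator application parenthesized."""
    if isinstance(node, ast.Compare):
        # A (possibly chained) comparison: left op1 c1 op2 c2 ...
        parts = [group(node.left)]
        for op, comparator in zip(node.ops, node.comparators):
            parts += [OPS[type(op)], group(comparator)]
        return "(" + " ".join(parts) + ")"
    if isinstance(node, ast.BinOp):
        return f"({group(node.left)} {OPS[type(node.op)]} {group(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    return repr(node.value)  # string and boolean constants

def show(expr):
    """Parse a Python expression and return its fully parenthesized form."""
    return group(ast.parse(expr, mode="eval").body)

# Placeholder names: country, t1, t2 stand in for the three df[...] columns.
unparenthesized = 'country == "INDIA" & t1 == True | t2 == True'
parenthesized = '(country == "INDIA") & (t1 == True) | (t2 == True)'

print(show(unparenthesized))  # (country == ('INDIA' & t1) == (True | t2) == True)
print(show(parenthesized))    # (((country == 'INDIA') & (t1 == True)) | (t2 == True))
```

The first form is a single chained comparison. Python evaluates a chained comparison with an implicit `and`, which needs a plain bool from each link, and a PySpark Column refuses that conversion. That is the likely source of the error, although the question does not quote it. The corrected condition wraps each comparison so that & and | combine three boolean columns: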

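from pyspark.sql.functions import when

df = (
    df
    .withColumn(
        "Threshold_Filter",
        when(
            # Each comparison is parenthesized because & and | bind tighter than ==.
            (df["country"] == "INDIA") &
            (df["threshold_1"] == True) |
            (df["threshold_2"] == True),
            "Ind_country")
        .otherwise("Dif_country"))
)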
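One thing worth checking: & binds more tightly than |, so the condition above is evaluated as (country AND threshold_1) OR threshold_2. The question does not say which grouping was intended, so the sketch below only compares the two readings. It uses plain Python booleans in place of Spark columns (operator precedence is the same for both), and the function names are hypothetical.

```python
from itertools import product

def as_written(is_india, t1, t2):
    # Same operators as the answer's condition: & binds tighter than |.
    return is_india & t1 | t2

def country_required(is_india, t1, t2):
    # Hypothetical alternative grouping: country AND (either threshold).
    return is_india & (t1 | t2)

# Collect every (is_india, t1, t2) combination where the two readings disagree.
differing = [
    row for row in product([False, True], repeat=3)
    if as_written(*row) != country_required(*row)
]
print(differing)  # [(False, False, True), (False, True, True)]
```

The two readings disagree only for rows that are not INDIA but have threshold_2 set: the condition as written labels them Ind_country. If the country check should always apply, the PySpark condition would need an extra pair of parentheses, for example (df["country"] == "INDIA") & ((df["threshold_1"] == True) | (df["threshold_2"] == True)).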

huangapple
  • Published on January 9, 2023 at 16:52:27
  • When reposting, please keep the link to this post: https://go.coder-hub.com/75054931.html