Error: ValueError when using PySpark's month and dayofmonth functions in DataFrame boolean expression

Question

I want to write a function that implements the following logic:

  1. Check if the month of the sddgl column is equal to 12 and the day of the month is equal to 26.
  2. If the condition is true, add one day to the sddgl column using the date_add function and update the sddgl column with the new value.
  3. If the condition is false, leave the sddgl column unchanged.

I tried the following:

def sales_logic(sales_data):

    if (f.month(sales_data["sddgl"]==12)) & (f.dayofmonth(sales_data["sddgl"]==26)):
        sales_data["sddgl"]=date_add(sales_data["sddgl"],1)
    else:
        sales_data["sddgl"]
    # group_by=sales_data.groupBy("sddcto","sddgl","aiac23","aiac23_udc","schnl","schnl_ecom","bu_schnl","bu_schnl_name",sdco)
    return sales_data

But I got this error:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

How should I solve it?

Answer 1

Score: 0

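You cannot assign to a DataFrame column like that in PySpark. A Python if statement tries to convert the Column expression into a single bool, which is what raises the ValueError. Your parentheses are also misplaced: the comparison belongs outside the function call, as in f.month(sales_data["sddgl"]) == 12, not f.month(sales_data["sddgl"] == 12). Use withColumn with the when/otherwise construction instead: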

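from pyspark.sql import functions as f

# Add one day to sddgl on rows dated December 26; leave every other row unchanged.
sales_data = sales_data.withColumn("sddgl",
    f.when(
        (f.month(sales_data["sddgl"]) == 12) &
        (f.dayofmonth(sales_data["sddgl"]) == 26), f.date_add(sales_data["sddgl"], 1)
    ).otherwise(sales_data["sddgl"])
)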

  • Posted by huangapple on June 22, 2023 at 19:56:47
  • Link to this article: https://go.coder-hub.com/76531625.html