Error: ValueError when using PySpark's month and dayofmonth functions in DataFrame boolean expression
Question
I want to write a function that satisfies the following conditions.
The logic I am trying to implement is:
- Check if the month of the sddgl column is equal to 12 and the day of the month is equal to 26.
- If the condition is true, add one day to the sddgl column using the date_add function and update the sddgl column with the new value.
- If the condition is false, leave the sddgl column unchanged.
I tried the following:
    def sales_logic(sales_data):
        if (f.month(sales_data["sddgl"]==12)) & (f.dayofmonth(sales_data["sddgl"]==26)):
            sales_data["sddgl"]=date_add(sales_data["sddgl"],1)
        else:
            sales_data["sddgl"]
        # group_by=sales_data.groupBy("sddcto","sddgl","aiac23","aiac23_udc","schnl","schnl_ecom","bu_schnl","bu_schnl_name",sdco)
        return sales_data
But I got this error:
    ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
How should I solve it?
Answer 1
Score: 0
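You cannot assign to columns like that in PySpark, and a Python if statement cannot evaluate a Column expression, which is what raises the ValueError. Your parentheses are also misplaced: the comparison has to be applied to the result of month() and dayofmonth(), not to the column inside the call. Try using the when/otherwise construction:
    from pyspark.sql import functions as f  # the question's code uses the alias f
    sales_data = sales_data.withColumn("sddgl",
        f.when(
            (f.month(sales_data["sddgl"]) == 12) &
            (f.dayofmonth(sales_data["sddgl"]) == 26),
            f.date_add(sales_data["sddgl"], 1)  # move 26 December to the 27th
        ).otherwise(sales_data["sddgl"]))  # leave every other date unchanged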
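To sanity-check what the when/otherwise expression should produce, the same rule can be written for a single value in plain Python. This is only a rough sketch: shift_dec_26 is a hypothetical helper name, not part of PySpark, and it assumes the column holds dates. Here `if` and `and` work because the operands are ordinary Python booleans, which is the difference from Column expressions.

    from datetime import date, timedelta
    from typing import Optional


    def shift_dec_26(d: Optional[date]) -> Optional[date]:
        """Hypothetical per-value mirror of the when/otherwise rule."""
        if d is None:
            # A null date does not satisfy the condition, so it stays null.
            return None
        if d.month == 12 and d.day == 26:
            # Corresponds to the date_add(..., 1) branch.
            return d + timedelta(days=1)
        # Corresponds to the otherwise(...) branch: value unchanged.
        return d


    samples = [date(2022, 12, 26), date(2022, 12, 25), date(2023, 1, 26), None]
    shifted = [shift_dec_26(d) for d in samples]
    print(shifted)

Comparing a few rows of the Spark result against this helper is one way to confirm that only 26 December is moved.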