PySpark using OR operator in filter
Question
I have an array that I am indexing to filter for data from cities of California.
This filter works:
raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego'))
However, when I expand to include other cities:
raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego') || array_contains(col("country.state.city"), 'Sacramento') || array_contains(col("country.state.city"), 'Los Angeles'))
I get SyntaxError: invalid syntax.
I have also tried:
raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego' || 'Sacramento' || 'Los Angeles'))
but this also returns SyntaxError: invalid syntax.
What is the correct usage of the OR operator in Spark to filter data from Californian cities?
Answer 1
Score: 0
Logical OR uses a single vertical bar (|). Python has no || operator, which is why the original attempt raised SyntaxError: invalid syntax.
raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego') | array_contains(col("country.state.city"), 'Sacramento') | array_contains(col("country.state.city"), 'Los Angeles'))
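As a side note beyond the original answer, the chained | works because PySpark's Column class overloads Python's bitwise-OR operator, so a list of conditions can also be folded together with functools.reduce instead of writing each | by hand. A minimal sketch of that pattern, illustrated with plain booleans so it runs without a Spark session (the PySpark form is shown in comments and assumes the same column name as the question):

```python
from functools import reduce
import operator

# PySpark's Column overloads `|` (bitwise OR); Python itself has no `||`.
# The same fold applies to a list of Column conditions, e.g.:
#   cities = ['San Diego', 'Sacramento', 'Los Angeles']
#   cond = reduce(operator.or_,
#                 [array_contains(col("country.state.city"), c) for c in cities])
#   raw_df_2 = raw_df_1.filter(cond)

# Illustrated here with plain booleans, which operator.or_ also accepts:
conditions = [False, True, False]
combined = reduce(operator.or_, conditions)
print(combined)  # True
```

This scales better than repeating array_contains three times: adding a fourth city means appending to the list rather than editing the filter expression.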
Comments