PySpark condition on date column

Question
I have a PySpark dataframe with two date columns, start_date and end_date.
Now I want to get the rows from df where start_date < today < end_date.
I am doing it as below:

```python
today = datetime.date.today()
df=df.filter(C("start_date")<today & C("end_date")>today))
```

However, it throws an error; the "and" condition is not working on the date type.
Thanks in advance.
Answer 1

Score: 1
It seems your closing brackets are not in the right place. Try the syntax below.
```python
from pyspark.sql.functions import col
import datetime

df = spark.createDataFrame([('2023-07-23', '2023-07-25'), ('2023-07-24', '2023-07-24')], ['start_date', 'end_date'])

today = datetime.date.today()
# Wrap each comparison in its own parentheses before combining with &
df = df.filter((col('start_date') < today) & (col('end_date') > today))
df.show(10, False)

# Output as of 2023-07-24 (the filter result depends on the current date):
# +----------+----------+
# |start_date|end_date  |
# +----------+----------+
# |2023-07-23|2023-07-25|
# +----------+----------+
```
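For context on why the original line fails: in Python, `&` binds more tightly than `<` and `>`, so without parentheses the expression evaluates `today & C("end_date")` first and then applies a chained comparison, which implicitly calls `and` on Column objects and raises the error the question describes. As a minimal alternative sketch that sidesteps operator precedence entirely, the same filter can be written as a SQL expression string (assuming, as in the example above, the columns hold dates or ISO-formatted date strings):

```python
# Minimal sketch: passing the condition as a SQL string avoids Python
# operator precedence entirely. current_date() is Spark SQL's built-in
# for the current date, evaluated at query time.
df = df.filter("start_date < current_date() AND end_date > current_date()")
df.show(10, False)
```

The SQL-string form can also be convenient when the condition is assembled dynamically, since the whole predicate stays on the Spark side.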
Comments