How to select a date range in a PySpark DataFrame
Question
I want to select the portion of my DataFrame whose dates run from 2022 up to the latest date, which may include today, tomorrow, and the day after. How can I achieve that?
df = df.filter(col("sales_date").contains("2022"))
Answer 1
Score: 1
You can use the between function (inclusive on both ends), or the >= operator if there is no upper bound:
df = df.filter(col("date").between("2022-01-01", "2022-12-31"))
or
df = df.filter(col("date") >= "2022-01-01")
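To cover "2022 through the day after tomorrow" as the question asks, the upper bound can be computed instead of hard-coded. The following is a minimal sketch, not a definitive implementation: the helper names date_window and filter_date_window are hypothetical, and it assumes the column can be cast to a date (for example, ISO yyyy-MM-dd strings).

```python
from datetime import date, timedelta


def date_window(today, days_ahead=2, start=date(2022, 1, 1)):
    """Return (start, end), where end lies `days_ahead` days after `today`."""
    return start, today + timedelta(days=days_ahead)


def filter_date_window(df, column, start, end):
    """Keep rows of a PySpark DataFrame whose `column` is in [start, end].

    Assumes `column` is a date or an ISO-formatted string that casts to date.
    """
    as_date = df[column].cast("date")
    # Column.between is inclusive on both bounds.
    return df.filter(as_date.between(start, end))


# Usage sketch (needs an existing PySpark DataFrame `df`):
# start, end = date_window(date.today())
# df = filter_date_window(df, "sales_date", start, end)
```

Computing the bounds in plain Python keeps the Spark expression a simple inclusive range, which is easy to check on its own.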
Answer 2
Score: 0
As mentioned above, the between syntax will do the trick; just make sure your column is first converted to a proper date or timestamp type: https://sparkbyexamples.com/spark/spark-convert-string-to-timestamp-format/
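The conversion matters because comparing raw strings only orders dates correctly when they are stored as yyyy-MM-dd. As a rough sketch, assuming a hypothetical sales_date column stored as dd/MM/yyyy strings, a Spark SQL predicate can parse the column with to_date before applying the range. The helper name date_range_predicate and the sample dates are illustrative only.

```python
from datetime import date


def date_range_predicate(column, start, end, fmt="yyyy-MM-dd"):
    """Build a Spark SQL predicate that parses `column` and bounds it by [start, end]."""
    parsed = f"to_date(`{column}`, '{fmt}')"
    return (
        f"{parsed} BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'"
    )


predicate = date_range_predicate(
    "sales_date", date(2022, 1, 1), date(2023, 5, 3), fmt="dd/MM/yyyy"
)
print(predicate)

# Usage sketch (needs an existing PySpark DataFrame `df`):
# df = df.filter(predicate)
```

DataFrame.filter accepts a SQL expression string, so the predicate can be passed to it directly.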
Answer 3
Score: 0
You can use like in the filter, where % works as a wildcard character. Note that this pattern matches only dates in 2022, not later years.
scala> var df = Seq(("2022-01-01"),("2021-02-01")).toDF
df: org.apache.spark.sql.DataFrame = [value: string]
scala> df = df.withColumn("date",col("value").cast("date"))
df: org.apache.spark.sql.DataFrame = [value: string, date: date]
scala> df.printSchema
root
 |-- value: string (nullable = true)
 |-- date: date (nullable = true)
scala> df.show()
+----------+----------+
|     value|      date|
+----------+----------+
|2022-01-01|2022-01-01|
|2021-02-01|2021-02-01|
+----------+----------+
scala> df.filter(col("date").like("2022%")).show()
+----------+----------+
|     value|      date|
+----------+----------+
|2022-01-01|2022-01-01|
+----------+----------+