How to select a date range in a PySpark DataFrame

Question

I want to select the portion of my DataFrame containing dates from 2022 up to the latest date, which may include today, tomorrow, and later. How can I achieve that?

df = df.filter(col("sales_date").contains("2022"))

Answer 1

Score: 1

You can use the between function, or even the '>' operator:

df = df.filter(col("date").between("2022-01-01", "2022-12-31"))

or

df = df.filter(col("date") > "2022-01-01")

Answer 2

Score: 0

As mentioned above, the 'between' syntax will do the trick; just make sure your column is first converted to a proper date/timestamp type: https://sparkbyexamples.com/spark/spark-convert-string-to-timestamp-format/
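
A minimal sketch of that conversion, assuming the column arrives as a string in yyyy-MM-dd format (the sample rows are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.getOrCreate()

# Hypothetical string-typed sales_date column.
df = spark.createDataFrame([("2022-03-05",), ("2021-11-20",)], ["sales_date"])

# Cast the string to a real date before applying the range filter.
df = df.withColumn("sales_date", to_date(col("sales_date"), "yyyy-MM-dd"))
df.filter(col("sales_date").between("2022-01-01", "2022-12-31")).show()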

Answer 3

Score: 0

You can use like in a filter, where % works as a wildcard character.

scala> var df = Seq(("2022-01-01"),("2021-02-01")).toDF
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df = df.withColumn("date",col("value").cast("date"))
df: org.apache.spark.sql.DataFrame = [value: string, date: date]

scala> df.printSchema
root
 |-- value: string (nullable = true)
 |-- date: date (nullable = true)

scala> df.show()
+----------+----------+
|     value|      date|
+----------+----------+
|2022-01-01|2022-01-01|
|2021-02-01|2021-02-01|
+----------+----------+

scala> df.filter(col("date").like("2022%")).show()
+----------+----------+
|     value|      date|
+----------+----------+
|2022-01-01|2022-01-01|
+----------+----------+
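
The same idea written in PySpark, since the question is about the Python API. This is a small sketch assuming the same two-row sample data as the Scala session above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("2022-01-01",), ("2021-02-01",)], ["value"])
df = df.withColumn("date", col("value").cast("date"))

# % is the SQL wildcard, so this keeps only the 2022 dates.
df.filter(col("date").like("2022%")).show()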

Posted by huangapple on 2023-01-09 16:55:42. Original link: https://go.coder-hub.com/75054963.html