How to print the result of current_date() in PySpark?

Question

This is very simple in Python, but I am currently learning PySpark in Databricks.

I just want to see what is returned by current_date() in PySpark.

What I have tried:

from pyspark.sql import functions as fn

print(fn.current_date())
# Result: Column<'current_date()'>

fn.current_date()
# Result: Out[35]: Column<'current_date()'>

fn.first(fn.current_date())
# Result: Out[36]: Column<'first(current_date())'>

fn.current_date()[0]
# Result: Out[37]: Column<'current_date()[0]'>

display(fn.current_date())
# Result: Column<'current_date()'>

Is it just not possible?

Answer 1

Score: 1

You can use spark.sql() for this: run a one-row query, collect() the result, and take the first column of the first row.

Example:

print(spark.sql("select string(current_date())").collect()[0][0])
#2023-08-04

Answer 2

Score: 0

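In Spark, a column expression such as current_date() is only a description of a computation (a Column object), which is why printing it shows Column<'current_date()'>. It produces a value only after you put it into a DataFrame as a column and run an action on that DataFrame, such as show(), collect() or head().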

Consider the following examples:


spark.range(1) - creates a one-row DataFrame
.select(F.current_date()) - selects a column computed by the current_date function
.show() - prints the DataFrame

from pyspark.sql import functions as F

spark.range(1).select(F.current_date()).show()
# +--------------+
# |current_date()|
# +--------------+
# |    2023-08-04|
# +--------------+

spark.sql("select current_date()") - creates both the DataFrame and the column from a SQL expression
.show() - prints the DataFrame

spark.sql("select current_date()").show()
# +--------------+
# |current_date()|
# +--------------+
# |    2023-08-04|
# +--------------+

.head() - returns the DataFrame's first row (as a pyspark.sql.types.Row object)
[0] - returns the first element ("column") of that row

spark.sql("select current_date()").head()[0]
# datetime.date(2023, 8, 4)

In Databricks, display() should also work, but it has to be given a DataFrame rather than a bare Column, e.g.:

display(spark.sql("select current_date()"))

huangapple
  • Published on 4 August 2023 21:33:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76836435.html