How to print the result of current_date() in PySpark?
Question
This is very simple in plain Python, but I am currently learning PySpark in Databricks.
I just want to see what current_date() returns in PySpark.
What I have tried:
from pyspark.sql import functions as fn
print(fn.current_date())
# Result: Column<'current_date()'>
fn.current_date()
# Result: Out[35]: Column<'current_date()'>
fn.first(fn.current_date())
# Result: Out[36]: Column<'first(current_date())'>
fn.current_date()[0]
# Result: Out[37]: Column<'current_date()[0]'>
display(fn.current_date())
# Result: Column<'current_date()'>
Is it just not possible?
Answer 1
Score: 1
You can use spark.sql() for this.
Example:
print(spark.sql("select string(current_date())").collect()[0][0])
# 2023-08-04
Answer 2
Score: 0
In Spark, column expressions (e.g. current_date()) do not produce a value until they are placed in a DataFrame as columns and the DataFrame is then asked to show or return its data.
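As a rough analogy only, the behaviour can be imitated in plain Python. The sketch below is not how PySpark is implemented; the class name LazyExpr and its evaluate() method are hypothetical and exist purely to show why printing an expression gives a description instead of a date.

```python
import datetime


class LazyExpr:
    """Toy stand-in for a Spark Column (hypothetical, not a PySpark class).

    It stores a description of the work and the function that would do it,
    but computes nothing until evaluate() is called.
    """

    def __init__(self, name, compute):
        self.name = name
        self._compute = compute  # called only when evaluation is requested

    def __repr__(self):
        # Mirrors the kind of output seen in the question
        return f"Column<'{self.name}'>"

    def evaluate(self):
        return self._compute()


current_date = LazyExpr("current_date()", datetime.date.today)

print(current_date)             # prints the description, not a date
print(current_date.evaluate())  # prints today's date, computed on request
```

In real PySpark the role of evaluate() is played by putting the column into a DataFrame and calling an action such as show(), head() or collect().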
Consider the following examples:
spark.range(1) - creates a DataFrame
.select(F.current_date()) - selects a column created with the function current_date
.show() - prints the DataFrame
from pyspark.sql import functions as F
spark.range(1).select(F.current_date()).show()
# +--------------+
# |current_date()|
# +--------------+
# | 2023-08-04|
# +--------------+
spark.sql("select current_date()") - creates both the DataFrame and the column from a SQL expression
.show() - prints the DataFrame
spark.sql("select current_date()").show()
# +--------------+
# |current_date()|
# +--------------+
# | 2023-08-04|
# +--------------+
.head() - accesses the DataFrame's first row (as a pyspark.sql.types.Row object)
[0] - accesses the first element ("column") of the row
spark.sql("select current_date()").head()[0]
# datetime.date(2023, 8, 4)
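Once the value has been pulled out of the row it is an ordinary Python datetime.date, so the usual standard-library methods apply. A minimal sketch, using a hard-coded date as a stand-in for the value returned by head()[0] so that it runs without a Spark session:

```python
import datetime

# Stand-in (assumption) for: spark.sql("select current_date()").head()[0]
value = datetime.date(2023, 8, 4)

as_text = value.isoformat()          # ISO 8601 string: '2023-08-04'
custom = value.strftime("%d/%m/%Y")  # custom layout: '04/08/2023'

print(as_text)
print(custom)
print(value.year, value.month, value.day)
```

This makes the string() cast inside the SQL query optional: the conversion to text can also be done on the Python side after the value is fetched.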
In Databricks, display(df) should also work, but you must first create the df, e.g.:
display(spark.sql("select current_date()"))