Reading PySpark

Question

In a Databricks notebook, I am building a source folder path by concatenating the current year and month.

from datetime import datetime
now = datetime.now() # current date and time

year = now.strftime("%Y")
month = now.strftime("%m")

df = '"' + 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + year + '/' + month + '/"'
print(df)

The print output is exactly what I am looking for:
"abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"

However, when I try to read a DataFrame from that location with PySpark, I get an error and I am not sure what is causing it. I'd appreciate your help with this. Thanks

DF = (
    spark
    .read
    .option("header", "true")
    .parquet(df)
    )

Error Message

IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/%22
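The problem can be reproduced without Spark. In the sketch below, Python's standard urllib.parse is used only as a rough stand-in for Java's java.net.URI, which is what raises the error inside Spark. The two parsers differ in details (Python does not raise, it simply finds no scheme), but neither accepts '"' in a scheme name. The storage path is the question's placeholder.

```python
from urllib.parse import quote, urlparse

# Placeholder values mirroring the question.
year, month = "2023", "07"
base = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/"

quoted = '"' + base + year + '/' + month + '/"'  # what the question builds
clean = base + year + '/' + month + '/'          # what Spark expects

# print() makes both look like a path; repr() shows the '"' characters
# are actually part of the quoted value.
print(repr(quoted))
print(repr(clean))

# '"' is not a valid scheme character, so no scheme is found in the quoted path.
print(urlparse(quoted).scheme)  # prints an empty line
print(urlparse(clean).scheme)   # abfss

# The trailing '"' percent-encodes to %22, matching the end of the Spark error.
print(quote('"'))  # %22
```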

Answer 1

Score: 1

Flock,

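It's unnecessary to add quotation marks when concatenating strings: the '"' pieces become literal characters inside the path, which is why the error complains about an illegal character at index 0. If you remove the '"' at the beginning and at the end of df, your code will work.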
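A minimal sketch of that fix, keeping the question's + concatenation and only dropping the '"' pieces. The container and storage account names are the question's placeholders, and a fixed datetime replaces datetime.now() so the result is reproducible.

```python
from datetime import datetime

# Fixed date for a reproducible example; the question uses datetime.now().
now = datetime(2023, 7, 14)

year = now.strftime("%Y")   # "2023"
month = now.strftime("%m")  # "07" (zero-padded)

# Same concatenation as the question, minus the leading and trailing '"'.
df = 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + year + '/' + month + '/'
print(df)
# In the notebook, this string can then be passed directly to spark.read.parquet(df).
```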

I suggest using f-strings for the concatenation instead; they are more readable and easier to use.

from datetime import datetime
now = datetime.now() # current date and time

year = now.strftime("%Y")
month = now.strftime("%m")
basepath = 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2'

df = f'{basepath}/{year}/{month}/'
print(df)
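If the path is needed in more than one place, the same f-string can be wrapped in a small function. The name month_path is hypothetical. The sketch relies on datetime supporting strftime codes directly inside f-string format specs ({when:%Y}), which is standard Python behavior. Passing the date as a parameter makes the function easy to test or to point at a previous month.

```python
from datetime import datetime
from typing import Optional

BASEPATH = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2"


def month_path(basepath: str, when: Optional[datetime] = None) -> str:
    """Return '<basepath>/<YYYY>/<MM>/' for the given date (default: now).

    Hypothetical helper: no quotation marks are added, so the result is a
    plain URI string that Spark can parse.
    """
    if when is None:
        when = datetime.now()
    # {when:%Y} and {when:%m} go through datetime.__format__, i.e. strftime codes.
    return f"{basepath.rstrip('/')}/{when:%Y}/{when:%m}/"


print(month_path(BASEPATH, datetime(2023, 7, 14)))
# In a Databricks notebook (not run here): DF = spark.read.parquet(month_path(BASEPATH))
```

The rstrip('/') means the function gives the same result whether or not basepath ends with a slash.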

Answer 2

Score: 0

The path you're creating is invalid. The error says the character at index 0, the opening quotation mark, is not allowed in the URI scheme name. Remove the quotation marks at the beginning and end to fix it.
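To catch this kind of mistake earlier, the path can be checked before it reaches Spark. The sketch below is a hypothetical guard (check_abfss_path is not part of PySpark). It rejects paths that contain quotation marks or whose scheme is not abfss, and raises a clearer Python error than the URISyntaxException.

```python
from urllib.parse import urlparse


def check_abfss_path(path: str) -> str:
    """Hypothetical guard: return path unchanged if it looks like an abfss URI."""
    if '"' in path or "'" in path:
        raise ValueError(f"path contains quotation marks: {path!r}")
    scheme = urlparse(path).scheme
    if scheme != "abfss":
        raise ValueError(f"expected scheme 'abfss', got {scheme!r} in {path!r}")
    return path


good = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"
print(check_abfss_path(good))

try:
    check_abfss_path('"' + good + '"')
except ValueError as err:
    print(err)  # path contains quotation marks: ...
```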

huangapple
  • This post was published on July 14, 2023 at 02:07:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76682153.html