Reading pyspark
Question
In a Databricks notebook, I am building a source folder path by concatenating the current year and month.
from datetime import datetime
now = datetime.now() # current date and time
year = now.strftime("%Y")
month = now.strftime("%m")
df = '"' + 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + year + '/' + month + '/"'
print(df)
The print output is what I am looking for:
"abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"
However, when I try to read a DataFrame from that location through PySpark, I get an error message and I am not sure what is causing it. I would appreciate your help with this. Thanks.
DF = (
    spark
    .read
    .option("header", "true")
    .parquet(df)
)
Error Message
IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/%22
Answer 1
Score: 1
Flock,
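You don't need to add quotation marks when concatenating strings. The '"' characters at the beginning and end of df become part of the path itself, so Spark tries to parse a URI that starts with a quote (the trailing one shows up as %22 in the error message). If you remove them, your code will work.
I suggest using f-strings to build the path. They are more readable and easier to use.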
from datetime import datetime
now = datetime.now() # current date and time
year = now.strftime("%Y")
month = now.strftime("%m")
basepath = 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2'
df = f'{basepath}/{year}/{month}/'
print(df)
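As a variation on the snippet above, here is a minimal sketch that wraps the path building in a small function so it can be checked with a fixed date. The names `BASE_PATH`, `build_monthly_path` and `read_monthly_parquet` are made up for illustration, and `spark` is assumed to be the SparkSession that the Databricks notebook already provides. The `header` option is left out because, as far as I know, it applies to CSV rather than Parquet.

```python
from datetime import datetime

# Base folder from the question, with no surrounding quote characters.
BASE_PATH = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2"


def build_monthly_path(base_path: str, when: datetime) -> str:
    """Return '<base_path>/<YYYY>/<MM>/' for the given date."""
    return f"{base_path.rstrip('/')}/{when:%Y}/{when:%m}/"


def read_monthly_parquet(spark, base_path: str, when: datetime):
    """Read the Parquet files in the monthly folder (hypothetical helper).

    `spark` is assumed to be the notebook's existing SparkSession.
    """
    return spark.read.parquet(build_monthly_path(base_path, when))


# A fixed date keeps the example reproducible; in the notebook use datetime.now().
source_path = build_monthly_path(BASE_PATH, datetime(2023, 7, 15))
print(source_path)
```

In the notebook, the read would then be something like `DF = read_monthly_parquet(spark, BASE_PATH, datetime.now())`.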
Answer 2
Score: 0
The path you're creating is invalid. The error says that the quotation mark at the beginning (index 0) is not a legal character in a URI scheme name. Remove the quotation marks at the beginning and end of the string to fix it.
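To see the effect of the stray quotes without a cluster, here is a rough sketch that uses Python's `urllib.parse.urlsplit` as a stand-in for the Java URI parser named in the error. The two parsers are not identical, so treat this only as an illustration. `clean_storage_path` is a hypothetical helper, not part of PySpark.

```python
from urllib.parse import urlsplit


def clean_storage_path(raw_path: str) -> str:
    """Strip surrounding whitespace and quote characters, then check for a URI scheme."""
    cleaned = raw_path.strip().strip("\"'")
    if not urlsplit(cleaned).scheme:
        raise ValueError(f"path has no URI scheme: {cleaned!r}")
    return cleaned


# The string from the question, including the literal double quotes.
quoted = '"abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"'

# With the leading quote, no valid scheme is recognised.
print(repr(urlsplit(quoted).scheme))

# After cleaning, the scheme is found and the path can be handed to spark.read.
fixed = clean_storage_path(quoted)
print(urlsplit(fixed).scheme, fixed)
```

The simpler fix is still not to add the quotes in the first place. A helper like this only makes sense if the path arrives from somewhere you do not control, such as a widget or a config file.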