July 14, 2023 02:07:01

Reading pyspark

Question

In a Databricks notebook, I am building a source folder path by concatenating the current year and month.

from datetime import datetime
now = datetime.now() # current date and time
year = now.strftime("%Y")
month = now.strftime("%m")
df = '"' + 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + year + '/' + month + '/"'
print(df)

The print output shows the result I am looking for:
"abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"

However, when I try to read a DataFrame from that path with PySpark, I get an error message and I am not sure what is causing it. I would appreciate your help with this. Thanks.

DF = (
    spark
    .read
    .option("header", "true")
    .parquet(df)
    )

Error Message

IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/%22

Answer 1

Score: 1

Flock,

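There is no need to add quotation marks when concatenating strings. The '"' pieces put literal double-quote characters into the path itself, so Spark receives a path that starts and ends with a quote. If you remove the '"' at the beginning and at the end of df, your code will work.

I suggest using an f-string to build the path. It is more readable and easier to use.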

from datetime import datetime
now = datetime.now() # current date and time
year = now.strftime("%Y")
month = now.strftime("%m")
basepath = 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2'
df = f'{basepath}/{year}/{month}/'
print(df)
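
As a possible variation on the snippet above, the path construction can be wrapped in a small function so it can be checked without a cluster. This is only a sketch: the name build_monthly_path and the fixed date in the usage lines are hypothetical and not from the original post, and the Spark read is left as a comment because it assumes the `spark` session that Databricks notebooks predefine.

```python
from datetime import datetime


def build_monthly_path(base_path: str, when: datetime) -> str:
    """Return '<base_path>/YYYY/MM/' with no surrounding quote characters."""
    # rstrip("/") avoids a double slash if base_path already ends with one.
    # {when:%Y} and {when:%m} apply strftime formatting inside the f-string.
    return f"{base_path.rstrip('/')}/{when:%Y}/{when:%m}/"


# Hypothetical usage with a fixed date so the result is reproducible.
base = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2"
path = build_monthly_path(base, datetime(2023, 7, 14))
print(path)
# prints abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/

# In a Databricks notebook the read would then look like this (not run here):
# DF = spark.read.parquet(path)
```

The "header" option from the question is left out of the commented read because it applies to CSV sources; Parquet files carry their own schema.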

Answer 2

Score: 0

The path you're creating is invalid. The error says that the character at index 0, the quotation mark at the beginning, is not allowed in a URI scheme name. Remove the quotation marks at the beginning and end of the string to fix it.
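
To see why the quotation marks matter, it may help to inspect the string itself rather than its printed form. The sketch below uses Python's standard urllib.parse as a stand-in; it is not the Java URI parser that Spark uses, so it does not raise the same exception, but it shows the same underlying problem. The names clean, quoted, describe and repaired are illustrative only.

```python
from urllib.parse import quote, urlparse

clean = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/"
# What the question's concatenation produces: quote characters inside the value.
quoted = '"' + clean + '"'


def describe(path: str) -> tuple:
    """Return (first character, scheme as parsed by urllib) for a path string."""
    return path[0], urlparse(path).scheme


print(describe(quoted))  # ('"', '')  -> no valid scheme, the string starts with a quote
print(describe(clean))   # ('a', 'abfss')

# The trailing %22 in the error message is the percent-encoded form of '"'.
print(quote('"'))        # %22

# If the quoted string comes from elsewhere, stripping the quotes repairs it.
repaired = quoted.strip('"')
print(repaired == clean)  # True
```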
