Why is my file generated in ADLS with a name like 'part-00000-6feffff6-eef6-41ec-9d' even though I specified the filename in ADF?

Question

I used a Data Flow in ADF to filter the columns of a dataset and store the output in a file named filename.csv, as you can see in the picture.

But in ADLS a new file was generated with this name: part-00000-6feffff6-eef6-41ec-9da8-c10e671923df-


Answer 1 (score: 1)


I agree with @Gal Weiss that Data Flow follows the Spark way of writing files.

Adding to that answer: if you want the output written to a single file, go to Sink settings -> File name option -> Output to single file, and enter your file name there.

But for this, we need to set the partitioning to Single partition, which slows down the Data Flow execution.
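The trade-off exists because each partition is written by its own parallel task, so one output file means one writer. The idea can be sketched in plain Python (illustrative only; `write_partitioned` is a made-up helper, not an ADF or Spark API):

```python
import os
import tempfile

def write_partitioned(rows, num_partitions, out_dir):
    # Each "task" writes its own part file, because parallel writers
    # cannot safely append to one shared file. This mimics, in a very
    # simplified way, Spark's part-NNNNN output naming.
    os.makedirs(out_dir, exist_ok=True)
    for i in range(num_partitions):
        chunk = rows[i::num_partitions]  # round-robin split, for illustration
        with open(os.path.join(out_dir, f"part-{i:05d}.csv"), "w") as f:
            f.writelines(line + "\n" for line in chunk)

rows = [f"row{i}" for i in range(10)]

# With 3 partitions you get 3 part files; the name given in the sink
# ("filename.csv") becomes the folder that holds them.
out = os.path.join(tempfile.mkdtemp(), "filename.csv")
write_partitioned(rows, num_partitions=3, out_dir=out)
print(sorted(os.listdir(out)))
# ['part-00000.csv', 'part-00001.csv', 'part-00002.csv']

# "Output to single file" corresponds to forcing a single partition,
# i.e. one writer and therefore one file.
single = os.path.join(tempfile.mkdtemp(), "filename.csv")
write_partitioned(rows, num_partitions=1, out_dir=single)
print(sorted(os.listdir(single)))
# ['part-00000.csv']
```

This is also why the single-file option is slower: all rows funnel through one writer instead of being written in parallel.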

This will write the output to a single file in the same target folder.


Answer 2 (score: 0)


This is a partition file.
I'm not familiar with how your data flow works,
but if it uses Spark or Hadoop underneath, this is how files are saved:
the data is split into multiple partitions so it can be processed in parallel.
In this case the name you assign acts like a directory name, and the actual data lives in the "part" files underneath it.
That is fine: if you read the dataset back using the same file name, the Spark/Hadoop filesystem knows to look for the "part" files under it.
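The file name in the question follows Spark's part-file pattern: a zero-padded partition index followed by a unique id for the write job. A quick Python check against the (truncated) name from the question; the pattern used here is an assumption based on typical Spark output, not an official specification:

```python
import re
import uuid

# Truncated name exactly as shown in the question.
name = "part-00000-6feffff6-eef6-41ec-9da8-c10e671923df-"

# Assumed pattern: "part-" + 5-digit partition index + "-" + a UUID.
m = re.match(r"part-(\d{5})-([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})", name)
partition_index = int(m.group(1))
job_id = uuid.UUID(m.group(2))  # raises ValueError if not a valid UUID

print(partition_index)  # 0 -> the first (and here only) partition
```

So `part-00000-...` simply means "partition number 0 of this write job"; with more partitions you would also see `part-00001`, `part-00002`, and so on.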

Lastly, the reason you have only one file is either the way the data was partitioned before being written, or that your dataset is very small and the default configuration uses only one partition.

huangapple
  • Posted on 2023-07-18 03:22:41
  • When reposting, please keep the original link: https://go.coder-hub.com/76707529.html