Question

I have two CSV files. The first has only one row, which is the header. The second contains the values.

I want to create a DataFrame that takes its column names from the single row of the first CSV and its values from all rows of the second CSV. Both files have the same number of fields, from _c0 to _c1000 (about 1000 columns). Column types can differ between the two files, but the column names and the number of columns are the same. Below is an example snippet. I am using Databricks (PySpark). Any help is appreciated.

Answer 1

Score: 0

You can read the first file and then apply its resulting schema when reading the second file:

df1 = spark.read.option('header', True).csv('<path to the file with header>')
df2 = spark.read.schema(df1.schema).csv('<path to the file without header>')
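
One thing to keep in mind: the first read does not request schema inference, so df1.schema types every column as a string, and df2 will therefore hold string columns as well. If typed columns are wanted, one possible variation is to let Spark infer the types of the data file and then rename its columns positionally from the header file. The following is only a minimal sketch of that idea. The function name read_with_external_header and its parameters are hypothetical, and it assumes both files list their columns in the same order.

```python
def read_with_external_header(spark, header_path, data_path):
    """Return a DataFrame with column names from header_path and rows from data_path.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    The function name and parameters are illustrative, not part of any library.
    """
    # The header-only file: with header=True its single row becomes the column names.
    header_df = spark.read.option("header", True).csv(header_path)

    # The data file has no header row; inferSchema asks Spark to guess column types.
    data_df = (
        spark.read
        .option("header", False)
        .option("inferSchema", True)
        .csv(data_path)
    )

    # Renaming is positional, so the column counts must agree.
    if len(header_df.columns) != len(data_df.columns):
        raise ValueError(
            f"header has {len(header_df.columns)} columns, "
            f"data has {len(data_df.columns)}"
        )

    # toDF replaces the default names (_c0, _c1, ...) with the header names.
    return data_df.toDF(*header_df.columns)
```

In a Databricks notebook this could be called as df = read_with_external_header(spark, '<path to the file with header>', '<path to the file without header>'). Schema inference makes an extra pass over the data, so for very large files the string-typed approach followed by explicit casts may be preferable.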
