Fetch first row as headers from one CSV and values from another CSV
Question
I have two CSVs. The first CSV has only one row, which is the header. The second CSV has the values.
I want to create a DataFrame that takes its headers from row 1 of CSV 1 and its values from all rows of CSV 2. Both CSVs have the same number of fields, from _c0 to _c1000 (about 1000 columns). Column types can differ within each CSV, but the column names and the number of columns are the same. Below is an example snippet. I am using Databricks (PySpark). Any help is appreciated.
Answer 1
Score: 0
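You can read the first file with a header, then impose the resulting schema when reading the second file:
# df1 takes its column names from the header row of the first file
df1 = spark.read.option('header', True).csv('<path to the file with header>')
# df2 reuses df1's schema, so its positional columns get the same names
df2 = spark.read.schema(df1.schema).csv('<path to the file without header>')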