How to trim pyspark schema output
Question
My PySpark DataFrame has the following schema:
DataFrame[ExternalData: struct<provider:string,data:string,modality:array<string>>]
If I write the following (where sdf is my PySpark DataFrame):
sdf.schema
I get:
StructType([StructField('ExternalData', StructType([StructField('provider', StringType(), True), StructField('data', StringType(), True), StructField('modality', ArrayType(StringType(), True), True)]), True)])
How can I get just the below?
StructType([StructField('provider', StringType(), True), StructField('data', StringType(), True), StructField('modality', ArrayType(StringType(), True), True)])
The subtle difference is that the outer ExternalData StructField and the StructType wrapping it have been removed. I need this because the system I'm integrating Parquet with expects the Parquet schema in this format; the ExternalData field and struct are passed elsewhere.
Does anyone have any advice?
Answer 1
Score: 1
Try this:
Your DataFrame schema:
root
 |-- ExternalData: struct (nullable = true)
 |    |-- provider: string (nullable = true)
 |    |-- data: string (nullable = true)
 |    |-- modality: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
Select all the sub-columns of ExternalData to flatten the struct; the schema of the resulting DataFrame is the desired output:
sdf = sdf.select("ExternalData.*")
sdf.printSchema()
Output:
root
 |-- provider: string (nullable = true)
 |-- data: string (nullable = true)
 |-- modality: array (nullable = true)
 |    |-- element: string (containsNull = true)