April 19, 2023, 14:55:13

How to rename a column inside a nested column in PySpark

Question

I have a column product that contains a nested column called {Color}. I want to remove the {} from the nested column's name.

I don't want to flatten the column and then rename it; I want to rename or drop the nested column directly.

|-- product: struct (nullable = true) 
| |-- {Color}: string (nullable = true)

I have tried dropping it, but that doesn't work. I don't want to build a new struct by hand, because the struct has too many other nested columns to list.

Answer 1

Score: 2

Use .withField to add the field under its new name without flattening the struct, then use .dropFields to drop the old nested field from the struct. Both Column methods are available in Spark 3.1 and later.

Example:

from pyspark.sql.functions import col

# sample JSON (assumes an active SparkSession `spark` and SparkContext `sc`)
json = '{"product":{"{Color}":"a"}}'
df = spark.read.json(sc.parallelize([json]))
# create the Color field with `.withField`, copying the data from `{Color}`
# then use `.dropFields` to remove the original `{Color}` field from the struct
df1 = (
    df.withColumn("product", df["product"].withField("Color", col("product.`{Color}`")))
      .withColumn("product", col("product").dropFields("`{Color}`"))
)
df1.printSchema()
df1.show(10, False)
#root
# |-- product: struct (nullable = true)
# |    |-- Color: string (nullable = true)
#
#+-------+
#|product|
#+-------+
#|{a}    |
#+-------+
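
Since the question mentions many more nested columns, the same .withField / .dropFields pair could be driven from the schema rather than written out per field. The following is only a sketch under assumptions: the helper names strip_braces and rename_braced_fields are made up for illustration, it handles a single level of nesting, it needs Spark 3.1 or later, and it assumes that removing the braces never collides with a field name that already exists.

```python
def strip_braces(name):
    """Return the field name without one surrounding pair of '{' and '}'."""
    if len(name) > 2 and name.startswith("{") and name.endswith("}"):
        return name[1:-1]
    return name


def rename_braced_fields(df, struct_col):
    """Rename every '{x}' field of struct_col to 'x' without flattening it.

    Hypothetical helper: only the top-level fields of struct_col are handled.
    """
    # Imported here so strip_braces stays usable without a Spark installation.
    from pyspark.sql.functions import col

    result = df
    for old in df.schema[struct_col].dataType.fieldNames():
        new = strip_braces(old)
        if new == old:
            continue  # nothing to rename for this field
        result = result.withColumn(
            struct_col,
            col(struct_col)
            # copy the data into a field with the cleaned name ...
            .withField(f"`{new}`", col(f"{struct_col}.`{old}`"))
            # ... then drop the original braced field
            .dropFields(f"`{old}`"),
        )
    return result
```

With the sample data above, rename_braced_fields(df, "product").printSchema() would be the call to try. Because each renamed field is added again before the old one is dropped, renamed fields may end up after the untouched ones in the struct.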

Answer 2

Score: 0

You can use withColumn to rename the nested column. I have a DataFrame with the same schema as yours:

df.printSchema()

Now, using withColumn with struct, you can change the name of the nested column {Color}:

from pyspark.sql.functions import col, struct
# Note: struct() builds a new struct that contains only the fields listed here.
df.withColumn("product", struct(col("product.`{Color}`").alias("Color"))).printSchema()
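
If the struct has other nested fields, the one-field struct(...) call above would leave them out. One possible way around that, sketched here with hypothetical helper names (aliased_fields, rebuild_struct) and assuming a single level of nesting, is to read the field names from the schema and rebuild the struct with every field, aliasing only the ones to rename:

```python
def aliased_fields(field_names, renames):
    """Pair each nested field with the name it should carry in the rebuilt struct."""
    return [(name, renames.get(name, name)) for name in field_names]


def rebuild_struct(df, struct_col, renames):
    """Rebuild struct_col keeping every field and renaming those listed in renames.

    Hypothetical helper: renames maps old field names to new ones.
    """
    # Imported here so aliased_fields stays usable without a Spark installation.
    from pyspark.sql.functions import col, struct

    names = df.schema[struct_col].dataType.fieldNames()
    fields = [
        # backticks keep names such as {Color} from being misparsed
        col(f"{struct_col}.`{old}`").alias(new)
        for old, new in aliased_fields(names, renames)
    ]
    return df.withColumn(struct_col, struct(*fields))
```

A call such as rebuild_struct(df, "product", {"{Color}": "Color"}).printSchema() would then rename only {Color}. This still creates a new struct, which the question wanted to avoid writing by hand, but the field list comes from the schema, so nothing has to be typed out and the original field order is kept.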
