PySpark: flatten embedded structs so all fields are at the same level
Question
Is there an easy way to get from the top picture to the bottom one, where all columns sit next to each other, unnested, at the same level?
Answer 1
Score: 1
You can update the source struct by adding the fields of the inner struct to it. Something like this:
from pyspark.sql.functions import col, struct

# get all fields of the source struct except the inner struct e_struct
source_cols = [col(f"source.{c}") for c in df.select(col("source.*")).columns if c != "e_struct"]
# get all fields of the inner struct e_struct
e_struct_cols = [col(f"source.e_struct.{c}") for c in df.select(col("source.e_struct.*")).columns]
# combine them: source's own fields first, then e_struct's fields
# (if a name appears in both, the rebuilt struct will contain a duplicate field name)
new_struct_cols = source_cols + e_struct_cols
# replace the source column with the rebuilt, single-level struct
df = df.withColumn("source", struct(*new_struct_cols))