PySpark: flatten embedded structs all into the same level

Question

Is there an easy way to turn something like the top picture into the bottom picture, where all columns sit next to each other at the same level, unnested?

Answer 1

Score: 1

You can rebuild the source struct so that the fields of the inner struct e_struct sit directly inside it. Something like this:

from pyspark.sql.functions import col, struct

# get all fields of the source struct except the inner struct e_struct
source_cols = [col(f"source.{c}") for c in df.select(col("source.*")).columns if c != "e_struct"]

# get all fields of the inner struct e_struct
e_struct_cols = [col(f"source.e_struct.{c}") for c in df.select(col("source.e_struct.*")).columns]

# combine them (this assumes source and e_struct share no field names)
new_struct_cols = source_cols + e_struct_cols

# replace the source column with the flattened struct
df = df.withColumn("source", struct(*new_struct_cols))
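The snippet above folds e_struct into source but still leaves source as a struct. If the goal is to have every field at the top level, one possible approach is to walk the schema recursively and collect the dotted path of each leaf field. The sketch below is only an illustration: flatten_paths and example_schema are hypothetical names, and the function assumes its input is a dict in the shape returned by df.schema.jsonValue(). It runs on plain Python data, so no Spark session is needed to try it.

```python
def flatten_paths(schema_json, prefix=""):
    """Return the dotted paths of all leaf (non-struct) fields.

    schema_json is assumed to be a dict shaped like df.schema.jsonValue():
    {"type": "struct", "fields": [{"name": ..., "type": ...}, ...]}.
    """
    paths = []
    for field in schema_json["fields"]:
        path = f"{prefix}{field['name']}"
        field_type = field["type"]
        # Nested structs appear as dicts with type == "struct". Anything else
        # (simple type names, arrays, maps) is treated as a leaf and kept as-is.
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            paths.extend(flatten_paths(field_type, prefix=f"{path}."))
        else:
            paths.append(path)
    return paths


# Hypothetical schema reusing the column names from the answer above.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "source", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "a", "type": "string", "nullable": True, "metadata": {}},
                {"name": "e_struct", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "b", "type": "string", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

leaf_paths = flatten_paths(example_schema)
print(leaf_paths)  # ['id', 'source.a', 'source.e_struct.b']
```

With a real DataFrame, the paths could then be selected and renamed in one step, for example df.select([col(p).alias(p.replace(".", "_")) for p in flatten_paths(df.schema.jsonValue())]). Aliasing with the full path keeps names unique when two structs contain a field with the same name. Field names that themselves contain dots would need backtick quoting, which this sketch does not handle.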

huangapple
  • Published on January 3, 2020 at 19:58:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59578236.html