How to select nested fields while preserving the nested structure in a Spark DataFrame


Question


My data looks like this:

key
    my_id1
    my_id2
value
    my_value1
    my_value2

I am only interested in my_value2, my_id1, and my_id2, but I want to preserve the nested structure.

If I do

val myDf = myoriginalDF
  .select("key.*", "value.my_value2")

it returns a flattened result:

my_id1
my_id2
my_value2

How do I preserve the nested structure during the select?

Answer 1

Score: 1


You could use select with struct as below:

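import org.apache.spark.sql.functions.struct
import spark.implicits._ // enables $"..."; assumes a SparkSession named spark

// Rebuild each struct from only the fields to keep.
df.select(
  struct($"key.my_id1", $"key.my_id2").alias("key"),
  struct($"value.my_value2").alias("value")
)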

The schema of your output will be:

root
 |-- key: struct 
 |    |-- my_id1
 |    |-- my_id2
 |-- value: struct
 |    |-- my_value2

That should be what you want. Good luck!
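
To try the select-with-struct approach end to end, here is a minimal sketch. The case classes, field types (Long and String), sample row, local SparkSession, and the names NestedSelectSketch, selectNested, and dropNested are all assumptions made for illustration; the question does not give the real types or data. It also shows an alternative that is not part of the answer above: Column.dropFields, which assumes Spark 3.1 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical case classes; the field types are assumptions.
case class Key(my_id1: Long, my_id2: Long)
case class Value(my_value1: String, my_value2: String)
case class Record(key: Key, value: Value)

object NestedSelectSketch {
  // Rebuild each struct from only the fields to keep.
  def selectNested(df: DataFrame): DataFrame =
    df.select(
      struct(col("key.my_id1"), col("key.my_id2")).alias("key"),
      struct(col("value.my_value2")).alias("value")
    )

  // Alternative (assumes Spark 3.1+): drop the unwanted field in place.
  def dropNested(df: DataFrame): DataFrame =
    df.withColumn("value", col("value").dropFields("my_value1"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-select-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row standing in for myoriginalDF.
    val df = Seq(Record(Key(1L, 2L), Value("a", "b"))).toDF()

    selectNested(df).printSchema()
    dropNested(df).printSchema()

    spark.stop()
  }
}
```

Both variants keep key as a struct containing my_id1 and my_id2, and value as a struct containing only my_value2. The struct version lists the fields to keep, so it stays explicit when the source schema grows; dropFields lists the fields to remove, which is shorter when only a few fields need to go.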

huangapple
  • Published on June 6, 2023, 02:36:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76409152.html