Changing order in array struct column and adding new elements – SPARK SCALA

Question

I have this schema1:

[image: schema1]

I'm trying to reorder the elements in the struct and also add some new elements as NULL at arbitrary positions, as in schema2:

[image: schema2]

I tried this:

df.withColumn("sample",
    expr("transform(sample, x -> struct(x.elem1, x.elem2, 'NULL' as elem2, x.elem3, x.elem4, x.elem5, x.elem6, x.elem7))"))

I got this error:

> Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder; line 1 pos 45;

Any help? Thanks

Answer 1

Score: 1

sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
      selectExpr("transform(sample, x -> named_struct('elem1', x.elem1, 'elem2', x.elem2, 'elem3', null, 'elem4', x.elem4)) sample_rewired").
      show()

This works, producing:

+-----------------+
|   sample_rewired|
+-----------------+
|[{1, 2, null, 4}]|
+-----------------+

Using struct instead introduces a generated column name for the null field, so:

sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
      selectExpr("transform(sample, x -> struct(x.elem1, x.elem2, null, x.elem4)) sample_rewired").
      selectExpr("explode(sample_rewired) no_array").selectExpr("no_array.*").
      show()

yields:

+-----+-----+----+-----+
|elem1|elem2|col3|elem4|
+-----+-----+----+-----+
|    1|    2|null|    4|
+-----+-----+----+-----+

So you'll want to specify the field names explicitly via named_struct.

As for the error you got, I have no idea; it looks odd. Using withColumn and expr also doesn't fail for me. What Spark version are you on?
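
Writing the named_struct call by hand gets tedious for a wide struct, so here is a rough sketch of generating the expression string instead. StructRewire and transformExpr are hypothetical names, not part of Spark, and the helper only builds text; it assumes the lambda variable is x and that every added field should be a NULL of a single SQL type (string by default). The cast is my own addition: a bare null leaves the new field with Spark's null type, which some output formats may reject.

```scala
// Hypothetical helper (not part of Spark): builds the SQL text for a
// transform(...) call that rebuilds each array element with named_struct.
object StructRewire {

  // One named_struct argument pair: 'name', value.
  // Fields missing from the source struct become typed NULLs.
  private def fieldPair(name: String, existing: Set[String], nullType: String): String =
    if (existing.contains(name)) s"'$name', x.$name"
    else s"'$name', cast(null as $nullType)"

  // column:      name of the array<struct> column to rewrite
  // targetOrder: field names in the order the output struct should have
  // existing:    field names already present in the input struct
  // nullType:    SQL type given to added fields (assumed "string" here)
  def transformExpr(column: String,
                    targetOrder: Seq[String],
                    existing: Set[String],
                    nullType: String = "string"): String = {
    val pairs = targetOrder.map(fieldPair(_, existing, nullType)).mkString(", ")
    s"transform($column, x -> named_struct($pairs))"
  }
}
```

For the example above, StructRewire.transformExpr("sample", Seq("elem1", "elem2", "elem3", "elem4"), Set("elem4", "elem1", "elem2")) produces the same named_struct expression as the first snippet, except that elem3 is cast(null as string). The resulting string can be passed to selectExpr, or to expr inside withColumn as in the question.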

huangapple
  • Published on August 10, 2023, 19:17:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76875230.html