Changing order in array struct column and adding new elements - SPARK SCALA
Question
I have this schema1:
I'm trying to reorder the elements in the struct and also add some new elements as NULL at arbitrary positions, as in schema2:
I tried this:
df.withColumn("sample",
  expr("transform(sample, x -> struct(x.elem1, x.elem2, 'NULL' as elem2, x.elem3, x.elem4, x.elem5, x.elem6, x.elem7))"))
I got this error:
> Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder; line 1 pos 45;
Any help? Thanks.
Answer 1
Score: 1
sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
selectExpr("transform(sample, x -> named_struct('elem1', x.elem1, 'elem2', x.elem2, 'elem3', null, 'elem4', x.elem4)) sample_rewired").
show()
This works and produces:
+-----------------+
|   sample_rewired|
+-----------------+
|[{1, 2, null, 4}]|
+-----------------+
Using struct instead will introduce generated column names for the nulls, so:
sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
selectExpr("transform(sample, x -> struct(x.elem1, x.elem2, null, x.elem4)) sample_rewired").
selectExpr("explode(sample_rewired) no_array").selectExpr("no_array.*").
show()
yields:
+-----+-----+----+-----+
|elem1|elem2|col3|elem4|
+-----+-----+----+-----+
|    1|    2|null|    4|
+-----+-----+----+-----+
As such, you'll want to specify the field names explicitly via named_struct.
As for the error you got, I have no idea; it looks odd. Using withColumn and expr doesn't fail for me either. Which Spark version are you on?
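When the struct has many fields, writing the named_struct argument list by hand gets error-prone. Below is a minimal sketch of one way to generate the expression text from the desired field order. RewireStruct and transformExpr are hypothetical names made up for this illustration, not part of Spark, and the sketch assumes you already know which fields exist in the struct. It only builds a string, so it runs without a Spark session.

```scala
// Hypothetical helper (not from the answer above, not a Spark API):
// builds the SQL text of a transform(...) call whose lambda returns a
// named_struct with fields in the requested order.
object RewireStruct {

  /** @param column   name of the array-of-struct column
    * @param target   field names in the desired output order
    * @param existing field names already present in the struct
    */
  def transformExpr(column: String, target: Seq[String], existing: Set[String]): String = {
    val fields = target.map { name =>
      // Existing fields are read from the lambda variable x; new ones become null.
      val value = if (existing.contains(name)) s"x.$name" else "null"
      s"'$name', $value"
    }
    val body = fields.mkString(", ")
    s"transform($column, x -> named_struct($body))"
  }

  def main(args: Array[String]): Unit = {
    val sampleExpr = transformExpr(
      "sample",
      target = Seq("elem1", "elem2", "elem3", "elem4"),
      existing = Set("elem1", "elem2", "elem4")
    )
    // Prints the same expression as the hand-written one in the answer.
    println(sampleExpr)
  }
}
```

The resulting string can be passed to selectExpr, or to expr inside df.withColumn("sample", ...). A bare null gives the new field a null type; if the target schema needs a concrete type, the placeholder could be written as cast(null as string) instead.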