Spark: how to union two array columns without removing duplicates
Question
There is a function array_union that unions two arrays and removes duplicate elements. How can I union two arrays without removing duplicates?
+---------+---------+
|field |field1 |
+---------+---------+
|[1, 2, 2]|[1, 2, 2]|
+---------+---------+
.withColumn("union", array_union(col("field"), col("field1")))
Desired result:
+---------+---------+------------------+
|field |field1 |union |
+---------+---------+------------------+
|[1, 2, 2]|[1, 2, 2]|[1, 2, 2, 1, 2, 2]|
+---------+---------+------------------+
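For reference, a minimal sketch to reproduce the sample data above (assuming a local Spark session; the names spark and df are illustrative, not part of the question):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("array-union-example").getOrCreate()
import spark.implicits._

// One row with two array columns, each containing duplicate elements
val df = Seq((Seq(1, 2, 2), Seq(1, 2, 2))).toDF("field", "field1")
df.show(false)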
Answer 1
Score: 2
Just use concat:
import org.apache.spark.sql.functions.{col, concat}
// concat works on array columns (Spark 2.4+) and keeps duplicate elements
df1.withColumn("NewArr", concat(col("Array1"), col("Array2"))).show()
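Applied to the question's columns, a sketch using the df built above (the output column name "union" is taken from the question):
import org.apache.spark.sql.functions.{col, concat}

// concat appends the elements of field1 after those of field, keeping all duplicates
df.withColumn("union", concat(col("field"), col("field1"))).show(false)
// expected output, matching the desired result above:
// +---------+---------+------------------+
// |field    |field1   |union             |
// +---------+---------+------------------+
// |[1, 2, 2]|[1, 2, 2]|[1, 2, 2, 1, 2, 2]|
// +---------+---------+------------------+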