Spark: how to union two array columns without removing duplicates

Question

There is a function, array_union, that merges two arrays and removes duplicate elements. How can I merge two arrays without removing duplicates?

+---------+---------+
|field    |field1   |
+---------+---------+
|[1, 2, 2]|[1, 2, 2]|
+---------+---------+
.withColumn("union", array_union(col("field"), col("field1")))

Desired result (array_union itself would return [1, 2] here):

+---------+---------+------------------+
|field    |field1   |union             |
+---------+---------+------------------+
|[1, 2, 2]|[1, 2, 2]|[1, 2, 2, 1, 2, 2]|
+---------+---------+------------------+

Answer 1

Score: 2

Just use concat instead:

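import org.apache.spark.sql.functions.{col, concat}

// concat accepts array columns (Spark 2.4+) and keeps duplicates;
// in Scala it takes Column arguments, not plain strings
df1.withColumn("NewArr", concat(col("Array1"), col("Array2"))).show()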
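
Applied to the columns from the question, the same idea can be sketched as a small self-contained program. This is only an illustration, assuming Spark 2.4 or later and a local SparkSession; the object name ConcatArraysExample, the helper withUnions, and the extra distinct_union column are hypothetical names added here for comparison and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_union, col, concat}

object ConcatArraysExample {

  // Adds two columns: "union" keeps duplicates (concat),
  // "distinct_union" drops them (array_union), for side-by-side comparison.
  def withUnions(df: DataFrame): DataFrame =
    df.withColumn("union", concat(col("field"), col("field1")))
      .withColumn("distinct_union", array_union(col("field"), col("field1")))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumption: not a cluster setup)
    val spark = SparkSession.builder()
      .appName("concat-arrays")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample row as in the question
    val df = Seq((Seq(1, 2, 2), Seq(1, 2, 2))).toDF("field", "field1")

    withUnions(df).show(false)

    spark.stop()
  }
}
```

For the sample row, the union column should hold [1, 2, 2, 1, 2, 2], while distinct_union holds [1, 2].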


Posted by huangapple on 2023-02-06 20:23:33
Source: https://go.coder-hub.com/75361275.html