Spark: how to union two array columns without removing duplicates

Question

There is a function array_union that merges two arrays and removes duplicate elements. How can I merge two arrays without removing the duplicates?

+---------+---------+
|field    |field1   |
+---------+---------+
|[1, 2, 2]|[1, 2, 2]|
+---------+---------+
.withColumn("union", array_union(col("field"), col("field1")))

Desired result:

+---------+---------+------------------+
|field    |field1   |union             |
+---------+---------+------------------+
|[1, 2, 2]|[1, 2, 2]|[1, 2, 2, 1, 2, 2]|
+---------+---------+------------------+
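To see why array_union falls short here, a plain-Python sketch (not Spark code; the array_union function below is a hypothetical stand-in mimicking Spark's semantics) contrasting its set-style union with the duplicate-preserving merge the question asks for:

```python
def array_union(a, b):
    # Mimics Spark's array_union: elements of a followed by elements of b,
    # duplicates removed, first-occurrence order preserved.
    seen, out = set(), []
    for x in a + b:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

field, field1 = [1, 2, 2], [1, 2, 2]
print(array_union(field, field1))  # [1, 2] -- duplicates are dropped
print(field + field1)              # [1, 2, 2, 1, 2, 2] -- the desired merge
```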

Answer 1

Score: 2

Just use concat instead; unlike array_union, it simply appends the second array to the first and keeps duplicates (concat has supported array columns since Spark 2.4):

import org.apache.spark.sql.functions.{col, concat}

// concat takes Column arguments in the Scala API, so wrap the names with col
df1.withColumn("NewArr", concat(col("Array1"), col("Array2"))).show()


huangapple
  • Published on 2023-02-06 20:23:33
  • Please keep this link when reposting: https://go.coder-hub.com/75361275.html