Transform tuple to matrix in Spark


Question

I have an RDD of (tuple, value) pairs that looks like this. There are thousands of different pairings.

(A, B), 1
(B, C), 2
(C, D), 1
(A, D), 1
(D, A), 5

I want to transform these (tuple, value) pairs into a matrix whose rows and columns correspond to the tuple elements, but I didn't see any easy way to do this in Spark.

+---+------+------+------+------+
|   |  A   |  B   |  C   |  D   |
+---+------+------+------+------+
| A |  -   |  1   | NULL |  1   |
| B | NULL |  -   |  2   | NULL |
| C | NULL | NULL |  -   |  1   |
| D |  5   | NULL | NULL |  -   |
+---+------+------+------+------+
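
For concreteness, a minimal sketch (in Scala, assuming a spark-shell style SparkContext named sc; the name pairs is illustrative) of the RDD described above, where each element is a ((row, column), value) pair:

// Element type is ((String, String), Int): a (row, column) key with its count
val pairs: org.apache.spark.rdd.RDD[((String, String), Int)] =
  sc.parallelize(Seq((("A", "B"), 1), (("B", "C"), 2), (("C", "D"), 1), (("A", "D"), 1), (("D", "A"), 5)))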

Answer 1

Score: 1

Best effort, but you cannot get rid of the grouping column's name when using Spark SQL (which you state you are using). This simply pivots the pairs, with rows and columns in their natural order. Try it; an extra tuple has been added to show the general case.

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF() when not running in the spark-shell

// Not sure whether the input is keyed as ("A", "B"), 1 or as a flat "A", "B", 1 triple
val rdd = sc.parallelize(Seq((("A", "B"), 1), (("B", "C"), 2), (("C", "D"), 1), (("A", "D"), 1), (("D", "A"), 5), (("E", "Z"), 500)))

// Flatten the nested key into a (row, column, value) triple; in fact you could start from here
val rdd2 = rdd.map(x => (x._1._1, x._1._2, x._2))

val df = rdd2.toDF()

// Natural ordering, but the _1 grouping column cannot be dropped from the pivoted DataFrame (Spark SQL)
df.groupBy("_1").pivot("_2").agg(first("_3"))
  .orderBy("_1")
  .show(false)

returns:

+---+----+----+----+----+----+
|_1 |A   |B   |C   |D   |Z   |
+---+----+----+----+----+----+
|A  |null|1   |null|1   |null|
|B  |null|null|2   |null|null|
|C  |null|null|null|1   |null|
|D  |5   |null|null|null|null|
|E  |null|null|null|null|500 |
+---+----+----+----+----+----+
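
If the _1/_2/_3 column names are a concern, a small variation (a sketch only, reusing rdd2 and the functions import from the snippet above; the names row, col and value are illustrative, not from the answer) is to name the columns when converting to a DataFrame, so the pivoted result at least carries a meaningful row label while the pivot itself is unchanged:

// Name the triple's columns up front (illustrative names)
val named = rdd2.toDF("row", "col", "value")

named.groupBy("row").pivot("col").agg(first("value"))
  .orderBy("row")
  .show(false)

This prints the same matrix as above with the leading column labelled row instead of _1; the blank corner cell and the "-" diagonal from the desired layout would still need to be added outside the pivot.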
