How to explode an array column in Spark Java with a Dataset

Question

I have a Dataset in Spark Java as follows:

Current:

    +-----+---------+
    | x   | YS      |
    +-----+---------+
    | x1  | [Y1,Y2] |
    | x2  | [Y3]    |
    +-----+---------+

I want to explode this Dataset and turn each array element into its own row:

Desired:

    +-----+-----+
    | x   | YS  |
    +-----+-----+
    | x1  | Y1  |
    | x1  | Y2  |
    | x2  | Y3  |
    +-----+-----+

I read the table from the database and select the two columns, but I am unable to use the explode functionality:

    DS = reader.option("table", "dummy").load()
                .select(X, YS).explode(??)

How should I use explode to get the desired Dataset in Java?

Answer 1

Score: 1

In principle, you need to select a new column (instead of the YS column), where the value of the new column is the exploded value of the YS column.

Starting from the code in the question, this would be something like:

    ds = reader.option("table", "dummy").load();
    ds = ds.select(ds.col("X"), explode(ds.col("YS")).alias("Y"));

Here is the API doc: https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/functions.html#explode-org.apache.spark.sql.Column-
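
For reference, here is a minimal, self-contained sketch of the same approach (not part of the original answer): it builds a small in-memory Dataset shaped like the one in the question and applies explode. The class name, the local SparkSession setup, and the schema construction are illustrative assumptions; only the column names x/YS and the explode(...).alias(...) call come from the question and answer above. Note that explode is a static method on org.apache.spark.sql.functions, so it needs a static import in Java.

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.explode;

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class ExplodeExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("explode-example")
                    .master("local[*]")
                    .getOrCreate();

            // Schema matching the question: x (string) and YS (array of strings).
            StructType schema = new StructType()
                    .add("x", DataTypes.StringType)
                    .add("YS", DataTypes.createArrayType(DataTypes.StringType));

            List<Row> rows = Arrays.asList(
                    RowFactory.create("x1", Arrays.asList("Y1", "Y2")),
                    RowFactory.create("x2", Arrays.asList("Y3")));

            Dataset<Row> ds = spark.createDataFrame(rows, schema);

            // explode(YS) emits one output row per array element;
            // the other selected columns are repeated for each element.
            Dataset<Row> exploded = ds.select(col("x"), explode(col("YS")).alias("Y"));

            // Expected rows: (x1, Y1), (x1, Y2), (x2, Y3)
            exploded.show();

            spark.stop();
        }
    }

Note that explode drops rows whose array is null or empty; if those rows should be kept (with a null value in the new column), explode_outer (available since Spark 2.2) can be used instead.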


huangapple
  • Published on 2020-08-05 05:38:48
  • When republishing, please keep the link to the original article: https://go.coder-hub.com/63255431.html