How to explode an array column in Spark Java with a Dataset

Question

I have a Dataset in Spark Java as follows:

Current:

    +-----+---------+
    | x   | YS      |
    +-----+---------+
    | x1  | [Y1,Y2] |
    | x2  | [Y3]    |
    +-----+---------+

I want to explode this Dataset and turn each array element into its own row:

Desired:

    +-----+-----+
    | x   | YS  |
    +-----+-----+
    | x1  | Y1  |
    | x1  | Y2  |
    | x2  | Y3  |
    +-----+-----+

I read the table from the database and select the two columns, but I am unable to use the explode functionality:

    DS = reader.option("table", "dummy").load()
                .select(X, YS).explode(??)

How should I use explode to get the desired Dataset in Java?

Answer 1

Score: 1

In principle, you need to select a new column (instead of the YS column), where the value of the new column is the exploded value of the YS column.

Starting from the code in the question, this would be something like:

    ds = reader.option("table", "dummy").load();
    ds = ds.select(ds.col("X"), explode(ds.col("YS")).alias("Y"));

Here is the API doc: https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/functions.html#explode-org.apache.spark.sql.Column-
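
For reference, here is a minimal, self-contained sketch of the same approach (not part of the original answer): it builds a small in-memory Dataset shaped like the one in the question and applies explode. The class name, the local SparkSession setup, and the schema construction are illustrative assumptions; only the column names x/YS and the explode(...).alias(...) call come from the question and answer above. Note that explode is a static method on org.apache.spark.sql.functions, so it needs a static import in Java.

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.explode;

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class ExplodeExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("explode-example")
                    .master("local[*]")
                    .getOrCreate();

            // Schema matching the question: x (string) and YS (array of strings).
            StructType schema = new StructType()
                    .add("x", DataTypes.StringType)
                    .add("YS", DataTypes.createArrayType(DataTypes.StringType));

            List<Row> rows = Arrays.asList(
                    RowFactory.create("x1", Arrays.asList("Y1", "Y2")),
                    RowFactory.create("x2", Arrays.asList("Y3")));

            Dataset<Row> ds = spark.createDataFrame(rows, schema);

            // explode(YS) emits one output row per array element;
            // the other selected columns are repeated for each element.
            Dataset<Row> exploded = ds.select(col("x"), explode(col("YS")).alias("Y"));

            // Expected rows: (x1, Y1), (x1, Y2), (x2, Y3)
            exploded.show();

            spark.stop();
        }
    }

Note that explode drops rows whose array is null or empty; if those rows should be kept (with a null value in the new column), explode_outer (available since Spark 2.2) can be used instead.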


huangapple
  • Published on 2020-08-05 05:38:48
  • When republishing, please keep the link to the original article: https://go.coder-hub.com/63255431.html