How to explode an array column of a Dataset in Spark (Java)
Question
I have a Dataset in Spark (Java) that looks like this:
Current:
+----+----------+
| x  | YS       |
+----+----------+
| x1 | [Y1, Y2] |
| x2 | [Y3]     |
+----+----------+
I want to explode this Dataset so that each element of the array becomes its own row:
Desired:
+----+----+
| x  | YS |
+----+----+
| x1 | Y1 |
| x1 | Y2 |
| x2 | Y3 |
+----+----+
I read the table from a database and select the two columns, but I cannot work out how to use the explode function.
DS = reader.option("table", "dummy").load()
    .select(X,YS).explode(??)
How should I use explode to get the desired Dataset in Java?
Answer 1
Score: 1
In principle, you need to select a new column (not the YS column itself) whose value is the exploded value of the YS column.
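Starting from the code in the question, this would be something like:
// requires: import static org.apache.spark.sql.functions.explode;
Dataset<Row> ds = reader.option("table", "dummy").load();
// one output row per element of YS; the exploded column is named Y
ds = ds.select(ds.col("X"), explode(ds.col("YS")).as("Y"));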
Here is the API doc: https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/functions.html#explode-org.apache.spark.sql.Column-
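To make the row-multiplying behaviour concrete, here is a minimal plain-Java sketch of what explode does to the sample data from the question. It does not use Spark at all: the class name ExplodeSketch and its explode helper are hypothetical, and a Map stands in for the two-column Dataset (so it assumes the x values are unique). It only mirrors the semantics: one output row per array element, and no output row for a null or empty array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplodeSketch {

    // Hypothetical stand-in for select(col("x"), explode(col("YS")).as("Y")):
    // emits one {x, y} pair per array element. A null or empty list emits
    // nothing, mirroring how explode drops such rows.
    public static List<String[]> explode(Map<String, List<String>> rows) {
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            if (row.getValue() == null) {
                continue; // null array: no output rows
            }
            for (String y : row.getValue()) {
                out.add(new String[] {row.getKey(), y});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data from the question: x1 -> [Y1, Y2], x2 -> [Y3]
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("x1", Arrays.asList("Y1", "Y2"));
        rows.put("x2", Collections.singletonList("Y3"));

        for (String[] r : explode(rows)) {
            System.out.println(r[0] + " | " + r[1]);
        }
    }
}
```

Running main prints the three rows of the desired table (x1 | Y1, x1 | Y2, x2 | Y3). If rows with null or empty arrays must be kept in Spark, the functions class also provides explode_outer, which emits a null for them instead of dropping the row.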