Explode a nested array into new columns using Java Spark
Question
I have a nested array, and I want to put each of its elements into its own column. This is what I have so far. I tried writing two approaches, but neither worked. The error I currently get from the uncommented code is:
> `cannot resolve 'split(response.indicator, ',')' due to data type mismatch: argument 1 requires string type, however, 'response.indicator' is of array<struct<_VALUE:string,_number:bigint>> type.;;`

    File.withColumn("response.indicator", explode(col("response.ind")))
        .withColumn("response.indicator", split(col("response.indicator"), ","))
        .withColumn("key", col("response.indicator").getItem(1))
        .withColumn("value", col("response.indicator").getItem(0))
        .groupBy("ID")
        .pivot("key")
        .agg(first("value"))
        .show(true);
    /*File.select("response.indicator").collectAsList().forEach(row -> {
        String name = String.valueOf(row.getList(0).get(1));
        String value = String.valueOf(row.getList(0).get(0));
        File.withColumn(name, col(value));
    });*/

Here is the schema:
|-- ID: integer (nullable = true)
|-- response: struct (nullable = true)
| |-- indicator: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- _VALUE: string (nullable = true)
| | | |-- _number: long (nullable = true)
Here is what my data looks like:
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID |response
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 |[WrappedArray([N,7], [N,8], [N,9], [N,19], [N,20], [N,22], [N,12], [N,1], [N,2], [N,3], [N,4], [N,5], [N,6], [N,10], [N,11], [N,13], [N,14], [N,15], [N,16], [N,17], [N,18], [N,21], [N,25], [N,26])] |
| 2 |[WrappedArray([Y,1], [N,8], [N,9], [N,19], [N,22], [Y,22], [N,20], [Y,7], [Y,23], [N,3], [Y,4], [N,11], [N,6], [Y,27], [N,5], [N,13], [N,14], [N,15], [Y,16], [N,17], [Y,18], [N,21], [N,25], [N,26])] |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
What I want it to look like:
+---+---+------+---+-----
|ID | 1 | 2    | 3 | etc
+---+---+------+---+-----
| 1 | N | N    | N | etc
| 2 | Y | NULL | N | etc
+---+---+------+---+-----
Answer 1
Score: 0
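Your problem is that `split(col("response.indicator"), ",")` expects a string column, whereas `response.indicator` is actually an array of structs (`array<struct<_VALUE:string,_number:bigint>>`, as the error message says). After `explode`, each row holds a single struct. To "unfold" a struct named `s` into its fields, you can select `s.*` as follows:

    // Requires: import static org.apache.spark.sql.functions.*;
    // I use the names provided in the schema, not the ones from your code.
    File.withColumn("indicator", explode(col("response.indicator"))) // one row per array element
        .select("ID", "indicator.*")   // expand the struct into _VALUE and _number columns
        .groupBy("ID")
        .pivot("_number")              // one output column per distinct _number
        .agg(first("_VALUE"))          // field name spelled as in the schema
        .show();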
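
To make the semantics of that pipeline easier to follow, here is a minimal plain-Java sketch of what explode, pivot, and `first` do to the data. It does not use Spark at all: `PivotSketch`, `Indicator`, and `pivot` are hypothetical names introduced only for this illustration, and the in-memory maps merely stand in for the DataFrame.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of explode + pivot + first (no Spark involved).
class PivotSketch {

    // One array element, mirroring struct<_VALUE:string,_number:bigint> from the schema.
    static final class Indicator {
        final String value;
        final long number;

        Indicator(String value, long number) {
            this.value = value;
            this.number = number;
        }
    }

    // Returns ID -> (_number -> _VALUE); an absent _number plays the role of NULL.
    static Map<Integer, Map<Long, String>> pivot(Map<Integer, List<Indicator>> rows) {
        Map<Integer, Map<Long, String>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Indicator>> row : rows.entrySet()) {
            Map<Long, String> columns = new TreeMap<>(); // columns sorted by _number
            // "explode": treat every array element as its own (ID, _VALUE, _number) record
            for (Indicator ind : row.getValue()) {
                // "first": keep the earliest value seen for a given _number
                columns.putIfAbsent(ind.number, ind.value);
            }
            result.put(row.getKey(), columns);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Indicator>> rows = new LinkedHashMap<>();
        rows.put(1, Arrays.asList(
                new Indicator("N", 7), new Indicator("N", 1), new Indicator("N", 2)));
        rows.put(2, Arrays.asList(
                new Indicator("Y", 1), new Indicator("N", 22), new Indicator("Y", 22)));
        // prints {1={1=N, 2=N, 7=N}, 2={1=Y, 22=N}}
        System.out.println(pivot(rows));
    }
}
```

Note that the sample row for ID 2 contains both `[N,22]` and `[Y,22]`. The sketch keeps whichever comes first in the array, but in Spark the result of `first` depends on row order, which may not be deterministic after a shuffle, so duplicated `_number` values are worth checking before relying on the pivoted output.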