How to convert WrappedArray to String using Spark / Java

Question


I have the following DataFrame:

+--------------------+
|    column          |
+--------------------+
| [99896, 10, ]      |     
|[50, 30, 40, ]      |
+--------------------+

The schema of the column is:

 |-- column: array (nullable = true)
    |-- element: string (containsNull = true)

When I execute the following code:

for (Iterator<Row> iter = dataframe.toLocalIterator(); iter.hasNext();) {
    String item = iter.next().get(0).toString();
    System.out.println(item);
}

I get the following output:

WrappedArray(99896, 10, )
WrappedArray(50, 30, 40, )

How can I convert this output to a single String like:

[99896, 10, 50, 30, 40]

I need your help.

Thank you

Answer 1

Score: 2


Basically, you are looping through each row, getting the WrappedArray for that row, and calling WrappedArray's toString() method, which is what produces the WrappedArray(...) text. Instead of calling toString(), loop over that WrappedArray and print each value in it.
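A minimal sketch of that idea in plain Java, with no Spark dependency so it can run on its own. It assumes each row's array has already been obtained as a java.util.List (in Spark's Java API, Row.getList(0) returns one). The class name ArrayFlattener and the method flattenRows are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayFlattener {

    // Visits every element of every array instead of calling toString() on the array itself.
    // Null and empty elements (the trailing blank in the question's output) are skipped.
    public static String flattenRows(List<List<String>> rows) {
        List<String> values = new ArrayList<>();
        for (List<String> row : rows) {
            if (row == null) {
                continue; // the column is declared nullable
            }
            for (String value : row) {
                if (value != null && !value.isEmpty()) {
                    values.add(value);
                }
            }
        }
        return "[" + String.join(", ", values) + "]";
    }

    public static void main(String[] args) {
        // Stand-in for the two rows shown in the question.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("99896", "10", null),
                Arrays.asList("50", "30", "40", null));
        System.out.println(flattenRows(rows)); // prints [99896, 10, 50, 30, 40]
    }
}
```

With a real DataFrame, the inner lists would come from iterating over the rows and calling getList(0) on each one, rather than from the hard-coded sample above.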

Answer 2

Score: 2


Try this:

Load the test data provided:

Dataset<Row> df = spark.sql("select column from values array(99896, 10, null), array(50, 30, 40, null) T(column)");
df.show(false);
df.printSchema();
/**
 * +-------------+
 * |column       |
 * +-------------+
 * |[99896, 10,] |
 * |[50, 30, 40,]|
 * +-------------+
 *
 * root
 *  |-- column: array (nullable = false)
 *  |    |-- element: integer (containsNull = true)
 */

Option-1


// Uses java.util.Objects and java.util.stream.Collectors
StringBuilder sb = new StringBuilder();
sb.append("[");
for (java.util.Iterator<Row> iter = df.toLocalIterator(); iter.hasNext();) {
    // Join the non-null elements of this row's array with commas
    String item = iter.next().getList(0).stream()
            .filter(Objects::nonNull)
            .map(String::valueOf)
            .collect(Collectors.joining(","));
    sb.append(item).append(",");
}
// Replace the trailing comma with the closing bracket
int i = sb.lastIndexOf(",");
sb.replace(i, i + 1, "]");
System.out.println(sb);
/**
 * [99896,10,50,30,40]
 */
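The trimming step in Option-1 relies on at least one comma having been appended; with no rows, lastIndexOf returns -1 and the replace call fails. As a hedged alternative, the sketch below lets Collectors.joining add the brackets itself, so nothing has to be trimmed afterwards. It is plain Java with no Spark dependency and assumes the arrays are already available as java.util.List values (for example, from Row.getList(0)). RowJoiner and joinRows are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class RowJoiner {

    // Flattens all arrays into one stream, drops nulls, and lets the collector
    // supply the "[" prefix and "]" suffix, so no trailing comma is ever produced.
    public static String joinRows(List<List<Integer>> rows) {
        return rows.stream()
                .filter(Objects::nonNull)      // skip null arrays
                .flatMap(List::stream)
                .filter(Objects::nonNull)      // skip null elements
                .map(String::valueOf)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Stand-in for the test data loaded in the answer.
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(99896, 10, null),
                Arrays.asList(50, 30, 40, null));
        System.out.println(joinRows(rows));                    // prints [99896,10,50,30,40]
        System.out.println(joinRows(Collections.emptyList())); // prints []
    }
}
```

The three-argument form of Collectors.joining takes a delimiter, a prefix, and a suffix, which is why the empty input still yields a well-formed "[]".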

Option-2


// Uses: import static org.apache.spark.sql.functions.expr;
// collect_list aggregates every row's joined values into a single row
Dataset<Row> p = df.withColumn("column",
        expr("concat('[', concat_ws(',', collect_list(concat_ws(',', column))), ']')"));
for (java.util.Iterator<Row> iter = p.toLocalIterator(); iter.hasNext();) {
    String item = iter.next().get(0).toString();
    System.out.println(item);
}
/**
 * [99896,10,50,30,40]
 */

huangapple
  • Published on August 15, 2020, 02:54:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63418572.html