Create a DataFrame from a map in the Spark Java API

Question

I am trying to use Spark SQL through the Java API. The simple snippet below, copied from the official guide (https://spark.apache.org/docs/latest/rdd-programming-guide.html), is flagged as an error by IntelliJ.

It complains about a ClassTag, and I do not know how to create one or have it supplied automatically.

        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> distData = sc.parallelize(data);

I understand that it wants three arguments:

public <T> RDD<T> parallelize(final Seq<T> seq, final int numSlices, final ClassTag<T> evidence$1) {

But how do I get this evidence$1 argument? The official example does not pass it either.

Please help with this.

Answer 1

Score: 1

I decided to look at the source code of the examples that come with the official guide. It turns out that a Java Spark context (JavaSparkContext) has to be created. My own code worked once I followed the example from the guide's source code.

import java.util.Arrays;
import java.util.List;

import scala.Tuple2;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// jsc is a JavaSparkContext that was created beforehand.
List<Tuple2<String, String>> data =
        Arrays.asList(
                new Tuple2<>("key1", "value1"),
                new Tuple2<>("key2", "value2")
        );

// parallelizePairs turns the list of tuples into a pair RDD.
JavaPairRDD<String, String> dataRdd = jsc.parallelizePairs(data);
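
For completeness, here is a minimal sketch of how the `jsc` object used above might be created, and why that removes the ClassTag complaint: `JavaSparkContext.parallelize` accepts a plain `java.util.List`, whereas the Scala `SparkContext.parallelize` expects a `Seq`, a slice count, and a `ClassTag`. The class name `ParallelizeExample`, the helper names, the application name, and the `local[*]` master are assumptions made for illustration; they do not come from the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ParallelizeExample {

    // Builds a local JavaSparkContext. App name and master are illustrative assumptions.
    static JavaSparkContext createContext() {
        SparkConf conf = new SparkConf()
                .setAppName("parallelize-example")
                .setMaster("local[*]");
        return new JavaSparkContext(conf);
    }

    // Distributes a Java list as an RDD and collects it back to the driver.
    static List<Integer> roundTrip(JavaSparkContext jsc, List<Integer> data) {
        JavaRDD<Integer> distData = jsc.parallelize(data); // no ClassTag argument needed
        return distData.collect();
    }

    // Distributes key/value tuples as a pair RDD and collects them as a java.util.Map.
    static Map<String, String> pairsToMap(JavaSparkContext jsc,
                                          List<Tuple2<String, String>> pairs) {
        JavaPairRDD<String, String> pairRdd = jsc.parallelizePairs(pairs);
        return pairRdd.collectAsMap();
    }

    public static void main(String[] args) {
        JavaSparkContext jsc = createContext();
        try {
            System.out.println(roundTrip(jsc, Arrays.asList(1, 2, 3, 4, 5)));
            System.out.println(pairsToMap(jsc, Arrays.asList(
                    new Tuple2<>("key1", "value1"),
                    new Tuple2<>("key2", "value2"))));
        } finally {
            jsc.stop(); // release the local Spark resources
        }
    }
}
```

If a Scala `SparkContext` named `sc` already exists (for example from `SparkSession.sparkContext()`), wrapping it with `new JavaSparkContext(sc)` should give the same Java-friendly methods without creating a second context.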

huangapple
  • Posted on September 13, 2020, 22:12:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63871761.html