Create DataFrame from map in Spark Java API
Question
I am trying to use Spark SQL through the Java API. The simple snippet below (copied from the official guide: https://spark.apache.org/docs/latest/rdd-programming-guide.html) does not make IntelliJ happy.
It complains about the ClassTag, which I do not know how to create or get imported automatically.
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);
I understand that it wants three arguments:

public <T> RDD<T> parallelize(final Seq<T> seq, final int numSlices, final ClassTag<T> evidence$1) {

but how can I get this evidence$1 thing? The official example does not pass that argument either. Please help with this.
Answer 1
Score: 1
I decided to look into the source code of the example that comes with the official guide, and it turns out you need to create a JavaSparkContext: its parallelize methods take a java.util.List and supply the ClassTag for you, so you never have to pass evidence$1 yourself. I got mine working after I used the example from the guide's source code.
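For context, here is a minimal sketch of how that JavaSparkContext might be created; the app name and the local[*] master are placeholder assumptions, not part of the original answer:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Placeholder settings; use your own app name and master URL
SparkConf conf = new SparkConf()
        .setAppName("example")
        .setMaster("local[*]");
JavaSparkContext jsc = new JavaSparkContext(conf);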
import java.util.Arrays;
import java.util.List;

import scala.Tuple2;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// jsc is the JavaSparkContext created above; parallelizePairs needs no ClassTag
List<Tuple2<String, String>> data =
        Arrays.asList(
                new Tuple2<>("key1", "value1"),
                new Tuple2<>("key2", "value2"));
JavaPairRDD<String, String> dataRdd = jsc.parallelizePairs(data);
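If you really do want to call the Scala SparkContext.parallelize overload from the question directly, the evidence$1 ClassTag can be built by hand. This is only a hedged sketch, assuming a Scala 2.12 build where JavaConverters handles the List-to-Seq conversion; the JavaSparkContext route above is the idiomatic one:

import java.util.Arrays;
import scala.collection.JavaConverters;
import scala.reflect.ClassTag;
import org.apache.spark.rdd.RDD;

// Build the ClassTag that the Scala signature calls evidence$1
ClassTag<Integer> tag = scala.reflect.ClassTag$.MODULE$.apply(Integer.class);

// The Scala method also expects a scala.collection.Seq, not a java.util.List
scala.collection.Seq<Integer> seq =
        JavaConverters.asScalaBufferConverter(Arrays.asList(1, 2, 3, 4, 5)).asScala();

// Call the underlying Scala SparkContext through the Java wrapper
RDD<Integer> rdd = jsc.sc().parallelize(seq, 2, tag);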