How to cast all columns of Spark Dataset to String in Java without withColumn?
Question
I've tried the solution that casts each column with withColumn.
However, that solution performs poorly with a large number of columns (1k-6k): the job runs for more than 6 hours and is then aborted.
As an alternative, I'm trying to cast with map as shown below, but I get an error:
MapFunction<Column, Column> mapFunction = (c) -> {
    return c.cast("string");
};
dataset = dataset.map(mapFunction, Encoders.bean(Column.class));
Error with the above snippet:
The method map(Function1<Row,U>, Encoder<U>) in the type Dataset<Row> is not applicable for the arguments (MapFunction<Column,Column>, Encoder<Column>)
Import used:
import org.apache.spark.api.java.function.MapFunction;
Answer 1
Score: 0
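Are you sure you mean 1k-6k columns, or do you mean rows?
In any case, I cast columns generically like this (Scala):
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType
import spark.implicits._

val df = Seq((1, 2), (2, 3), (3, 4)).toDF("a", "b")
// Build one cast-to-string Column per column name
val cols = for {
  a <- df.columns
} yield col(a).cast(StringType)
// Apply all casts in a single select
df.select(cols: _*)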
Answer 2
Score: 0
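For anyone looking for this, I found the solution below:
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Column;
import scala.collection.JavaConversions;

// Build one cast-to-string Column per column name, then select them all at once.
String[] strColNameArray = dataset.columns();
List<Column> colNames = new ArrayList<>();
for (String strColName : strColNameArray) {
    colNames.add(new Column(strColName).cast("string"));
}
dataset = dataset.select(JavaConversions.asScalaBuffer(colNames));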
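Both answers come down to building every cast first and applying them in one projection, rather than calling withColumn once per column. Spark's own documentation for withColumn warns that calling it in a loop can produce very large plans. As a minimal sketch of the same idea without any Scala collection conversion, the casts can be generated as SQL expression strings and passed to Dataset.selectExpr(String...). The class CastExprs and its methods are hypothetical helpers, not part of Spark, and the backtick quoting assumes Spark SQL's usual rule that a backtick inside a quoted identifier is doubled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): builds one SQL cast expression per column.
final class CastExprs {
    // Quote a column name with backticks, doubling any backtick inside it (assumed escaping rule).
    static String quote(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // For each column name, produce an expression of the form CAST(`name` AS STRING) AS `name`.
    static List<String> castExpressions(String[] columnNames) {
        List<String> exprs = new ArrayList<>(columnNames.length);
        for (String name : columnNames) {
            String q = quote(name);
            exprs.add("CAST(" + q + " AS STRING) AS " + q);
        }
        return exprs;
    }

    public static void main(String[] args) {
        // In a Spark job the names would come from dataset.columns().
        String[] names = {"a", "b c"};
        for (String e : castExpressions(names)) {
            System.out.println(e);
        }
        // With Spark on the classpath, all casts would be applied in a single projection:
        // dataset = dataset.selectExpr(castExpressions(dataset.columns()).toArray(new String[0]));
    }
}
```

Because the helper only manipulates strings, it can be checked without a Spark session. Whether it is fast enough for several thousand columns would still need to be measured on the actual data.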