Spark Java: Escaping dots in column names in VectorAssembler

Question


I have a Dataset in which some column names contain dots. The problem appears when I pass these columns to VectorAssembler: the two do not seem to get along. I tried escaping the dots in many ways, but nothing changed.

String[] expincols = newfilenameavgpeaks.columns();

VectorAssembler assemblerexp = new VectorAssembler()
                    .setInputCols(expincols)
                    .setOutputCol("intensity");

Dataset<Row> filenameoutput = assemblerexp.transform(newfilenameavgpeaks);

I have wrapped every element of expincols in "`", "``", "```", "````", "'", '"', etc., but nothing works. I also tried the same quoting on the column names of newfilenameavgpeaks, still without success. Any ideas how to escape the dots?

Answer 1

Score: 0


If the dataset contains a column a.b, you can still select it with df.col("`a.b`"). This works because Dataset.col parses the column name it is given and understands backtick quoting.
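To make the backtick form concrete, here is a minimal sketch of how a raw column name could be turned into the quoted string that Dataset.col expects. It assumes Spark's quoted-identifier convention: the name is wrapped in backticks and any backtick inside the name is doubled. ColumnQuoting and quote are hypothetical helpers, not part of Spark.

```java
// Hypothetical helper (not a Spark API): quotes a column name so that a
// name-parsing method such as Dataset.col treats the dots as part of the
// name instead of as struct-field separators.
final class ColumnQuoting {
    private ColumnQuoting() {}

    static String quote(String name) {
        // Wrap in backticks and escape embedded backticks by doubling them,
        // following the quoted-identifier convention assumed above.
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        System.out.println(quote("a.b"));      // prints `a.b`
        System.out.println(quote("odd`name")); // prints `odd``name`
    }
}
```

With such a helper, df.col(ColumnQuoting.quote("a.b")) would select the column. As the answer explains next, this only helps APIs that parse the name; it does not help VectorAssembler.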

VectorAssembler.transform, however, takes the schema of the supplied dataset and uses that StructType to look up the input column names in VectorAssembler.transformSchema. The apply method of StructType has no logic for handling backticks: it looks the name up exactly as given and throws an IllegalArgumentException if no field name matches.
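The mismatch can be illustrated with a deliberately simplified model in plain Java (no Spark). It compares an exact-match lookup, which is how the answer describes StructType.apply, with a lookup that first strips one level of backtick quoting, which is roughly what Dataset.col does. SchemaModel and its methods are hypothetical and only model the behavior described above. They are not Spark's implementation, and the exception message is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the two lookup styles described above.
final class SchemaModel {
    private final Map<String, Integer> fieldIndex = new LinkedHashMap<>();

    SchemaModel(List<String> fieldNames) {
        for (int i = 0; i < fieldNames.size(); i++) {
            fieldIndex.put(fieldNames.get(i), i);
        }
    }

    // Exact-match lookup: a quoted name such as "`a.b`" is simply a different string.
    int exactLookup(String name) {
        Integer idx = fieldIndex.get(name);
        if (idx == null) {
            throw new IllegalArgumentException("Field \"" + name + "\" does not exist.");
        }
        return idx;
    }

    // Quote-aware lookup: strips surrounding backticks and un-doubles
    // escaped backticks, then falls back to the exact match.
    int quoteAwareLookup(String name) {
        String unquoted = name;
        if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
            unquoted = name.substring(1, name.length() - 1).replace("``", "`");
        }
        return exactLookup(unquoted);
    }

    public static void main(String[] args) {
        SchemaModel schema = new SchemaModel(List.of("id", "a.b"));
        System.out.println(schema.quoteAwareLookup("`a.b`")); // prints 1
        try {
            schema.exactLookup("`a.b`");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // the quoted name is not found
        }
    }
}
```

The point of the model is that wrapping names in quotes can only help a lookup that removes the quotes again. An exact-match lookup sees "`a.b`" as a name that does not exist.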

Therefore the only option is to rename the columns before supplying them to the VectorAssembler:

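import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> newfilenameavgpeaks = ...

// Replace every dot in the column names with an underscore so that
// VectorAssembler can find each column by its exact name.
for (String col : newfilenameavgpeaks.columns()) {
    newfilenameavgpeaks = newfilenameavgpeaks
            .withColumnRenamed(col, col.replace('.', '_'));
}

VectorAssembler assemblerexp = new VectorAssembler()
    .setInputCols(newfilenameavgpeaks.columns())
    .setOutputCol("intensity");

Dataset<Row> filenameoutput = assemblerexp.transform(newfilenameavgpeaks);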
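One caveat with the plain col.replace('.', '_') loop is that two names can map to the same result. For example, a.b and an existing a_b would both end up as a_b, leaving ambiguous column names. The following is a minimal sketch, in plain Java, of computing a collision-free old-to-new mapping first. The class name, the method name, and the numeric-suffix scheme are assumptions for illustration, not anything Spark provides.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: builds an old -> new column-name mapping in which
// dots are replaced by underscores and no two resulting names clash.
final class ColumnRenamer {
    private ColumnRenamer() {}

    static Map<String, String> dotFreeNames(String[] columns) {
        // Reserve names that are already dot-free; those columns keep their names.
        Set<String> taken = new HashSet<>();
        for (String col : columns) {
            if (!col.contains(".")) {
                taken.add(col);
            }
        }
        Map<String, String> renames = new LinkedHashMap<>();
        for (String col : columns) {
            if (!col.contains(".")) {
                continue; // nothing to rename
            }
            String base = col.replace('.', '_');
            String candidate = base;
            int suffix = 1;
            // Append _1, _2, ... until the name is not used by any other column.
            while (taken.contains(candidate)) {
                candidate = base + "_" + suffix++;
            }
            taken.add(candidate);
            renames.put(col, candidate);
        }
        return renames;
    }

    public static void main(String[] args) {
        String[] cols = {"id", "a.b", "a_b", "x.y.z"};
        System.out.println(dotFreeNames(cols)); // prints {a.b=a_b_1, x.y.z=x_y_z}
    }
}
```

The resulting map can then drive the same withColumnRenamed loop as in the answer, calling withColumnRenamed(oldName, newName) once per entry, before the renamed columns are handed to VectorAssembler.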
