Doing a "WHERE IN" clause with Spark, how may I retain only the columns of my first dataset?
Question
Am I doing things correctly?
I would like to retain only the mobilite data related to the cities listed in communes.
I simulate the WHERE ... IN ... clause with a join: is that the best way to do it?
Dataset<Row> mobilite = this.mobiliteDomicileTravailDataset
.dsRowFluxDomicileTravailPlusDe15ansAvecEmploi(this.session, 2017);
Dataset<Row> communes = communes(2018);
mobilite = mobilite
.join(communes,
communes.col("codeCommune").equalTo(col("code_commune_origine")), "inner")
.selectExpr("mobilite.*");
The mobilite dataset obtained right after the join operation contains the communes columns. That is expected, but I am not interested in them. However, what I have written doesn't work and leads to an error.
How do I discard them quickly?
What is the quickest code to write to achieve what I want?
Answer 1
Score: 1
Use a leftsemi join:
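mobilite = mobilite
    .join(communes,
        communes.col("codeCommune").equalTo(mobilite.col("code_commune_origine")), "leftsemi");
// A left semi join returns only the columns of the left dataset (mobilite),
// so the trailing selectExpr("mobilite.*") is not needed.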
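To illustrate why a left semi join is closer to WHERE ... IN than an inner join, here is a minimal plain-Java sketch without Spark. The class name, the helper methods and the sample commune codes are all hypothetical and only mimic the join semantics on lists of keys. It shows that an inner join repeats a left row once per matching right row, while a left semi join keeps each left row at most once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of join semantics; this is not Spark code.
final class SemiJoinSketch {

    /** Inner join on equal keys: one output row per matching (left, right) pair. */
    static List<String> innerJoin(List<String> leftKeys, List<String> rightKeys) {
        List<String> out = new ArrayList<>();
        for (String left : leftKeys) {
            for (String right : rightKeys) {
                if (left.equals(right)) {
                    out.add(left);
                }
            }
        }
        return out;
    }

    /** Left semi join: keep a left row once if at least one right row matches. */
    static List<String> leftSemiJoin(List<String> leftKeys, List<String> rightKeys) {
        Set<String> lookup = new HashSet<>(rightKeys);
        List<String> out = new ArrayList<>();
        for (String left : leftKeys) {
            if (lookup.contains(left)) {
                out.add(left);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample keys (assumed): "75056" appears twice on the communes side.
        List<String> mobilite = List.of("75056", "69123", "13055");
        List<String> communes = List.of("75056", "75056", "13055");

        System.out.println(innerJoin(mobilite, communes));    // [75056, 75056, 13055]
        System.out.println(leftSemiJoin(mobilite, communes)); // [75056, 13055]
    }
}
```

If codeCommune is not unique in communes, the inner join would therefore duplicate mobilite rows, whereas the semi join behaves like the IN filter.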