Doing a "WHERE IN" clause with Spark, how may I retain only the columns of my first dataset?


Question

Am I doing things correctly?

I would like to keep only the mobilite rows that relate to the cities listed in communes.
I simulate the WHERE ... IN ... clause with a join. Is that the best way to do it?

Dataset<Row> mobilite = this.mobiliteDomicileTravailDataset
   .dsRowFluxDomicileTravailPlusDe15ansAvecEmploi(this.session, 2017);

Dataset<Row> communes = communes(2018);

mobilite = mobilite
  .join(communes,
        communes.col("codeCommune").equalTo(col("code_commune_origine")), "inner")
  .selectExpr("mobilite.*");

Right after the join, the mobilite dataset also contains the communes columns. That is expected, but I don't need them. However, what I've written doesn't work and leads to an error.

How do I discard them quickly?
What is the quickest code to write to achieve what I want?

Answer 1

Score: 1

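Use a leftsemi join. It keeps only the mobilite rows whose code_commune_origine has a match in communes.codeCommune and, like WHERE ... IN ..., it returns only the columns of mobilite. The selectExpr("mobilite.*") call is therefore unnecessary; it most likely failed because no relation in the plan is aliased mobilite (a Java variable name is not a SQL alias; mobilite.alias("mobilite") would provide one).

import static org.apache.spark.sql.functions.col;

// "leftsemi" keeps the matching rows of mobilite and returns none of the communes columns
mobilite = mobilite
  .join(communes,
        communes.col("codeCommune").equalTo(col("code_commune_origine")), "leftsemi");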
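Why leftsemi is a closer match to WHERE ... IN ... than inner: a semi join keeps each left row at most once and carries no right-side columns, whereas an inner join emits one output row per matching (left, right) pair, so a mobilite row is repeated if communes contains the same code twice. The sketch below illustrates only those row semantics with plain Java collections, not Spark; the record types, their fields, and the sample codes are hypothetical simplifications of the question's datasets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration (not Spark) of left semi join vs. inner join semantics.
public class SemiJoinSketch {

    // Hypothetical, simplified "mobilite" row: the join key plus one payload field.
    record Flux(String codeCommuneOrigine, int nombre) {}

    // Hypothetical, simplified "communes" row.
    record Commune(String codeCommune, String nom) {}

    // One inner-join output row: both sides are carried along.
    record Joined(Flux flux, Commune commune) {}

    // Left semi join: keep a left row (once) if its key exists on the right side.
    // The result type is the left type only, so no right-side fields survive.
    static List<Flux> leftSemiJoin(List<Flux> mobilite, List<Commune> communes) {
        Set<String> codes = new HashSet<>();
        for (Commune c : communes) {
            codes.add(c.codeCommune());
        }
        List<Flux> result = new ArrayList<>();
        for (Flux f : mobilite) {
            if (codes.contains(f.codeCommuneOrigine())) {
                result.add(f);
            }
        }
        return result;
    }

    // Inner join: one output row per matching pair, so a left row is
    // duplicated when several right rows share its key.
    static List<Joined> innerJoin(List<Flux> mobilite, List<Commune> communes) {
        List<Joined> result = new ArrayList<>();
        for (Flux f : mobilite) {
            for (Commune c : communes) {
                if (c.codeCommune().equals(f.codeCommuneOrigine())) {
                    result.add(new Joined(f, c));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data; "75056" appears twice in communes on purpose.
        List<Flux> mobilite = List.of(
            new Flux("75056", 10),
            new Flux("69123", 5),
            new Flux("13055", 7));
        List<Commune> communes = List.of(
            new Commune("75056", "Paris"),
            new Commune("75056", "Paris (duplicate)"),
            new Commune("69123", "Lyon"));

        System.out.println("semi join rows:  " + leftSemiJoin(mobilite, communes).size());
        System.out.println("inner join rows: " + innerJoin(mobilite, communes).size());
    }
}
```

Running main prints 2 rows for the semi join and 3 for the inner join, because "75056" appears twice in the sample communes list while "13055" matches nothing.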

huangapple
  • Posted on 2020-10-12 03:04:02
  • Source: https://go.coder-hub.com/64307977.html