October 12, 2020, 03:04:02

Doing a "WHERE IN" clause with Spark, how may I retain only the columns of my first dataset?

Question

Am I doing things correctly?

I would like to retain only the mobilite data related to the cities mentioned in communes.
I simulate the WHERE ... IN ... clause with a join: is that the best way to do it?

Dataset<Row> mobilite = this.mobiliteDomicileTravailDataset
   .dsRowFluxDomicileTravailPlusDe15ansAvecEmploi(this.session, 2017);

Dataset<Row> communes = communes(2018);

mobilite = mobilite
  .join(communes, 
        communes.col("codeCommune").equalTo(col("code_commune_origine")), "inner")
  .selectExpr("mobilite.*");

The mobilite dataset taken just after the join operation has the communes columns in it. That is expected, but they do not interest me. However, what I've written doesn't work and leads to an error.

How do I discard them quickly?
What is the quickest code to write to achieve what I want?

Answer 1

Score: 1

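Use a leftsemi join:

import static org.apache.spark.sql.functions.col;

// A left semi join returns only the columns of the left dataset (mobilite),
// so no trailing select is needed to drop the communes columns.
mobilite = mobilite
  .join(communes, 
        communes.col("codeCommune").equalTo(col("code_commune_origine")), "leftsemi");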
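
On the "is a join the best way" part: a left semi join behaves like WHERE ... IN ... because it keeps each left-side row at most once when a match exists, whereas an inner join emits one row per matching pair and so repeats a left row if its key appears several times on the right. The sketch below is a plain-Java model of those two semantics, not Spark code. The class name, method names, and commune codes are made up for illustration, and each row is reduced to its join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of join semantics (hypothetical names; this is NOT the Spark API).
final class SemiJoinSketch {

    // Inner join on key equality: one output row per matching (left, right) pair,
    // so a left row is repeated when its key occurs several times on the right.
    static List<String> innerJoinLeftRows(List<String> leftKeys, List<String> rightKeys) {
        List<String> out = new ArrayList<>();
        for (String left : leftKeys) {
            for (String right : rightKeys) {
                if (left.equals(right)) {
                    out.add(left);
                }
            }
        }
        return out;
    }

    // Left semi join: a left row is kept once if at least one match exists,
    // which is the behaviour of WHERE key IN (...).
    static List<String> leftSemiJoin(List<String> leftKeys, List<String> rightKeys) {
        Set<String> lookup = new HashSet<>(rightKeys);
        List<String> out = new ArrayList<>();
        for (String left : leftKeys) {
            if (lookup.contains(left)) {
                out.add(left);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample keys: "75056" appears twice on the communes side.
        List<String> mobilite = Arrays.asList("75056", "69123", "13055");
        List<String> communes = Arrays.asList("75056", "75056", "13055");

        System.out.println(innerJoinLeftRows(mobilite, communes)); // [75056, 75056, 13055]
        System.out.println(leftSemiJoin(mobilite, communes));      // [75056, 13055]
    }
}
```

If codeCommune is unique in communes, both approaches return the same rows, and the practical difference is only that the semi join never brings the communes columns into the result.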
