Join dataframe with order by desc limit on Spark / Java


Question

I'm using the following code:

    Dataset<Row> dataframee = df1.as("a").join(df2.as("b"),
            df2.col("id_device").equalTo(df1.col("ID_device_previous"))
                .and(df2.col("id_vehicule").equalTo(df1.col("ID_vehicule_previous")))
                .and(df2.col("tracking_time").lt(df1.col("date_track_previous"))),
            "left")
        .selectExpr("a.*", "b.ID_tracking as ID_pprevious", "b.km as KM_pprevious",
                    "b.tracking_time as tracking_time_pprevious", "b.speed as speed_pprevious");

This gives me each df1 row joined to multiple rows from df2.

What I want is to join df1 with df2 on the same condition, but ordered by df2.col("tracking_time") desc with limit(0,1), i.e. keeping only the latest matching df2 row for each df1 row.
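In Spark, a per-key "order by ... desc limit 1" is normally expressed with a window function rather than a limit on the join. A minimal sketch with the DataFrame API, assuming (ID_device_previous, ID_vehicule_previous, date_track_previous) identifies a df1 row (otherwise partition by a generated unique id such as monotonically_increasing_id()):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.expressions.Window;
    import org.apache.spark.sql.expressions.WindowSpec;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.row_number;

    // Same left join and projection as above.
    Dataset<Row> joined = df1.as("a").join(df2.as("b"),
            df2.col("id_device").equalTo(df1.col("ID_device_previous"))
                .and(df2.col("id_vehicule").equalTo(df1.col("ID_vehicule_previous")))
                .and(df2.col("tracking_time").lt(df1.col("date_track_previous"))),
            "left")
        .selectExpr("a.*", "b.ID_tracking as ID_pprevious", "b.km as KM_pprevious",
                    "b.tracking_time as tracking_time_pprevious", "b.speed as speed_pprevious");

    // Rank each df1 key's matches newest-first; desc_nulls_last also keeps
    // the single all-null row the left join produces for unmatched keys.
    WindowSpec newestFirst = Window
        .partitionBy(col("ID_device_previous"), col("ID_vehicule_previous"), col("date_track_previous"))
        .orderBy(col("tracking_time_pprevious").desc_nulls_last());

    // Keep only rank 1, i.e. the per-key "order by tracking_time desc limit 1".
    Dataset<Row> result = joined
        .withColumn("rn", row_number().over(newestFirst))
        .where(col("rn").equalTo(1))
        .drop("rn");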

EDIT

I tried the following code, but it doesn't work:

df1.registerTempTable("data");
df2.createOrReplaceTempView("tdays");
Dataset<Row> d_f = sparkSession.sql("select a.*  from data as a  LEFT JOIN (select  b.tracking_time from tdays as b where  b.id_device = a.ID_device_previous and  b.id_vehicule = a.ID_vehicule_previous  and b.tracking_time < a.date_track_previous order by b.tracking_time desc limit 1 )");
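This query cannot work as written: a derived table in a join cannot reference columns of the other join side (so the subquery's references to the outer alias a are invalid), and the LEFT JOIN has no ON clause. The same intent can be written as a plain join plus ROW_NUMBER(); a sketch, assuming the column names above:

    df1.createOrReplaceTempView("data");
    df2.createOrReplaceTempView("tdays");

    // Rank each df1 row's matches newest-first, then keep rank 1 only.
    Dataset<Row> d_f = sparkSession.sql(
        "SELECT * FROM ( " +
        "  SELECT a.*, " +
        "         b.ID_tracking   AS ID_pprevious, " +
        "         b.km            AS KM_pprevious, " +
        "         b.tracking_time AS tracking_time_pprevious, " +
        "         b.speed         AS speed_pprevious, " +
        "         ROW_NUMBER() OVER (PARTITION BY a.ID_device_previous, a.ID_vehicule_previous, a.date_track_previous " +
        "                            ORDER BY b.tracking_time DESC) AS rn " +
        "  FROM data a " +
        "  LEFT JOIN tdays b " +
        "    ON  b.id_device     = a.ID_device_previous " +
        "    AND b.id_vehicule   = a.ID_vehicule_previous " +
        "    AND b.tracking_time < a.date_track_previous " +
        ") t " +
        "WHERE t.rn = 1");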

I need your help.


Answer 1

Score: 1

You can do this in multiple ways; here are the ones I'm aware of:

1) You can call dropDuplicates on your joined dataframee DF (see the Java sketch after this list):

    val finalDF = dataframee.dropDuplicates("") // specify the columns you want to be distinct/unique in the final output

(or)

2) Use Spark SQL:

    import spark.implicits._
    df1.createOrReplaceTempView("table1")
    df2.createOrReplaceTempView("table2")
    spark.sql("join query with groupBy distinct columns").select(df("*"))
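For option 1 in the question's Java setting, note that dropDuplicates keeps an arbitrary row within each key, so the join result should be sorted first; even then Spark gives no hard guarantee about which duplicate survives a shuffle, which is why the window-function approaches above are the more reliable way to keep the latest row. A hedged sketch over the dataframee result from the question:

    import static org.apache.spark.sql.functions.col;

    // Sort newest match first, then keep one row per df1 key.
    // Caveat: Spark does not guarantee which duplicate dropDuplicates retains,
    // so prefer row_number()/ROW_NUMBER() when the "latest" row matters.
    Dataset<Row> finalDF = dataframee
        .orderBy(col("tracking_time_pprevious").desc())
        .dropDuplicates(new String[]{
            "ID_device_previous", "ID_vehicule_previous", "date_track_previous"});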