如何根据条件在Spark SQL中选择不同的行。

huangapple go评论57阅读模式
英文:

How to select different rows base on conditions in Spark sql

问题

我创建了一个基于 num 列并按名称分组的排名列。我试图实现逻辑,如果 num 不为 0,则选择第一行的颜色值,否则使用第二行或第三行的颜色值,其中 num 不为 0。例如,对于名称 A,应选择黄色。对于名称 B,应选择绿色。请问有人能帮忙提供如何做到这一点的建议吗?谢谢。

Name.    colour     num.     rank 
 A.       blue       0         1
 A        yellow.    300       2
 B        green      100.      1
 B        brown      500.      2
英文:

I created a rank column base on num column and grouped by name. I'm trying to implement logic to select the colour value in the first row if num is not 0, otherwise use the colour value from the second row or 3rd row, where the num is not 0 For example for Name A, colour yellow should be selected. For Name B, colour green should be selected.
Could someone please help to suggest how to do this? Thank you.

Name.    colour     num.     rank 
 A.       blue       0         1
 A        yellow.    300       2
 B        green      100.      1
 B        brown      500.      2     

答案1

得分: 1

你可以首先使用filter排除0,然后使用Window分区分配row_number,接着再次使用filter筛选不等于1的值。

示例:

df
  .filter(col("num").gt(0))
  .withColumn("row_number", row_number().over(Window.partitionBy("name").orderBy(col("rank"))))
  .filter(col("row_number").equalTo(1))
  .drop("row_number")

输出:

+----+------+---+----+
|name|color |num|rank|
+----+------+---+----+
|A   |yellow|300|2   |
|B   |green |100|1   |
+----+------+---+----+

这仅在假设namenum的组合是唯一的情况下有效,如果不是这种情况,则需要进一步调整。此外,如果你展示导致上述表格的代码,可能可以以更高效的方式解决这个问题。祝好运!

英文:

You can use first use filter to exclude 0s, followed by a Window partition to assign row_number, and then another filter to filter out values thare are not 1.

Example

df
  .filter(col("num").gt(0))
  .withColumn("row_number", row_number().over(Window.partitionBy("name").orderBy(col("rank"))))
  .filter(col("row_number").equalTo(1))
  .drop("row_number")

Output

+----+------+---+----+
|name|color |num|rank|
+----+------+---+----+
|A   |yellow|300|2   |
|B   |green |100|1   |
+----+------+---+----+

This only works under the assumption that the combination of name and num are unique, if this is not the case, then further adjustments are needed. Also, if you show the code that leads to your mentioned table, this problem could be solved in a more efficient way (probably). Good luck!

huangapple
  • 本文由 发表于 2023年6月29日 12:12:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/76578014.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定