英文:
How to select different rows base on conditions in Spark sql
问题
我创建了一个基于 num
列并按名称分组的排名列。我试图实现逻辑,如果 num
不为 0,则选择第一行的颜色值,否则使用第二行或第三行的颜色值,其中 num
不为 0。例如,对于名称 A,应选择黄色。对于名称 B,应选择绿色。请问有人能帮忙提供如何做到这一点的建议吗?谢谢。
Name. colour num. rank
A. blue 0 1
A yellow. 300 2
B green 100. 1
B brown 500. 2
英文:
I created a rank column base on num
column and grouped by name. I'm trying to implement logic to select the colour value in the first row if num is not 0, otherwise use the colour value from the second row or 3rd row, where the num is not 0
For example for Name A, colour yellow should be selected. For Name B, colour green should be selected.
Could someone please help to suggest how to do this? Thank you.
Name. colour num. rank
A. blue 0 1
A yellow. 300 2
B green 100. 1
B brown 500. 2
答案1
得分: 1
你可以首先使用filter
排除0
,然后使用Window分区
分配row_number
,接着再次使用filter
筛选不等于1
的值。
示例:
df
.filter(col("num").gt(0))
.withColumn("row_number", row_number().over(Window.partitionBy("name").orderBy(col("rank"))))
.filter(col("row_number").equalTo(1))
.drop("row_number")
输出:
+----+------+---+----+
|name|color |num|rank|
+----+------+---+----+
|A |yellow|300|2 |
|B |green |100|1 |
+----+------+---+----+
这仅在假设name
和num
的组合是唯一的情况下有效,如果不是这种情况,则需要进一步调整。此外,如果你展示导致上述表格的代码,可能可以以更高效的方式解决这个问题。祝好运!
英文:
You can use first use filter
to exclude 0
s, followed by a Window partition
to assign row_number
, and then another filter
to filter out values thare are not 1
.
Example
df
.filter(col("num").gt(0))
.withColumn("row_number", row_number().over(Window.partitionBy("name").orderBy(col("rank"))))
.filter(col("row_number").equalTo(1))
.drop("row_number")
Output
+----+------+---+----+
|name|color |num|rank|
+----+------+---+----+
|A |yellow|300|2 |
|B |green |100|1 |
+----+------+---+----+
This only works under the assumption that the combination of name
and num
are unique, if this is not the case, then further adjustments are needed. Also, if you show the code that leads to your mentioned table, this problem could be solved in a more efficient way (probably). Good luck!
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论