如何在Spark SQL中获取当前行中之前的非相等薪资记录

huangapple go评论130阅读模式
英文:

How to get previous non equal salary record in the current row in spark sql

问题

id startdate enddate salary prevSalary
1 2015-09-07 9999-12-31 194000 192000
1 2015-03-01 2015-09-06 194000 192000
1 2014-04-10 2015-02-28 194000 192000
1 2014-01-01 2014-04-09 192000 180000
1 2013-07-31 2013-12-31 180000 null (or) 0
英文:
id startdate enddate salary
1 2015-09-07 9999-12-31 194000
1 2015-03-01 2015-09-06 194000
1 2014-04-10 2015-02-28 194000
1 2014-01-01 2014-04-09 192000
1 2013-07-31 2013-12-31 180000

This is the table I have, I need to introduce a new column which shows the previous salary for the employee. The challenge that I have is that if the salary is unchanged, it cannot be shown as the previous salary, so in this case the result table would be

id startdate enddate salary prevSalary
1 2015-09-07 9999-12-31 194000 192000
1 2015-03-01 2015-09-06 194000 192000
1 2014-04-10 2015-02-28 194000 192000
1 2014-01-01 2014-04-09 192000 180000
1 2013-07-31 2013-12-31 180000 null (or) 0

I tried using the "lag" operator but it doesn't give me the desired output, instead it just picks the salary from the last record which is incorrect.

My query:

select *, lag(salary) 
    over(partition by id, order by startdate) as prevSalary 
    from tablename.

I also tried partitioning it with "id" and "salary" but I am not able to formulate the proper solution.

Note, there are multiple ids in the table, I am using one id just to give an example. Also the date records are consistent and have no gaps.

答案1

得分: 2

以下是您要求的翻译:

一旦您使用LAG选择了先前的薪水,在相同的薪水分区中,将有一个会保持良好结果的第一个,并且其他的薪水等于上一次的薪水。您可以在完全相同的薪水分区中应用FIRST_VALUE窗口函数,以覆盖所需的上一次薪水。

WITH cte AS (
    SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
    FROM tab
)
SELECT id, startdate, enddate, salary, 
       FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary 
FROM cte

输出

id startdate enddate salary lastsalary
1 2015-09-07 9999-12-31 194000 192000
1 2015-03-01 2015-09-06 194000 192000
1 2014-04-10 2015-02-28 194000 192000
1 2014-01-01 2014-04-09 192000 180000
1 2013-07-31 2013-12-31 180000 null

编辑:在ahmed的聪明建议下,如果薪水不是单调递增的,而且id恰好具有先前出现的薪水,您可能需要一种适用于间隙和岛屿情景的解决方案,为此,您需要按以下方式重建您的分区:

WITH gaps AS (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY id ORDER BY startdate DESC) - 
             ROW_NUMBER() OVER(PARTITION BY id, salary ORDER BY startdate DESC) grp
    FROM tab
), cte AS (
    SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
    FROM gaps
)
SELECT *,
       FIRST_VALUE(lastsalary) OVER(PARTITION BY id, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary 
FROM cte 
ORDER BY startdate DESC
英文:

Once you selected the previous salary with LAG, in the same partition of salary, there will be the first one that will hold the good result, and the other ones for which salary = lastsalary. You can just apply the FIRST_VALUE window function in the very same salary partition, to overwrite the needed lastsalary.

WITH cte AS (
	SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
	FROM tab
)
SELECT id, startdate, enddate, salary, 
       FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary 
FROM cte

Output:

id startdate enddate salary lastsalary
1 2015-09-07 9999-12-31 194000 192000
1 2015-03-01 2015-09-06 194000 192000
1 2014-04-10 2015-02-28 194000 192000
1 2014-01-01 2014-04-09 192000 180000
1 2013-07-31 2013-12-31 180000 null

Edit: On ahmed's smart suggestion, if salary is not monotonically increasing, and the id happen to have a previously occurring salary, you could need a solution that works in gaps-and-islands setting, for which you would need to rebuild your partitioning as follows:

WITH gaps AS (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY id ORDER BY startdate DESC) - 
             ROW_NUMBER() OVER(PARTITION BY id, salary ORDER BY startdate DESC) grp
    FROM tab
), cte AS (
    SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
    FROM gaps
)
SELECT *,
       FIRST_VALUE(lastsalary) OVER(PARTITION BY id, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary 
FROM cte 
ORDER BY startdate DESC

答案2

得分: 0

你可以使用一个简单的相关子查询,内部选择查询最近与相同 id 不同薪水的记录:

SELECT s.*,
    (SELECT salary
     FROM salaries s2
     WHERE s.id = s2.id AND s.enddate > s2.startdate AND s.salary <> s2.salary
     ORDER BY enddate DESC
     LIMIT 1) prevSalary
FROM salaries s;
id startdate enddate salary prevSalary
1 2015-09-07 9999-12-31 19400 19200
1 2015-03-01 2015-09-06 19400 19200
1 2014-04-10 2015-02-28 19400 19200
1 2014-01-01 2014-04-09 19200 18000
1 2013-07-31 2013-12-31 18000 NULL
英文:

You could use a simple correlated subquery, with the inner select that looks back for the most recent different salary with the same id:

SELECT s.*,
    (SELECT salary
     FROM salaries s2
     WHERE s.id = s2.id AND s.enddate &gt; s2.startdate AND s.salary &lt;&gt; s2.salary
     ORDER BY enddate DESC
     LIMIT 1) prevSalary
FROM salaries s;
id startdate enddate salary prevSalary
1 2015-09-07 9999-12-31 19400 19200
1 2015-03-01 2015-09-06 19400 19200
1 2014-04-10 2015-02-28 19400 19200
1 2014-01-01 2014-04-09 19200 18000
1 2013-07-31 2013-12-31 18000 NULL

答案3

得分: 0

SELECT
current.employee_id,
current.salary AS current_salary,
previous.salary AS previous_salary
FROM
employees current
LEFT JOIN
employees previous ON previous.employee_id = current.employee_id
AND previous.salary <> current.salary
AND previous.hire_date < current.hire_date
WHERE
current.salary IS NOT NULL
ORDER BY
current.employee_id, current.hire_date DESC;

英文:
SELECT
  current.employee_id,
  current.salary AS current_salary,
  previous.salary AS previous_salary
FROM
  employees current
LEFT JOIN
  employees previous ON previous.employee_id = current.employee_id
    AND previous.salary &lt;&gt; current.salary
    AND previous.hire_date &lt; current.hire_date
WHERE
  current.salary IS NOT NULL
ORDER BY
  current.employee_id, current.hire_date DESC;

huangapple
  • 本文由 发表于 2023年5月25日 17:26:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76330761.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定