英文:
How to get previous non equal salary record in the current row in spark sql
问题
id | startdate | enddate | salary | prevSalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
1 | 2013-07-31 | 2013-12-31 | 180000 | null (or) 0 |
英文:
id | startdate | enddate | salary |
---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 194000 |
1 | 2015-03-01 | 2015-09-06 | 194000 |
1 | 2014-04-10 | 2015-02-28 | 194000 |
1 | 2014-01-01 | 2014-04-09 | 192000 |
1 | 2013-07-31 | 2013-12-31 | 180000 |
This is the table I have, I need to introduce a new column which shows the previous salary for the employee. The challenge that I have is that if the salary is unchanged, it cannot be shown as the previous salary, so in this case the result table would be
id | startdate | enddate | salary | prevSalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
1 | 2013-07-31 | 2013-12-31 | 180000 | null (or) 0 |
I tried using the "lag" operator but it doesn't give me the desired output, instead it just picks the salary from the last record which is incorrect.
My query:
select *, lag(salary)
over(partition by id, order by startdate) as prevSalary
from tablename.
I also tried partitioning it with "id" and "salary" but I am not able to formulate the proper solution.
Note, there are multiple ids in the table, I am using one id just to give an example. Also the date records are consistent and have no gaps.
答案1
得分: 2
以下是您要求的翻译:
一旦您使用LAG
选择了先前的薪水,在相同的薪水分区中,将有一个会保持良好结果的第一个,并且其他的薪水等于上一次的薪水。您可以在完全相同的薪水分区中应用FIRST_VALUE
窗口函数,以覆盖所需的上一次薪水。
WITH cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM tab
)
SELECT id, startdate, enddate, salary,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
输出:
id | startdate | enddate | salary | lastsalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
1 | 2013-07-31 | 2013-12-31 | 180000 | null |
编辑:在ahmed的聪明建议下,如果薪水不是单调递增的,而且id恰好具有先前出现的薪水,您可能需要一种适用于间隙和岛屿情景的解决方案,为此,您需要按以下方式重建您的分区:
WITH gaps AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY startdate DESC) -
ROW_NUMBER() OVER(PARTITION BY id, salary ORDER BY startdate DESC) grp
FROM tab
), cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM gaps
)
SELECT *,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
ORDER BY startdate DESC
英文:
Once you selected the previous salary with LAG
, in the same partition of salary, there will be the first one that will hold the good result, and the other ones for which salary = lastsalary. You can just apply the FIRST_VALUE
window function in the very same salary partition, to overwrite the needed lastsalary.
WITH cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM tab
)
SELECT id, startdate, enddate, salary,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
Output:
id | startdate | enddate | salary | lastsalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
1 | 2013-07-31 | 2013-12-31 | 180000 | null |
Edit: On ahmed's smart suggestion, if salary is not monotonically increasing, and the id happen to have a previously occurring salary, you could need a solution that works in gaps-and-islands setting, for which you would need to rebuild your partitioning as follows:
WITH gaps AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY startdate DESC) -
ROW_NUMBER() OVER(PARTITION BY id, salary ORDER BY startdate DESC) grp
FROM tab
), cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM gaps
)
SELECT *,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
ORDER BY startdate DESC
答案2
得分: 0
你可以使用一个简单的相关子查询,内部选择查询最近与相同 id 不同薪水的记录:
SELECT s.*,
(SELECT salary
FROM salaries s2
WHERE s.id = s2.id AND s.enddate > s2.startdate AND s.salary <> s2.salary
ORDER BY enddate DESC
LIMIT 1) prevSalary
FROM salaries s;
id | startdate | enddate | salary | prevSalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 19400 | 19200 |
1 | 2015-03-01 | 2015-09-06 | 19400 | 19200 |
1 | 2014-04-10 | 2015-02-28 | 19400 | 19200 |
1 | 2014-01-01 | 2014-04-09 | 19200 | 18000 |
1 | 2013-07-31 | 2013-12-31 | 18000 | NULL |
英文:
You could use a simple correlated subquery, with the inner select that looks back for the most recent different salary with the same id:
SELECT s.*,
(SELECT salary
FROM salaries s2
WHERE s.id = s2.id AND s.enddate > s2.startdate AND s.salary <> s2.salary
ORDER BY enddate DESC
LIMIT 1) prevSalary
FROM salaries s;
id | startdate | enddate | salary | prevSalary |
---|---|---|---|---|
1 | 2015-09-07 | 9999-12-31 | 19400 | 19200 |
1 | 2015-03-01 | 2015-09-06 | 19400 | 19200 |
1 | 2014-04-10 | 2015-02-28 | 19400 | 19200 |
1 | 2014-01-01 | 2014-04-09 | 19200 | 18000 |
1 | 2013-07-31 | 2013-12-31 | 18000 | NULL |
答案3
得分: 0
SELECT
current.employee_id,
current.salary AS current_salary,
previous.salary AS previous_salary
FROM
employees current
LEFT JOIN
employees previous ON previous.employee_id = current.employee_id
AND previous.salary <> current.salary
AND previous.hire_date < current.hire_date
WHERE
current.salary IS NOT NULL
ORDER BY
current.employee_id, current.hire_date DESC;
英文:
SELECT
current.employee_id,
current.salary AS current_salary,
previous.salary AS previous_salary
FROM
employees current
LEFT JOIN
employees previous ON previous.employee_id = current.employee_id
AND previous.salary <> current.salary
AND previous.hire_date < current.hire_date
WHERE
current.salary IS NOT NULL
ORDER BY
current.employee_id, current.hire_date DESC;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论