How to get the previous non-equal salary in the current row in Spark SQL
Question
| id | startdate | enddate | salary |
|---|---|---|---|
| 1 | 2015-09-07 | 9999-12-31 | 194000 |
| 1 | 2015-03-01 | 2015-09-06 | 194000 |
| 1 | 2014-04-10 | 2015-02-28 | 194000 |
| 1 | 2014-01-01 | 2014-04-09 | 192000 |
| 1 | 2013-07-31 | 2013-12-31 | 180000 |
This is the table I have. I need to add a new column that shows the employee's previous salary. The difficulty is that an unchanged salary must not be reported as the previous salary, so in this case the result table should be:
| id | startdate | enddate | salary | prevSalary |
|---|---|---|---|---|
| 1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
| 1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
| 1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
| 1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
| 1 | 2013-07-31 | 2013-12-31 | 180000 | null (or) 0 |
I tried the LAG function, but it does not give the desired output: it simply picks the salary from the immediately preceding record, which is incorrect when the salary did not change.
My query:
select *, lag(salary)
over(partition by id order by startdate) as prevSalary
from tablename
I also tried partitioning by both id and salary, but I could not formulate a proper solution.
Note that the table contains multiple ids; I am using a single id only as an example. The date records are consistent and have no gaps.
Answer 1
Score: 2
Once you have selected the previous salary with LAG, each group of rows sharing the same salary contains a first row that holds the correct result, while the remaining rows have salary = lastsalary. You can apply the FIRST_VALUE window function over that same salary partition to overwrite lastsalary with the value from the first row.
WITH cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM tab
)
SELECT id, startdate, enddate, salary,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
Output:
| id | startdate | enddate | salary | lastsalary |
|---|---|---|---|---|
| 1 | 2015-09-07 | 9999-12-31 | 194000 | 192000 |
| 1 | 2015-03-01 | 2015-09-06 | 194000 | 192000 |
| 1 | 2014-04-10 | 2015-02-28 | 194000 | 192000 |
| 1 | 2014-01-01 | 2014-04-09 | 192000 | 180000 |
| 1 | 2013-07-31 | 2013-12-31 | 180000 | null |
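To check the LAG + FIRST_VALUE idea without a Spark cluster, the query can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite (3.25 or later, for window functions) stands in for Spark SQL, and the helper name `previous_salaries` and the outer `ORDER BY` are assumptions added for the example, not part of the answer.

```python
import sqlite3

# Sample rows from the question: (id, startdate, enddate, salary).
rows = [
    (1, "2013-07-31", "2013-12-31", 180000),
    (1, "2014-01-01", "2014-04-09", 192000),
    (1, "2014-04-10", "2015-02-28", 194000),
    (1, "2015-03-01", "2015-09-06", 194000),
    (1, "2015-09-07", "9999-12-31", 194000),
]

QUERY = """
WITH cte AS (
    SELECT *, LAG(salary) OVER (PARTITION BY id ORDER BY startdate) AS lastsalary
    FROM tab
)
SELECT id, startdate, salary,
       FIRST_VALUE(lastsalary) OVER (
           PARTITION BY id, salary ORDER BY startdate ROWS UNBOUNDED PRECEDING
       ) AS lastsalary
FROM cte
ORDER BY id, startdate DESC
"""


def previous_salaries(rows):
    """Load the rows into a throwaway table named tab and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (id INTEGER, startdate TEXT, enddate TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in previous_salaries(rows):
    print(row)
```

Each printed tuple is (id, startdate, salary, lastsalary); the three 194000 rows all carry 192000 because FIRST_VALUE copies the LAG result of the earliest row in their salary partition.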
Edit: Following ahmed's suggestion, if the salary is not monotonically increasing and an id returns to a salary it already had earlier, you need a solution that handles the gaps-and-islands setting, for which you would rebuild the partitioning as follows:
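WITH gaps AS (
SELECT *,
-- The difference of the two row numbers is constant within a run of consecutive equal salaries
ROW_NUMBER() OVER(PARTITION BY id ORDER BY startdate DESC) -
ROW_NUMBER() OVER(PARTITION BY id, salary ORDER BY startdate DESC) AS grp
FROM tab
), cte AS (
SELECT *, LAG(salary) OVER(PARTITION BY id ORDER BY startdate) AS lastsalary
FROM gaps
)
-- salary stays in the partition because grp alone can coincide for runs of different salaries
SELECT id, startdate, enddate, salary,
FIRST_VALUE(lastsalary) OVER(PARTITION BY id, salary, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING) AS lastsalary
FROM cte
ORDER BY startdate DESC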
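The gaps-and-islands variant can be exercised on data where the salary returns to an earlier value. The sketch below again uses SQLite from Python as a stand-in for Spark SQL; the rows, the helper name `previous_salaries_islands` and the output column `prevsalary` are hypothetical. The final window is partitioned by id, salary and grp: including salary guards against the row-number difference coinciding for runs of two different salaries, which happens in this very data set (the 2020-04-01 row and the two rows before it all get grp = 1).

```python
import sqlite3

# Hypothetical rows (id, startdate, salary): the salary goes 100 -> 200 -> 100 -> 200.
rows = [
    (1, "2020-01-01", 100),
    (1, "2020-02-01", 200),
    (1, "2020-03-01", 200),
    (1, "2020-04-01", 100),
    (1, "2020-05-01", 200),
]

QUERY = """
WITH gaps AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdate DESC)
         - ROW_NUMBER() OVER (PARTITION BY id, salary ORDER BY startdate DESC) AS grp
    FROM tab
), cte AS (
    SELECT *, LAG(salary) OVER (PARTITION BY id ORDER BY startdate) AS lastsalary
    FROM gaps
)
SELECT id, startdate, salary,
       FIRST_VALUE(lastsalary) OVER (
           PARTITION BY id, salary, grp ORDER BY startdate ROWS UNBOUNDED PRECEDING
       ) AS prevsalary
FROM cte
ORDER BY id, startdate
"""


def previous_salaries_islands(rows):
    """Run the islands query on a throwaway in-memory table named tab."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (id INTEGER, startdate TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in previous_salaries_islands(rows):
    print(row)
```

With a plain `PARTITION BY id, salary`, the 2020-04-01 row would inherit the NULL of the first 100 row; splitting the partition by island gives it 200 instead.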
Answer 2
Score: 0
You could use a simple correlated subquery whose inner select looks back for the most recent record with the same id and a different salary:
SELECT s.*,
(SELECT salary
FROM salaries s2
WHERE s.id = s2.id AND s.enddate > s2.startdate AND s.salary <> s2.salary
ORDER BY s2.enddate DESC
LIMIT 1) prevSalary
FROM salaries s;
| id | startdate | enddate | salary | prevSalary |
|---|---|---|---|---|
| 1 | 2015-09-07 | 9999-12-31 | 19400 | 19200 |
| 1 | 2015-03-01 | 2015-09-06 | 19400 | 19200 |
| 1 | 2014-04-10 | 2015-02-28 | 19400 | 19200 |
| 1 | 2014-01-01 | 2014-04-09 | 19200 | 18000 |
| 1 | 2013-07-31 | 2013-12-31 | 18000 | NULL |
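The correlated-subquery logic can be checked the same way on the question's figures. This is a sketch under assumptions: SQLite stands in for Spark SQL, the helper name `prev_salary_subquery` is hypothetical, and the outer `ORDER BY` is added only to make the output order deterministic. It verifies the logic of the query; it does not confirm that Spark SQL accepts this form of correlated subquery.

```python
import sqlite3

# Sample rows from the question: (id, startdate, enddate, salary).
rows = [
    (1, "2013-07-31", "2013-12-31", 180000),
    (1, "2014-01-01", "2014-04-09", 192000),
    (1, "2014-04-10", "2015-02-28", 194000),
    (1, "2015-03-01", "2015-09-06", 194000),
    (1, "2015-09-07", "9999-12-31", 194000),
]

QUERY = """
SELECT s.id, s.startdate, s.salary,
       (SELECT s2.salary
        FROM salaries s2
        WHERE s.id = s2.id
          AND s.enddate > s2.startdate   -- ISO dates compare correctly as text
          AND s.salary <> s2.salary
        ORDER BY s2.enddate DESC
        LIMIT 1) AS prevSalary
FROM salaries s
ORDER BY s.id, s.startdate DESC
"""


def prev_salary_subquery(rows):
    """Run the correlated subquery on a throwaway in-memory table named salaries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salaries (id INTEGER, startdate TEXT, enddate TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO salaries VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in prev_salary_subquery(rows):
    print(row)
```

The subquery is evaluated once per outer row, so on large tables the window-function approach is likely to scale better.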
Answer 3
Score: 0
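-- Self-join variant. It assumes a table employees(employee_id, salary, hire_date), unlike the question's schema,
-- and returns one row per earlier record with a different salary rather than only the most recent one.
SELECT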
current.employee_id,
current.salary AS current_salary,
previous.salary AS previous_salary
FROM
employees current
LEFT JOIN
employees previous ON previous.employee_id = current.employee_id
AND previous.salary <> current.salary
AND previous.hire_date < current.hire_date
WHERE
current.salary IS NOT NULL
ORDER BY
current.employee_id, current.hire_date DESC;