Group consecutive rows using spark scala with rows repeating
Question
Desired result (each timestamp appears in every 30-minute slot it belongs to):

+----------+----------+-----------+---------------------------+---------------------------+
| space_id | template | frequency | day                       | timestamp                 |
+----------+----------+-----------+---------------------------+---------------------------+
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:00:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:15:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:30:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:15:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:30:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:45:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:30:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:45:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T10:00:00+05:30 |
+----------+----------+-----------+---------------------------+---------------------------+
Input data:

+----------+----------+-----------+---------------------------+---------------------------+
| space_id | template | frequency | day                       | timestamp                 |
+----------+----------+-----------+---------------------------+---------------------------+
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:00:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:15:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:30:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T09:45:00+05:30 |
| 321d8    | temp     | 15        | 2023-02-22T00:00:00+05:30 | 2023-02-22T10:00:00+05:30 |
+----------+----------+-----------+---------------------------+---------------------------+
Here space_id is a unique id, template is the sensor type (temperature, humidity, CO2, and so on), frequency says how often data arrives from the sensor, and there are a day column and a timestamp column.
I need to group the data into 30-minute batches according to the timestamp.
I am able to produce non-overlapping 30-minute batches: 09:00:00, 09:15:00, 09:30:00 in one batch, then 09:30:00, 09:45:00, 10:00:00, and so on.
But what I need is 09:00:00, 09:15:00, 09:30:00; then 09:15:00, 09:30:00, 09:45:00; then 09:30:00, 09:45:00, 10:00:00; and so on.
I need a sliding 30-minute slot starting at each timestamp value.
In simple words: from the input table above, I need groups of rows (1,2,3), rows (2,3,4), rows (3,4,5), and so on.
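The sliding grouping described above can be sketched outside Spark with plain Python (the timestamps and 30-minute span are taken from the example table; this only illustrates the desired grouping, not a Spark solution):

```python
from datetime import datetime, timedelta

# Input timestamps from the example table (15-minute frequency).
timestamps = [
    "2023-02-22T09:00:00+05:30",
    "2023-02-22T09:15:00+05:30",
    "2023-02-22T09:30:00+05:30",
    "2023-02-22T09:45:00+05:30",
    "2023-02-22T10:00:00+05:30",
]
rows = [datetime.fromisoformat(t) for t in timestamps]

def sliding_slots(rows, span=timedelta(minutes=30)):
    """For each row, collect every row within `span` of it (inclusive)."""
    slots = []
    for start in rows:
        slot = [r for r in rows if start <= r <= start + span]
        slots.append(slot)
    return slots

for slot in sliding_slots(rows):
    print([s.strftime("%H:%M") for s in slot])
```

The first three slots are exactly the groups (1,2,3), (2,3,4), (3,4,5) asked for; the last slots are shorter because no later rows exist to fill them.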
Answer 1

Score: 1
The window setting you're looking for is:
from pyspark.sql import Window
w = Window.partitionBy('space_id').orderBy('timestamp').rowsBetween(Window.currentRow, Window.currentRow + 2)
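In plain terms, `rowsBetween(Window.currentRow, Window.currentRow + 2)` gives each row a frame consisting of itself plus the next two rows in timestamp order within its `space_id` partition (`Window.currentRow` is 0, so the frame is rows 0 through 2 relative to the current row). A minimal stdlib-Python sketch of that frame logic, using the example data (an illustration of the semantics, not Spark code):

```python
# Sketch of the frame produced by:
#   Window.partitionBy('space_id').orderBy('timestamp')
#         .rowsBetween(Window.currentRow, Window.currentRow + 2)
from collections import defaultdict

rows = [
    {"space_id": "321d8", "timestamp": "2023-02-22T09:00:00+05:30"},
    {"space_id": "321d8", "timestamp": "2023-02-22T09:15:00+05:30"},
    {"space_id": "321d8", "timestamp": "2023-02-22T09:30:00+05:30"},
    {"space_id": "321d8", "timestamp": "2023-02-22T09:45:00+05:30"},
    {"space_id": "321d8", "timestamp": "2023-02-22T10:00:00+05:30"},
]

def frames(rows):
    """Return, per row, the timestamps in its window frame."""
    by_partition = defaultdict(list)
    for r in rows:                                  # partitionBy('space_id')
        by_partition[r["space_id"]].append(r)
    result = []
    for part in by_partition.values():
        part.sort(key=lambda r: r["timestamp"])     # orderBy('timestamp')
        for i in range(len(part)):
            # current row through current row + 2, clipped at partition end
            result.append([r["timestamp"] for r in part[i:i + 3]])
    return result

for f in frames(rows):
    print(f)
```

An aggregate such as `F.collect_list('timestamp').over(w)` would then gather each frame into a column, yielding the repeated-row groups the question asks for.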