Question

+--------+--------+---------+-------------------------+-------------------------+
|space_id|template|frequency|day                      |timestamp                |
+--------+--------+---------+-------------------------+-------------------------+
|321d8   |temp    |15       |2023-02-22T00:00:00+05:30|2023-02-22T09:00:00+05:30|
|321d8   |temp    |15       |2023-02-22T00:00:00+05:30|2023-02-22T09:15:00+05:30|
|321d8   |temp    |15       |2023-02-22T00:00:00+05:30|2023-02-22T09:30:00+05:30|
|321d8   |temp    |15       |2023-02-22T00:00:00+05:30|2023-02-22T09:45:00+05:30|
|321d8   |temp    |15       |2023-02-22T00:00:00+05:30|2023-02-22T10:00:00+05:30|
+--------+--------+---------+-------------------------+-------------------------+

Here, space_id is a unique ID and template names the measurement (which may be temperature, humidity or CO2). The frequency column is how often I receive data from the sensor (every 15 minutes in this sample), day is the date, and timestamp is the time of each reading.
I need to group the data into 30-minute batches based on the timestamp.
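The sample has one reading every 15 minutes, and a batch such as 09:00:00 to 09:30:00 includes both endpoints. That means each 30-minute slot holds 30 / 15 + 1 = 3 rows. Here is a minimal sketch of that arithmetic. It assumes frequency is in minutes and divides 30 evenly, and rows_per_window is a hypothetical helper name:

```python
def rows_per_window(frequency_min: int, window_min: int = 30) -> int:
    """Number of readings in a window that includes both its start and its end.

    Hypothetical helper; assumes window_min is a multiple of frequency_min.
    """
    if window_min % frequency_min != 0:
        raise ValueError("window length must be a multiple of the frequency")
    # e.g. 15-minute readings: 09:00, 09:15, 09:30 -> 30 // 15 + 1 = 3 rows
    return window_min // frequency_min + 1


# Readings every 15 minutes -> 3 rows per 30-minute slot,
# i.e. the current row plus the next 2 rows.
print(rows_per_window(15))  # 3
```

With frequency 15 the result is 3. That is why a frame of "current row plus the next two rows" covers a 30-minute slot, provided no readings are missing.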

I can already build 30-minute batches: 09:00:00, 09:15:00 and 09:30:00 form one batch, then 09:30:00, 09:45:00 and 10:00:00 form the next, and so on.
What I need instead is 09:00:00, 09:15:00 and 09:30:00, then 09:15:00, 09:30:00 and 09:45:00, then 09:30:00, 09:45:00 and 10:00:00, and so on.
In other words, I need a 30-minute slot starting at each timestamp value.
In terms of the table above, I need the groups of rows (1, 2, 3), (2, 3, 4), (3, 4, 5), and so on.
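The sketch below pins down the desired grouping in plain Python (not Spark). For every timestamp t, it collects the 1-based row numbers whose time falls in [t, t + 30 minutes], with both endpoints included. The function name sliding_groups is illustrative. The question's "so on" does not say what should happen near the end of the data. This sketch simply returns shorter groups for the last two rows.

```python
from datetime import datetime, timedelta

# The five timestamps from the question's table (space_id 321d8, frequency 15)
timestamps = [
    "2023-02-22T09:00:00+05:30",
    "2023-02-22T09:15:00+05:30",
    "2023-02-22T09:30:00+05:30",
    "2023-02-22T09:45:00+05:30",
    "2023-02-22T10:00:00+05:30",
]


def sliding_groups(values, window=timedelta(minutes=30)):
    """For each timestamp t, return the 1-based row numbers with time in [t, t + window]."""
    times = [datetime.fromisoformat(v) for v in values]
    groups = []
    for start in times:
        end = start + window
        groups.append(tuple(i + 1 for i, t in enumerate(times) if start <= t <= end))
    return groups


for group in sliding_groups(timestamps):
    print(group)
# (1, 2, 3)
# (2, 3, 4)
# (3, 4, 5)
# (4, 5)
# (5,)
```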

Answer 1

Score: 1

The window setting you're looking for is:

from pyspark.sql import Window

# Frame: the current row and the next two rows (Window.currentRow is 0) within each space_id
w = Window.partitionBy('space_id').orderBy('timestamp').rowsBetween(Window.currentRow, Window.currentRow + 2)
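The following is a plain-Python model (not Spark code) of what this frame selects. It compares the row-based frame with a time-based 30-minute frame. row_frames mimics rowsBetween(Window.currentRow, Window.currentRow + 2) over rows already sorted by timestamp. As in Spark, frames near the end of a partition contain fewer rows. range_frames models a frame defined by time rather than row count. The helper names and the "gapped" series with a missing 09:15 reading are hypothetical. With readings exactly 15 minutes apart the two frames agree. When a reading is missing, the row-based frame stretches past 30 minutes.

```python
from datetime import datetime, timedelta


def row_frames(times, following=2):
    """Mimic rowsBetween(currentRow, currentRow + following) on time-sorted rows:
    the current row plus up to `following` later rows (fewer at the end)."""
    return [times[i:i + following + 1] for i in range(len(times))]


def range_frames(times, window=timedelta(minutes=30)):
    """Time-based alternative: every row whose time lies in [t, t + window]."""
    return [[t for t in times if start <= t <= start + window] for start in times]


base = datetime(2023, 2, 22, 9, 0)
# Regular 15-minute readings, as in the question
regular = [base + timedelta(minutes=15 * k) for k in range(5)]
# Same series with the 09:15 reading missing (hypothetical gap)
gapped = [t for t in regular if t != base + timedelta(minutes=15)]

print(row_frames(regular) == range_frames(regular))  # True
print(row_frames(gapped) == range_frames(gapped))    # False: first row frame spans 09:00-09:45
```

If gaps can occur, one option to consider is a range-based frame over the timestamp in epoch seconds. For example: Window.partitionBy('space_id').orderBy(F.col('timestamp').cast('long')).rangeBetween(0, 30 * 60), with functions imported as F. This assumes timestamp is a TimestampType column rather than a string.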
