June 1, 2023 21:53:14

Resampling rows minute-wise not working for even minutes in a Python DataFrame

Question

I have a DataFrame df with 5 columns. The date column holds minute-wise data for a few days, with each day's data starting at 9:15 and ending at 15:29. The other four columns, named first, max, min, and last, hold numeric values.

I wrote a function that takes an interval of x minutes as a parameter and resamples the rows into x-minute rows.

The 'first' of a resampled row is the 'first' of its first source row.
The 'last' of a resampled row is the 'last' of its last source row.
The 'max' of a resampled row is the highest value in the max column across its source rows.
The 'min' of a resampled row is the lowest value in the min column across its source rows.
The date column holds datetimes at x-minute intervals.
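
As a rough illustration of these rules, the sketch below applies them by hand to the three sample rows from the Input table further down. The names rows and bar are made up for this example. The values it produces match the first row of the Output table, which suggests the aggregation itself is fine and only the timestamp label is off.

```python
import pandas as pd

# The three sample rows from the question's Input table.
rows = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-06-01 09:15:00",
        "2023-06-01 09:16:00",
        "2023-06-01 09:17:00",
    ]),
    "first": [0.014657, 0.255174, 0.956839],
    "max": [0.966861, 0.607714, 0.881803],
    "min": [0.556195, 0.845804, 0.876322],
    "last": [0.903073, 0.039933, 0.552568],
})

# Collapse the rows into a single resampled row using the four rules.
bar = {
    "first": float(rows["first"].iloc[0]),  # 'first' of the first row
    "max": float(rows["max"].max()),        # highest value in the max column
    "min": float(rows["min"].min()),        # lowest value in the min column
    "last": float(rows["last"].iloc[-1]),   # 'last' of the last row
}
print(bar)
```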


My problem is that the code works perfectly for some intervals, but for others the first row has the wrong time. Instead of starting at 9:15, the resampled data starts at some other minute.

Code:

def resample_df(df, x_minutes = '15T'):
    
    df.set_index('date', inplace=True)
    resampled_df = df.resample(x_minutes).agg({
        'first': 'first',
        'max': 'max',
        'min': 'min',
        'last': 'last'
    })
    resampled_df.reset_index(inplace=True)
    return resampled_df

Input:

	date	               first	max	        min	        last
0	2023-06-01 09:15:00	0.014657	0.966861	0.556195	0.903073
1	2023-06-01 09:16:00	0.255174	0.607714	0.845804	0.039933
2	2023-06-01 09:17:00	0.956839	0.881803	0.876322	0.552568

Output: when x_minutes = '6T'

	date	               first	max	        min	        last
0	2023-06-01 09:12:00	0.014657	0.966861	0.556195	0.552568
1	2023-06-01 09:18:00	0.437867	0.988005	0.162957	0.897419
2	2023-06-01 09:24:00	0.296486	0.370957	0.013994	0.108506

The output shows 9:12, but my data has no 9:12 row. Why am I getting the wrong timestamp?

Note: It works perfectly when the number of minutes entered is odd, e.g. x_minutes = '15T'.

Code to create a dummy df:

import pandas as pd
import random
from datetime import datetime, timedelta
# Define the number of days for which data is generated
num_days = 5
# Define the start and end times for each day
start_time = datetime.strptime('09:15', '%H:%M').time()
end_time = datetime.strptime('15:29', '%H:%M').time()
# Create a list of all the timestamps for the specified days
timestamps = []
current_date = datetime.now().replace(hour=start_time.hour, minute=start_time.minute, second=0, microsecond=0)
end_date = current_date + timedelta(days=num_days)
while current_date < end_date:
    current_time = current_date.time()
    if start_time <= current_time <= end_time:
        timestamps.append(current_date)
    current_date += timedelta(minutes=1)
# Generate random data for each column
data = {
    'date': timestamps,
    'first': [random.random() for _ in range(len(timestamps))],
    'max': [random.random() for _ in range(len(timestamps))],
    'min': [random.random() for _ in range(len(timestamps))],
    'last': [random.random() for _ in range(len(timestamps))]
}
# Create the DataFrame
df = pd.DataFrame(data)
# Display the resulting DataFrame
print(df)



Answer 1

Score: 1

Use:

resampled_df = df.resample(x_minutes, origin='start').agg({
    'first': 'first',
    'max': 'max',
    'min': 'min',
    'last': 'last'
})
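
Why this helps, as far as I can tell: by default resample uses origin='start_day', so bins are counted from midnight of the first day rather than from the first row. 09:15 is 555 minutes after midnight. 555 is not a multiple of 6, so the first 6-minute bin is labeled 09:12. It is a multiple of 15, which is why '15T' appeared to work. The sketch below checks this on a small made-up frame. The single last column and the names idx, default_first, start_first, and fifteen_first are illustrative only, and it uses the 'min' alias, which should be equivalent to 'T' and which newer pandas versions prefer.

```python
import pandas as pd

# One-minute timestamps from 09:15 to 09:44, mirroring the question's data.
idx = pd.date_range("2023-06-01 09:15", periods=30, freq="min")
df = pd.DataFrame({"last": range(30)}, index=idx)

# Default origin ('start_day'): bins are counted from midnight.
# 555 % 6 == 3, so the first bin starts 3 minutes before 09:15.
default_first = df.resample("6min").last().index[0]

# origin='start': bins are counted from the first timestamp in the data.
start_first = df.resample("6min", origin="start").last().index[0]

# 555 % 15 == 0, so the default bins already land on 09:15.
fifteen_first = df.resample("15min").last().index[0]

print(default_first)
print(start_first)
print(fifteen_first)
```

The origin parameter needs pandas 1.1 or later. With data spanning several days, resample also emits rows for the empty overnight bins, filled with NaN. If those are unwanted, dropping the all-NaN rows before reset_index is one option.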

