Resampling rows minute-wise not working for even minutes in Python DataFrame
Question
I have a DataFrame df with 5 columns. A column named date holds minute-wise data for a few days, but each day's data starts at 9:15 and ends at 15:29. The other four columns, named first, max, min, and last, hold numeric values.
I wrote code that takes x minutes as a variable. It resamples the rows into rows of x minutes each.
The 'first' of a resampled row will be the 'first' of the first row in its interval.
The 'last' of a resampled row will be the 'last' of the last row in its interval.
The 'max' of a resampled row will be the highest value of the max column in its interval.
The 'min' of a resampled row will be the lowest value of the min column in its interval.
The date will hold datetimes at x-minute intervals.
My problem is that for some values of x the code works perfectly, but for others I get the wrong time in the first row.
Instead of starting from 9:15, the resampled data starts at some other minute.
Code:
def resample_df(df, x_minutes='15T'):
    df.set_index('date', inplace=True)
    resampled_df = df.resample(x_minutes).agg({
        'first': 'first',
        'max': 'max',
        'min': 'min',
        'last': 'last'
    })
    resampled_df.reset_index(inplace=True)
    return resampled_df
Input:
date first max min last
0 2023-06-01 09:15:00 0.014657 0.966861 0.556195 0.903073
1 2023-06-01 09:16:00 0.255174 0.607714 0.845804 0.039933
2 2023-06-01 09:17:00 0.956839 0.881803 0.876322 0.552568
Output: when x_minutes = '6T'
date first max min last
0 2023-06-01 09:12:00 0.014657 0.966861 0.556195 0.552568
1 2023-06-01 09:18:00 0.437867 0.988005 0.162957 0.897419
2 2023-06-01 09:24:00 0.296486 0.370957 0.013994 0.108506
The output shows 9:12, but my data has no 9:12 row. Why is it giving me the wrong time?
Note: It works perfectly when the number of minutes entered is odd, e.g. x_minutes = '15T'.
Code to create a dummy df:
import pandas as pd
import random
from datetime import datetime, timedelta

# Define the number of days for which data is generated
num_days = 5

# Define the start and end times for each day
start_time = datetime.strptime('09:15', '%H:%M').time()
end_time = datetime.strptime('15:30', '%H:%M').time()

# Create a list of all the timestamps for the specified days
timestamps = []
current_date = datetime.now().replace(hour=start_time.hour, minute=start_time.minute, second=0, microsecond=0)
end_date = current_date + timedelta(days=num_days)
while current_date < end_date:
    current_time = current_date.time()
    if start_time <= current_time <= end_time:
        timestamps.append(current_date)
    current_date += timedelta(minutes=1)

# Generate random data for each column
data = {
    'date': timestamps,
    'first': [random.random() for _ in range(len(timestamps))],
    'max': [random.random() for _ in range(len(timestamps))],
    'min': [random.random() for _ in range(len(timestamps))],
    'last': [random.random() for _ in range(len(timestamps))]
}

# Create the DataFrame
df = pd.DataFrame(data)

# Display the resulting DataFrame (print works outside Jupyter as well)
print(df)
Answer 1
Score: 1
Use:
resampled_df = df.resample(x_minutes, origin='start').agg({
    'first': 'first',
    'max': 'max',
    'min': 'min',
    'last': 'last'
})
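As background on why 9:12 appears: by default, resample anchors its bins at midnight of the first day (origin='start_day'), not at the first timestamp in the data. 9:15 is 555 minutes after midnight, so the first bin only starts at 9:15 when x divides 555. The sketch below checks this on a hypothetical one-day series; the names s and first_bin_label are made up for illustration. It also suggests that odd versus even is not the real rule, since 7 is odd and still shifts the start.

```python
import pandas as pd

# Hypothetical sample: one session of minute data, 09:15 to 15:29.
idx = pd.date_range("2023-06-01 09:15", "2023-06-01 15:29", freq="min")
s = pd.Series(range(len(idx)), index=idx)


def first_bin_label(series, minutes):
    """Return the label of the first bin from a default resample (origin='start_day')."""
    return series.resample(f"{minutes}min").first().index[0]


minutes_since_midnight = 9 * 60 + 15  # 09:15 is 555 minutes after midnight

for minutes in (6, 15, 7):
    # The remainder is how far the first bin is pushed back before 09:15.
    offset = minutes_since_midnight % minutes
    label = first_bin_label(s, minutes)
    print(f"{minutes:>2} min: 555 % {minutes} = {offset}, first bin at {label.time()}")

# First bins: 6 -> 09:12, 15 -> 09:15, 7 -> 09:13
```

With origin='start', the bins are anchored at the first timestamp of the series instead, which is why the answer's change makes the output begin at 9:15.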
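For a fuller picture, here is a minimal sketch of the whole function with origin='start' applied, run on a hypothetical two-day sample. The function name resample_ohlc, the random sample data, and the dropna step are assumptions added for illustration, not part of the original answer. The sketch also avoids inplace=True so the caller's DataFrame keeps its date column and the function can be called more than once.

```python
import numpy as np
import pandas as pd


def resample_ohlc(df, x_minutes="6min"):
    """Resample minute rows into x-minute rows, anchoring bins at the first timestamp."""
    agg = {"first": "first", "max": "max", "min": "min", "last": "last"}
    return (
        df.set_index("date")                    # returns a new frame; df is not modified
          .resample(x_minutes, origin="start")  # bins start at the first timestamp (09:15)
          .agg(agg)
          .dropna(how="all")                    # drop empty bins between sessions
          .reset_index()
    )


# Hypothetical sample: two sessions of minute data, 09:15 to 15:29 each day.
idx = pd.date_range("2023-06-01 09:15", "2023-06-02 15:29", freq="min")
idx = idx[idx.indexer_between_time("09:15", "15:29")]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": idx,
    "first": rng.random(len(idx)),
    "max": rng.random(len(idx)),
    "min": rng.random(len(idx)),
    "last": rng.random(len(idx)),
})

out = resample_ohlc(df, "6min")
print(out.head(3))
```

One caveat worth keeping in mind: origin='start' anchors every bin to the very first timestamp in the data. Later days also begin at 9:15 here only because 6 divides the 1440 minutes in a day; for an interval such as 7 minutes, the second day's bins would drift, and resampling each day separately would probably be needed.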