Resampling rows minute-wise not working for even minutes in a Python DataFrame

Question

I have a df with 5 columns. A column named date holds minute-wise data for a few days; each day's data starts at 9:15 and ends at 15:29. The four other columns, named first, max, min, and last, hold numeric values.

I wrote a function that takes the interval length x (in minutes) as a parameter. It resamples the rows into one row per x-minute interval.

The 'first' of each resampled row will be the 'first' of the first row in the interval.
The 'last' of each resampled row will be the 'last' of the last row in the interval.
The 'max' of each resampled row will be the highest value of the 'max' column over the interval.
The 'min' of each resampled row will be the lowest value of the 'min' column over the interval.
The 'date' will be the start of each x-minute interval.

My problem is that for some values of x the code works perfectly, but for others the first row has the wrong time.

Instead of starting at 9:15, the resampled data starts at some other minute.

Code:

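def resample_df(df, x_minutes='15T'):
    # Index by 'date' without inplace=True, so the caller's DataFrame is not
    # mutated (with inplace=True a second call on the same df fails: no 'date' column)
    df = df.set_index('date')

    resampled_df = df.resample(x_minutes).agg({
        'first': 'first',  # first value in each interval
        'max': 'max',      # highest value in each interval
        'min': 'min',      # lowest value in each interval
        'last': 'last'     # last value in each interval
    })

    # Turn the interval labels back into a 'date' column
    return resampled_df.reset_index()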

Input:

   date                 first     max       min       last
0  2023-06-01 09:15:00  0.014657  0.966861  0.556195  0.903073
1  2023-06-01 09:16:00  0.255174  0.607714  0.845804  0.039933
2  2023-06-01 09:17:00  0.956839  0.881803  0.876322  0.552568

Output when x_minutes = '6T':

   date                 first     max       min       last
0  2023-06-01 09:12:00  0.014657  0.966861  0.556195  0.552568
1  2023-06-01 09:18:00  0.437867  0.988005  0.162957  0.897419
2  2023-06-01 09:24:00  0.296486  0.370957  0.013994  0.108506

The first row shows 9:12, but there is no 9:12 in my data. Why is it giving me the wrong time?
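For context, here is a minimal sketch (assuming pandas 1.1+, where resample accepts an origin argument) that reproduces the 9:12 label from the three sample rows above. With the default origin='start_day', bin edges are laid out from midnight in steps of x minutes, so 9:15 falls into the bin [9:12, 9:18) and that bin is labeled by its left edge. The aggregated values match the first output row; only the label differs. '6min' is the same alias as '6T'.

```python
import pandas as pd

# The three sample input rows from the question (09:15-09:17)
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-06-01 09:15", "2023-06-01 09:16", "2023-06-01 09:17"]),
    "first": [0.014657, 0.255174, 0.956839],
    "max": [0.966861, 0.607714, 0.881803],
    "min": [0.556195, 0.845804, 0.876322],
    "last": [0.903073, 0.039933, 0.552568],
}).set_index("date")

agg = {"first": "first", "max": "max", "min": "min", "last": "last"}

# Default origin='start_day': edges at midnight + k * 6 minutes,
# so 09:15 lands in [09:12, 09:18) and the bin is labeled 09:12
bars = df.resample("6min").agg(agg)
print(bars)

# The same bin start by hand: 09:15 is 555 minutes after midnight
minutes_since_midnight = 9 * 60 + 15
bin_start = minutes_since_midnight - minutes_since_midnight % 6
print(divmod(bin_start, 60))  # (9, 12)
```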

Note: it works perfectly when the number of minutes entered is odd, e.g. x_minutes = '15T'.
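The odd/even pattern is probably a coincidence. Assuming the default anchoring documented for resample (origin='start_day', i.e. midnight of the first day), 9:15 is 555 minutes after that midnight and each later day adds 1440 minutes, so every day's first bar starts exactly at 9:15 only when x divides both 555 and 1440, which means x divides 15. The helper first_bin_start below is hypothetical and only does that arithmetic; it does not call pandas.

```python
MINUTES_PER_DAY = 24 * 60
SESSION_START = 9 * 60 + 15  # 09:15 is 555 minutes after midnight


def first_bin_start(x_minutes: int, day: int = 0) -> str:
    """Clock time of the bin containing 09:15 on `day`, assuming the bin grid
    starts at midnight of day 0 (pandas' default origin='start_day')."""
    t = day * MINUTES_PER_DAY + SESSION_START
    start = (t - t % x_minutes) % MINUTES_PER_DAY
    return "%02d:%02d" % divmod(start, 60)


# Intervals (1-60 minutes) whose grid hits 09:15 on each of 5 days
aligned = [x for x in range(1, 61)
           if all(first_bin_start(x, d) == "09:15" for d in range(5))]
print(aligned)                     # [1, 3, 5, 15]
print(first_bin_start(6))          # 09:12
print(first_bin_start(7))          # 09:13 (odd, still misaligned)
print(first_bin_start(37, day=1))  # 08:41 (aligned on day 0 only)
```

So 7 and 9 are odd but still misaligned, while 15 works because it divides both 555 and 1440.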

Code to create a dummy df:

import pandas as pd
import random
from datetime import datetime, timedelta

# Define the number of days for which data is generated
num_days = 5

# Define the start and end times for each day
start_time = datetime.strptime('09:15', '%H:%M').time()
end_time = datetime.strptime('15:30', '%H:%M').time()

# Create a list of all the timestamps for the specified days
timestamps = []
current_date = datetime.now().replace(hour=start_time.hour, minute=start_time.minute, second=0, microsecond=0)
end_date = current_date + timedelta(days=num_days)
while current_date < end_date:
    current_time = current_date.time()
    if start_time <= current_time < end_time:  # exclude 15:30 so each day ends at 15:29
        timestamps.append(current_date)
    current_date += timedelta(minutes=1)

# Generate random data for each column
data = {
    'date': timestamps,
    'first': [random.random() for _ in range(len(timestamps))],
    'max': [random.random() for _ in range(len(timestamps))],
    'min': [random.random() for _ in range(len(timestamps))],
    'last': [random.random() for _ in range(len(timestamps))]
}

# Create the DataFrame
df = pd.DataFrame(data)

# Display the resulting DataFrame (display() only exists in Jupyter/IPython; print works anywhere)
print(df)
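If the minute-by-minute while loop feels verbose, a vectorized sketch that builds the same shape of dummy data with pd.date_range might look like this. It assumes NumPy is available, uses a seeded generator for reproducibility (the original is unseeded), and ends each day at 15:29 to match the range described in the question.

```python
import numpy as np
import pandas as pd

num_days = 5
rng = np.random.default_rng(0)  # seeded for reproducibility (an assumption)
today = pd.Timestamp.now().normalize()  # midnight today, like datetime.now() above

# One minute-frequency range per day, 09:15 through 15:29 inclusive
day_ranges = [
    pd.date_range(today + pd.Timedelta(days=d, hours=9, minutes=15),
                  today + pd.Timedelta(days=d, hours=15, minutes=29),
                  freq="min")
    for d in range(num_days)
]
timestamps = day_ranges[0].append(day_ranges[1:])

df = pd.DataFrame({"date": timestamps})
for col in ["first", "max", "min", "last"]:
    df[col] = rng.random(len(df))

print(df.head())
```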


Answer 1

Score: 1

Use:

resampled_df = df.resample(x_minutes, origin='start').agg({
    'first': 'first',
    'max': 'max',
    'min': 'min',
    'last': 'last'
})
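origin='start' anchors the bin grid at the first timestamp of the data (9:15 on the first day) instead of at midnight. One caveat worth checking, assuming the data spans several days as in the question: the grid is anchored only once, so later days stay aligned only when the interval divides 1440 minutes (7 minutes does not), and the overnight gap produces empty bins filled with NaN. The sketch below resamples each calendar day separately instead; resample_per_day and AGG are hypothetical names, and 'min' is the same alias as 'T', which pandas 2.2 deprecates.

```python
import numpy as np
import pandas as pd

AGG = {"first": "first", "max": "max", "min": "min", "last": "last"}


def resample_per_day(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Resample each calendar day separately, anchoring its bins at that
    day's first timestamp (origin='start'). Expects a DatetimeIndex."""
    pieces = [day.resample(freq, origin="start").agg(AGG)
              for _, day in df.groupby(df.index.date)]
    return pd.concat(pieces)


# Two hypothetical trading days, one row per minute from 09:15 to 15:29
idx = pd.date_range("2023-06-01 09:15", "2023-06-01 15:29", freq="min").append(
    pd.date_range("2023-06-02 09:15", "2023-06-02 15:29", freq="min"))
values = np.arange(len(idx), dtype=float)
df = pd.DataFrame({c: values for c in AGG}, index=idx)

# origin='start' over the whole frame: day 2 begins 1440 minutes after
# day 1, which is not a multiple of 7, so its grid drifts to 09:10
whole = df.resample("7min", origin="start").agg(AGG).dropna()
print(whole.loc["2023-06-02"].index[0])    # 2023-06-02 09:10:00

per_day = resample_per_day(df, "7min")
print(per_day.loc["2023-06-02"].index[0])  # 2023-06-02 09:15:00
```

For intervals that divide 1440 (1, 2, 3, 5, 6, 10, 15, 30, 60, ...), the single origin='start' call from the answer already lines up every day; the per-day loop mainly matters for other intervals and for avoiding the overnight NaN rows.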

huangapple
  • Posted on 2023-06-01 21:53:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382642.html