Find unique dates in an existing DataFrame and make a new CSV with the corresponding column values

Question

I have a time series that looks like this:

| Time | Volume every minute |
| -------- | -------------- |
| 2023-05-25T00:00:00Z | 284 |
| 2023-05-25T00:01:00Z | 421 |
| . | . |
| . | . |
| 2023-05-27T23:58:00Z | 894 |
| 2023-05-27T23:59:00Z | 357 |

I need to create a new CSV by iterating over the Time column, finding the unique dates, and making new columns that hold the corresponding per-minute volume values. For example, the desired output is:

| Date | min1 | min2 | ... | min1440 |
|:---- |:------:| -----:|-----:|-----:|
| 2023-05-25 | 284 | 421 | ... | 578 |
| 2023-05-26 | 512 | 645 | ... | 114 |
| 2023-05-27 | 894 | 357 | ... | 765 |

I am able to fetch the unique dates, but after that I am stuck. Here is my sample code:

    import pandas as pd

    train_data = pd.read_csv('date25to30.csv')

    print(pd.to_datetime(train_data['time']).dt.date.unique())

Answer 1

Score: 0

First, pass the parse_dates parameter to read_csv so that the Time column is converted to datetimes:

    train_data = pd.read_csv('date25to30.csv', parse_dates=['Time'])
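
As a quick check of this step, here is a minimal sketch that reads a small in-memory sample instead of the real file. The sample rows come from the question; the column names `Time` and `Volume` are assumptions about the layout of `date25to30.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'date25to30.csv'; the column names
# 'Time' and 'Volume' are assumed to match the real file.
csv_text = """Time,Volume
2023-05-25T00:00:00Z,284
2023-05-25T00:01:00Z,421
2023-05-27T23:58:00Z,894
2023-05-27T23:59:00Z,357
"""

train_data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Time'])

# The .dt accessor is only available once the column holds datetimes.
is_datetime = pd.api.types.is_datetime64_any_dtype(train_data['Time'])
unique_dates = train_data['Time'].dt.date.unique()

print(is_datetime)
print(unique_dates)
```

If the check prints `False`, the column was left as plain strings, and calling `pd.to_datetime` on it explicitly (as in the question's code) is a reasonable fallback.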

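Then create the minute numbers: format the time of day as HH:MM:SS, convert it to timedeltas with to_timedelta, take Series.dt.total_seconds, divide by 60, and add 1, because the result counts from 0 (midnight gives 0) while the desired columns start at min1: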

    minutes = (pd.to_timedelta(train_data['Time'].dt.strftime('%H:%M:%S'))
                 .dt.total_seconds()
                 .div(60)
                 .astype(int)
                 .add(1))
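
The same minute numbers can probably be computed without the string round trip. The sketch below is an alternative, not the method used in this answer: it derives the minute of the day from `Series.dt.hour` and `Series.dt.minute` and compares the result with the timedelta approach on a small sample frame built from the rows in the question.

```python
import pandas as pd

# Small sample frame built from the rows shown in the question.
train_data = pd.DataFrame({
    'Time': pd.to_datetime(['2023-05-25T00:00:00Z', '2023-05-25T00:01:00Z',
                            '2023-05-27T23:58:00Z', '2023-05-27T23:59:00Z']),
    'Volume': [284, 421, 894, 357],
})

# Approach from the answer: HH:MM:SS string -> timedelta -> minutes, 1-based.
minutes_td = (pd.to_timedelta(train_data['Time'].dt.strftime('%H:%M:%S'))
                .dt.total_seconds()
                .div(60)
                .astype(int)
                .add(1))

# Alternative: minute of the day straight from the datetime components.
minutes_alt = train_data['Time'].dt.hour * 60 + train_data['Time'].dt.minute + 1

print(minutes_td.tolist())
print(minutes_alt.tolist())
```

Both versions ignore the seconds component, so they should agree whenever the data has one row per minute.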

Finally, pass the dates and the minute numbers to DataFrame.pivot_table and prefix the column names with DataFrame.add_prefix:

    df = (train_data.pivot_table(index=train_data['Time'].dt.date,
                                 columns=minutes,
                                 values='Volume',
                                 aggfunc='sum').add_prefix('min'))
    print(df)
    Time         min1   min2  min1439  min1440
    Time                                      
    2023-05-25  284.0  421.0      NaN      NaN
    2023-05-27    NaN    NaN    894.0    357.0
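
The output above only contains columns for minutes that occur in the data, and the result is not yet written to a CSV. A minimal sketch of one way to finish the task follows, under some assumptions: the sample frame stands in for the real file, `reindex` is used to force all 1440 minute columns, and the `minute` series name and the `Date` index label are choices made for this illustration.

```python
import pandas as pd

# Small sample frame built from the rows shown in the question.
train_data = pd.DataFrame({
    'Time': pd.to_datetime(['2023-05-25T00:00:00Z', '2023-05-25T00:01:00Z',
                            '2023-05-27T23:58:00Z', '2023-05-27T23:59:00Z']),
    'Volume': [284, 421, 894, 357],
})

# 1-based minute of the day; renamed so it does not clash with the index name.
minutes = (train_data['Time'].dt.hour * 60
           + train_data['Time'].dt.minute + 1).rename('minute')

df = (train_data.pivot_table(index=train_data['Time'].dt.date,
                             columns=minutes,
                             values='Volume',
                             aggfunc='sum')
      .reindex(columns=range(1, 1441))   # one column per minute, NaN if absent
      .add_prefix('min'))
df.index.name = 'Date'

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = df.to_csv()
print(csv_text.splitlines()[0][:30])
```

Minutes with no data stay as NaN and become empty fields in the CSV; whether to fill them (for example with 0) depends on what a missing minute means in the data.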

Posted by huangapple on June 1, 2023 at 18:47:43
Source: https://go.coder-hub.com/76381105.html