Find unique date from existing dataframe and make a new CSV with corresponding column values
Question
I have a time series that looks like this:
| Time | Volume every minute |
| --- | --- |
| 2023-05-25T00:00:00Z | 284 |
| 2023-05-25T00:01:00Z | 421 |
| . | . |
| . | . |
| 2023-05-27T23:58:00Z | 894 |
| 2023-05-27T23:59:00Z | 357 |
I need to create a new CSV by finding the unique dates in the Time column and turning each day's per-minute volume values into columns. For example, the desired output is:
| Date | min1 | min2 | ... | min1440 |
| --- | --- | --- | --- | --- |
| 2023-05-25 | 284 | 421 | ... | 578 |
| 2023-05-26 | 512 | 645 | ... | 114 |
| 2023-05-27 | 894 | 357 | ... | 765 |
I am able to fetch the unique dates, but after that I am clueless. Here is my sample code:
import pandas as pd
train_data = pd.read_csv('date25to30.csv')
print(pd.to_datetime(train_data['time']).dt.date.unique())
Answer 1
Score: 0
First, pass the `parse_dates` parameter to `read_csv` to convert the `Time` column to datetimes:
train_data = pd.read_csv('date25to30.csv', parse_dates=['Time'])
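As a quick check that the parsing step did what is expected, the sketch below reads a two-row stand-in for the file from memory. The `csv_text` contents and the column names `Time` and `Volume` are assumptions made for illustration; the real `date25to30.csv` may use different headers.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for date25to30.csv; column names are assumed.
csv_text = """Time,Volume
2023-05-25T00:00:00Z,284
2023-05-25T00:01:00Z,421
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Time"])

# The .dt accessor is only available once the column has a datetime dtype.
is_datetime = pd.api.types.is_datetime64_any_dtype(sample["Time"])

# Same unique-date lookup as in the question, now on the parsed column.
dates = sample["Time"].dt.date.unique()
```

If `is_datetime` comes back `False`, the header name passed to `parse_dates` probably does not match the file.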
Then compute the minute of the day: format the time part as `HH:MM:SS`, convert it to timedeltas with `to_timedelta`, take `Series.dt.total_seconds`, divide by 60, and add 1 so that the columns are numbered from 1 to 1440 instead of from 0 to 1439:
minutes = (pd.to_timedelta(train_data['Time'].dt.strftime('%H:%M:%S'))
             .dt.total_seconds()
             .div(60)
             .astype(int)
             .add(1))
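The same minute number can also be derived directly from the hour and minute components, which may be easier to read. This is an alternative sketch, not part of the original answer; the three sample timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical sample timestamps covering the first and last minute of a day.
times = pd.Series(pd.to_datetime([
    "2023-05-25T00:00:00Z",
    "2023-05-25T00:01:00Z",
    "2023-05-27T23:59:00Z",
]))

# Approach from the answer: HH:MM:SS string -> timedelta -> minutes, 1-based.
via_timedelta = (pd.to_timedelta(times.dt.strftime('%H:%M:%S'))
                   .dt.total_seconds()
                   .div(60)
                   .astype(int)
                   .add(1))

# Alternative: combine the hour and minute components directly, 1-based.
via_components = times.dt.hour * 60 + times.dt.minute + 1
```

Both series hold 1 for `00:00` and 1440 for `23:59`.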
Finally, pass the dates and minutes to `DataFrame.pivot_table`, and add the column prefix with `DataFrame.add_prefix`:
df = (train_data.pivot_table(index=train_data['Time'].dt.date,
                             columns=minutes,
                             values='Volume',
                             aggfunc='sum').add_prefix('min'))
print(df)
Time        min1   min2   min1439  min1440
Time
2023-05-25  284.0  421.0      NaN      NaN
2023-05-27    NaN    NaN    894.0    357.0
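The question also asks for a new CSV, and the output above only has columns for minutes that occur in the data. The following is a minimal end-to-end sketch that forces all 1440 columns with `reindex` and writes the result with `to_csv`. The in-memory sample, the column name `Volume` (the question's table header reads "Volume every minute"), and the labels `Date` and `minute` are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical four-row stand-in for date25to30.csv; column names are assumed.
csv_text = """Time,Volume
2023-05-25T00:00:00Z,284
2023-05-25T00:01:00Z,421
2023-05-27T23:58:00Z,894
2023-05-27T23:59:00Z,357
"""
train_data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Time"])

# Row key (calendar date) and column key (1-based minute of the day).
dates = train_data["Time"].dt.date.rename("Date")
minutes = (train_data["Time"].dt.hour * 60
           + train_data["Time"].dt.minute + 1).rename("minute")

wide = train_data.pivot_table(index=dates,
                              columns=minutes,
                              values="Volume",
                              aggfunc="sum")

# Make sure every minute 1..1440 has a column, even if absent from the data.
wide = (wide.reindex(columns=range(1, 1441))
            .add_prefix("min")
            .rename_axis(index="Date"))

# Written to an in-memory buffer here; pass a file path to to_csv instead
# to produce the actual CSV file.
out = io.StringIO()
wide.to_csv(out)
```

Minutes with no data stay empty in the CSV. Dates that never appear in the input, such as 2023-05-26 in this sample, produce no row at all.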