如何合并列并更改时区?

huangapple go评论66阅读模式
英文:

How to combine columns and change timezone?

问题

import pandas as pd

# 读取CSV文件
df = pd.read_csv('ES.csv')

# 选择最后10行数据
df_small = df[-10:]

# 合并'DATE'和'TIME'列并创建新列'dateandtime'
df_small['dateandtime'] = df['DATE'].astype(str) + ' ' + df['TIME'].astype(str)

# 打印数据信息
df_small.info()

# 将合并后的数据保存到新CSV文件'new.csv'
df_small.to_csv('new.csv', index=False)

在这段代码中,我们读取了CSV文件,并选择了最后10行数据。然后,我们创建了一个新列'dateandtime',将'DATE'和'TIME'列合并到其中。最后,我们将处理后的数据保存到'new.csv'文件中(不包括索引列)。

英文:

如何合并列并更改时区?

I want to merge columns DATE and TIME into one "date&time" column and change timezone from Brussel to New York. I tried:

import pandas as pd


#df = pd.read_csv('ES.csv', parse_dates=True)
df = pd.read_csv('ES.csv')
df_small = df[-10:]
df_small['dateandtime'] = df['DATE'].astype(str) + df['TIME'].astype(str)
df_small.info()
df_small.to_frame(name='new')
df_small.to_frame(name='new').to_csv('new.csv')

Errors and warnings:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df_small.to_frame(name='new')
AttributeError: 'DataFrame' object has no attribute 'to_frame'

If you get it working you can have the full CSV file which I think is worth 500$-3000$ (10 years of financial 1 minute interval S&P500 futures data).

答案1

得分: 0

你可以使用 .cat 来合并列,.pop() 来移除旧列,最后使用 .to_localize() 转换为布鲁塞尔时区,然后使用 .tz_convert() 转换为纽约时区。将DatetimeIndex转换为字符串将比保留为datetime需要更多时间:

df_small["dateandtime"] = (pd
                           .to_datetime(df_small.pop("DATE").str.cat(df_small.pop("TIME"), sep=" "))
                           .dt.tz_localize("Europe/Brussels")
                           .dt.tz_convert("America/New_York")
                           )
df_small = df_small.set_index(pd.DatetimeIndex(df_small.pop("dateandtime")))
df_small.index = df_small.index.strftime("%Y-%m-%d %H:%M")

输出:

                    OPEN    HIGH     LOW   CLOSE  VOLUME
dateandtime                                             
2010-01-03 18:01 1113.75 1114.75 1113.25 1114.25    2185
2010-01-03 18:02 1114.25 1115.25 1114.25 1114.75     437
2010-01-03 18:03 1114.75 1114.75 1114.25  1114.5     596
2010-01-03 18:04 1114.25 1114.75 1114.25  1114.5     131
2010-01-03 18:05  1114.5  1114.5 1114.25 1114.25     166
英文:

You can use .cat to combine columns .pop() to remove old columns finally .to_localize() as Brussels tz and .tz_convert() to New York tz. Converting DatetimeIndex to str will take much more time than leaving it as datetime:

df_small["dateandtime"] = (pd
                           .to_datetime(df_small.pop("DATE").str.cat(df_small.pop("TIME"), sep=" "))
                           .dt.tz_localize("Europe/Brussels")
                           .dt.tz_convert("America/New_York")
                           )
df_small = df_small.set_index(pd.DatetimeIndex(df_small.pop("dateandtime")))
df_small.index = df_small.index.strftime("%Y-%m-%d %H:%M")

Output:

                    OPEN    HIGH     LOW   CLOSE  VOLUME
dateandtime                                             
2010-01-03 18:01 1113.75 1114.75 1113.25 1114.25    2185
2010-01-03 18:02 1114.25 1115.25 1114.25 1114.75     437
2010-01-03 18:03 1114.75 1114.75 1114.25  1114.5     596
2010-01-03 18:04 1114.25 1114.75 1114.25  1114.5     131
2010-01-03 18:05  1114.5  1114.5 1114.25 1114.25     166

huangapple
  • 本文由 发表于 2023年7月23日 22:10:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76748663.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定