Pandas Python: KeyError Date

Question

I am importing a CSV file into Python with pandas, and I expected it to create datetime objects automatically.

However, the first column is not parsed as a date, and I want it to be a datetime object in Python. The data looks like this:

Date,cost
41330.66667,100
41331.66667,101
41332.66667,102
41333.66667,103

Current code looks like:

from datetime import datetime
import pandas as pd

data = pd.read_csv(r"F:\Sam\PJ\CSV2.csv")
data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))

print(data)
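
For readers who want to reproduce this without the CSV file, here is a minimal sketch that loads the sample rows above from an in-memory string. Using io.StringIO in place of the real path is an assumption made only for illustration. It suggests why the code above cannot work: pandas reads the Date column as plain floats, and strptime only accepts strings.

```python
import io
from datetime import datetime

import pandas as pd

# The sample rows from the question, held in memory instead of a file
# (an assumption for illustration; the original reads CSV2.csv from disk).
csv_text = """Date,cost
41330.66667,100
41331.66667,101
41332.66667,102
41333.66667,103
"""

data = pd.read_csv(io.StringIO(csv_text))

# The Date column arrives as floats, not as strings or datetimes.
print(data.dtypes)

# strptime expects a string, so calling it on a float raises TypeError.
try:
    data["Date"].apply(lambda x: datetime.strptime(x, "%d/%m/%Y"))
    strptime_failed = False
except TypeError as exc:
    strptime_failed = True
    print("strptime failed:", exc)
```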

Answer 1

Score: 2

This looks like an Excel datetime format, known as a serial date. To convert from a serial date, you can do this:

data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))

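Which outputs the following (datetime.fromtimestamp returns local time, so the hour shown depends on the time zone of the machine running it):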

>>> data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))
0   2013-02-25 10:00:00.288
1   2013-02-26 10:00:00.288
2   2013-02-27 10:00:00.288
3   2013-02-28 10:00:00.288

To assign the result back to data['Date'], just do:

data['Date'] = data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))

#df
                     Date  cost
0 2013-02-25 16:00:00.288   100
1 2013-02-26 16:00:00.288   101
2 2013-02-27 16:00:00.288   102
3 2013-02-28 16:00:00.288   103
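
Because datetime.fromtimestamp converts to local time, the hour in the result can differ between machines. As a rough alternative sketch (not the approach used in this answer), the serial number can be added as a timedelta to the Excel epoch, 30 December 1899, which involves no time zone at all. The names EXCEL_EPOCH and excel_serial_to_datetime are hypothetical, and the DataFrame is built inline from the question's sample rows instead of being read from the CSV.

```python
from datetime import datetime, timedelta

import pandas as pd

# Day zero of Excel serial dates (hypothetical constant name).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days since 1899-12-30) to a naive datetime."""
    return EXCEL_EPOCH + timedelta(days=float(serial))


# Sample rows from the question, built inline for illustration.
data = pd.DataFrame(
    {
        "Date": [41330.66667, 41331.66667, 41332.66667, 41333.66667],
        "cost": [100, 101, 102, 103],
    }
)

# Assign the converted values back to the column.
data["Date"] = data["Date"].apply(excel_serial_to_datetime)
print(data)
```

The fractional part 0.66667 is two thirds of a day, so each row should land at roughly 16:00 regardless of the local time zone.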

Answer 2

Score: 1

Unfortunately, read_csv cannot parse date columns that are given as numbers.
The good news is that pandas has a suitable function for this.
After the read_csv call (with the DataFrame named df):

df.Date = pd.to_datetime(df.Date - 25569, unit='D').dt.round('ms')

As I understand it, your Date is actually the number of days since 30 December 1899
(plus the fractional part of the day).
The "correction factor" above (25569) is the number of days from that date to 1 January 1970, the epoch that to_datetime counts from. For Date == 0 it gives
exactly the start of the Excel epoch.

Rounding to milliseconds (or maybe even seconds) is advisable.
Otherwise you will get odd results caused by the limited precision
of the fractional part of the day.
E.g. 0.33333333, which should correspond to 8 hours, is computed as
07:59:59.999712.
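
The points above can be checked with a short sketch. It assumes a DataFrame named df built inline from the question's sample rows (not read from the CSV) and uses only pd.to_datetime with unit='D' plus the round method.

```python
import pandas as pd

# Sample rows from the question, built inline for illustration.
df = pd.DataFrame(
    {
        "Date": [41330.66667, 41331.66667, 41332.66667, 41333.66667],
        "cost": [100, 101, 102, 103],
    }
)

# Serial 0 maps to the start of the Excel epoch.
excel_epoch = pd.to_datetime(0 - 25569, unit="D")

# A fractional day meant to be 8 hours, before and after rounding.
raw = pd.to_datetime(0.33333333, unit="D")
rounded = raw.round("s")

# The conversion from this answer, rounded to milliseconds.
df["Date"] = pd.to_datetime(df["Date"] - 25569, unit="D").dt.round("ms")

print(excel_epoch)
print(raw, "->", rounded)
print(df)
```

Rounding to seconds turns the slightly-short 8 hours into exactly 08:00:00, which is why rounding is suggested whenever the source values were truncated.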

Answer 3

Score: 0

Well, you have two problems here.

  1. We don't know what data and columns the CSV has, but for pandas to pick up the date as a column, it must be a column in that CSV file.

  2. apply doesn't work in place. You have to assign the result of apply back to the Date column:
    data['Date'] = data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
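
The second point can be illustrated with a small sketch. The data here is hypothetical: it assumes the dates are stored as day/month/year strings, which is what the strptime format expects, whereas the question's actual column holds numbers.

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: dates stored as day/month/year strings.
data = pd.DataFrame({"Date": ["25/02/2013", "26/02/2013"], "cost": [100, 101]})

# Calling apply without assigning the result leaves the column unchanged.
data["Date"].apply(lambda x: datetime.strptime(x, "%d/%m/%Y"))
unchanged = data["Date"].tolist()
print(unchanged)

# Assigning the result back replaces the column with parsed datetimes.
data["Date"] = data["Date"].apply(lambda x: datetime.strptime(x, "%d/%m/%Y"))
print(data.dtypes)
```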

huangapple
  • Posted on 7 January 2020
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59617059.html