Pandas Python: KeyError Date
Question
from datetime import datetime, timedelta
import pandas as pd
data = pd.read_csv(r"F:\Sam\PJ\CSV2.csv")
data['Date'] = data['Date'].apply(lambda x: datetime.fromordinal(int(x)) + timedelta(days=x % 1) - timedelta(days=366))
print(data)
I am importing a CSV file into Python and would like a datetime object to be created automatically.
Specifically, I want the first column to be a datetime object in Python. The data looks like this:
Date,cost
41330.66667,100
41331.66667,101
41332.66667,102
41333.66667,103
Current code looks like:
from datetime import datetime
import pandas as pd
data = pd.read_csv(r"F:\Sam\PJ\CSV2.csv")
data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
print(data)
Answer 1
Score: 2
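This looks like an Excel datetime format, known as a serial date. Here 25569 is the serial number of 1 January 1970 (the Unix epoch) and 86400 is the number of seconds in a day. To convert from a serial date you can do this:
data['Date'].apply(lambda x: datetime.fromtimestamp((x - 25569) * 86400.0))
This outputs the following (datetime.fromtimestamp returns local time, so the hour shown depends on the machine's time zone):
>>> data['Date'].apply(lambda x: datetime.fromtimestamp((x - 25569) * 86400.0))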
0 2013-02-25 10:00:00.288
1 2013-02-26 10:00:00.288
2 2013-02-27 10:00:00.288
3 2013-02-28 10:00:00.288
To assign it back to data['Date'], just do:
data['Date'] = data['Date'].apply(lambda x: datetime.fromtimestamp((x - 25569) * 86400.0))
#df
                     Date  cost
0 2013-02-25 16:00:00.288   100
1 2013-02-26 16:00:00.288   101
2 2013-02-27 16:00:00.288   102
3 2013-02-28 16:00:00.288   103
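Since fromtimestamp depends on the local time zone, a variant that avoids timestamps altogether may be easier to reason about. The following is only a sketch, assuming the serial dates use Excel's 1900 date system (day zero on 30 December 1899); the names EXCEL_EPOCH and excel_serial_to_datetime are made up for this example and are not part of any library:

```python
from datetime import datetime, timedelta

# Assumed day zero of Excel's 1900 date system (serial 25569 is 1970-01-01).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days plus fractional day) to a naive datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


# The four sample values from the question.
serials = [41330.66667, 41331.66667, 41332.66667, 41333.66667]
converted = [excel_serial_to_datetime(s) for s in serials]

for dt in converted:
    print(dt)
```

With a DataFrame, the same helper could be passed to apply, e.g. data['Date'] = data['Date'].apply(excel_serial_to_datetime). The results are naive datetimes with no time zone attached, so the hour does not shift from machine to machine.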
Answer 2
Score: 1
Unfortunately, read_csv does not cope with date columns given as numbers.
The good news is that pandas does have a suitable function for this.
After the read_csv call:
df.Date = pd.to_datetime(df.Date - 25569, unit='D').dt.round('ms')
As I understand it, your Date is actually the number of days since 30 December 1899
(plus the fractional part of the day).
The "correction factor" above (25569) is the number of days from that date to
the Unix epoch (1 January 1970), so Date == 0 maps to the start of the Excel epoch.
Rounding to milliseconds (or maybe even seconds) is advisable.
Otherwise you will get odd results caused by the limited precision
of the fractional part of the day.
E.g. 0.33333333, which is meant to be 8 hours, is computed as
07:59:59.999712.
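As a variation on the same idea, pd.to_datetime also accepts an origin argument, so the Excel day zero can be stated directly instead of subtracting 25569 by hand. This is a minimal sketch, assuming the four sample rows from the question (built inline here rather than read from the CSV) and rounding to whole seconds:

```python
import pandas as pd

# Sample rows from the question, built inline instead of read from the CSV file.
df = pd.DataFrame({
    "Date": [41330.66667, 41331.66667, 41332.66667, 41333.66667],
    "cost": [100, 101, 102, 103],
})

# Count days from Excel's day zero (30 December 1899), then round to whole
# seconds to hide the noise from the limited precision of the fraction.
excel_epoch = pd.Timestamp("1899-12-30")
df["Date"] = pd.to_datetime(df["Date"], unit="D", origin=excel_epoch).dt.round("s")

print(df)
```

Whether to round to seconds or milliseconds depends on how precise the source values are meant to be; the sample values carry only five decimal places, so seconds seems like a reasonable choice here.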
Answer 3
Score: 0
Well, you have two problems here.
- We don't know what data and columns the CSV has, but for pandas to pick up the date as a column, it must be a column in that CSV file.
- apply doesn't work in place. You have to assign the result of apply back to the Date column:
data['Date'] = data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
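The second point can be checked with a small experiment. This sketch uses a hypothetical column of date strings in day/month/year form, because strptime only accepts strings and would raise a TypeError on the numeric values shown in the question; the helper name parse is made up for the example:

```python
from datetime import datetime

import pandas as pd

# Hypothetical string dates; the question's column holds numbers instead.
data = pd.DataFrame({"Date": ["25/02/2013", "26/02/2013"], "cost": [100, 101]})


def parse(text):
    """Parse a day/month/year string into a datetime."""
    return datetime.strptime(text, "%d/%m/%Y")


# Without assignment the result is discarded and the column is unchanged.
data["Date"].apply(parse)
unchanged = data["Date"].iloc[0]

# Assigning the result back replaces the column with parsed datetimes.
data["Date"] = data["Date"].apply(parse)
parsed = data["Date"].iloc[0]

print(unchanged, "->", parsed)
```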