英文:
pd.to_datetime converting mixed object feature column value to NAT values but intend to extract month
问题
I've been trying to process a column date in the Dataframe to obtain the month in int type using pd.to_datetime
.
This is the code in Python using pandas .
print(df["date"].head())
0 Oct
1 Jun
2 15-Oct
3 27-Nov
4 26-Sep
Name: date, dtype: object
After attempting to convert to datetime, I obtained all values in NAT. How do I fix this?
df["date"] = pd.to_datetime(df["date"], errors='coerce')
print(df["date"].head())
I get:
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
Name: date, dtype: datetime64[ns]
Running isna()
returns all values as NAT:
print(df["date"].isna().sum())
1000
I plan to obtain:
0 10
1 06
2 10
3 11
4 09
For values that can't be converted to datetime and then int (because values are missing or unrecognizable), I plan to replace with "Date not given".
What do I need to do?
英文:
I've been trying to process a column date in the Dataframe to obtain the month in int type using pd.to_datetime
.
This is the code in Python using pandas .
print(df["date"].head())
0 Oct
1 Jun
2 15-Oct
3 27-Nov
4 26-Sep
Name: date, dtype: object
After attempting to convert to datetime,I obtained all values in NAT. How do I fix this?
df["date"]=pd.to_datetime(df["date"],errors='coerce')
print(df["date"].head())
I get:
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
Name: date, dtype: datetime64[ns]
Running isNA returns all values at NAT:
print(df["date"].isna().sum())
1000
I plan to obtain:
0 10
1 06
2 10
3 11
4 09
For values that can't be converted to datetime and then int(because values are missing or unrecognisable) I plan to replace with "Date not given"
What do I need to do?
答案1
得分: 1
以下是您要翻译的内容:
使用 Series.str.extract
与 Series.map
:
d = {'Jan':'01', 'Feb':'02','Mar':'03', 'Apr':'04',
'May':'05','Jun':'06', 'Jul':'07','Aug':'08',
'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
df["date1"] = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(d)
或者将值转换为日期时间格式,使用 %b
匹配月份,然后通过 Series.dt.strftime
转换为字符串:
df["date2"] = pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce').dt.strftime('%m')
print (df)
date date1 date2
0 Oct 10 10
1 Jun 06 06
2 15-Oct 10 10
3 27-Nov 11 11
4 26-Sep 09 09
如果需要整数:
df["date2"] = (pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce')
.dt.month.astype('Int64'))
print (df)
date date2
0 Oct <NA>
1 Jun 6
2 15-Oct 10
3 27-Nov 11
4 26-Sep 9
英文:
Use Series.str.extract
with Series.map
:
d = {'Jan':'01', 'Feb':'02','Mar':'03', 'Apr':'04',
'May':'05','Jun':'06', 'Jul':'07','Aug':'08',
'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
df["date1"] = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(d)
Or convert values to datetimes with %b
for match months and convert to strings by Series.dt.strftime
:
df["date2"] = pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce').dt.strftime('%m')
print (df)
date date1 date2
0 Oct 10 10
1 Jun 06 06
2 15-Oct 10 10
3 27-Nov 11 11
4 26-Sep 09 09
If need integers:
print (df)
date
0 Ocyt
1 Jun
2 15-Oct
3 27-Nov
4 26-Sep
df["date2"] = (pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce')
.dt.month.astype('Int64'))
print (df)
date date2
0 Ocyt <NA>
1 Jun 6
2 15-Oct 10
3 27-Nov 11
4 26-Sep 9
答案2
得分: 1
你可以将你的列按 '-' 分割并保留最后一部分:
英文:
You can split your columns on '-' and keep the last part:
>>> pd.to_datetime(df['date'].str.split('-').str[-1], format='%b', errors='coerce').dt.month
0 10
1 6
2 10
3 11
4 9
Name: date, dtype: int32
If your locale is not English, you can use:
import locale
locale.setlocale(locale.LC_TIME, 'C')
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论