pd.to_datetime converts mixed object column values to NaT, but I want to extract the month

Question

I've been trying to process the date column of a DataFrame to obtain the month as an int using pd.to_datetime.

This is the code, in Python using pandas:

print(df["date"].head())
0       Oct
1       Jun
2    15-Oct
3    27-Nov
4    26-Sep
Name: date, dtype: object

After attempting to convert the column to datetime, every value came back as NaT. How do I fix this?

df["date"] = pd.to_datetime(df["date"], errors='coerce')
print(df["date"].head())

I get:

0   NaT
1   NaT
2   NaT
3   NaT
4   NaT
Name: date, dtype: datetime64[ns]

Counting the missing values with isna() shows that all of them are NaT:

print(df["date"].isna().sum())
1000

I want to obtain:

0       10
1       06
2       10
3       11
4       09

For values that can't be converted to datetime and then to int (because they are missing or unrecognizable), I plan to substitute "Date not given".

What do I need to do?

Answer 1

Score: 1

Use Series.str.extract with Series.map:

d = {'Jan':'01', 'Feb':'02','Mar':'03', 'Apr':'04',
     'May':'05','Jun':'06', 'Jul':'07','Aug':'08',
     'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}

df["date1"] = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(d)

Or convert the values to datetimes, using %b to match the abbreviated month name, and then convert them to strings with Series.dt.strftime:

df["date2"] = pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
                             format='%b', errors='coerce').dt.strftime('%m')
print(df)
     date date1 date2
0     Oct    10    10
1     Jun    06    06
2  15-Oct    10    10
3  27-Nov    11    11
4  26-Sep    09    09

If you need integers, use Series.dt.month and cast to the nullable Int64 dtype, so that unrecognized values (such as Ocyt in the sample below) become <NA>:

print(df)
     date
0    Ocyt
1     Jun
2  15-Oct
3  27-Nov
4  26-Sep

df["date2"] = (pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
                              format='%b', errors='coerce')
                 .dt.month.astype('Int64'))
print(df)
     date  date2
0    Ocyt   <NA>
1     Jun      6
2  15-Oct     10
3  27-Nov     11
4  26-Sep      9
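The question also asks for a "Date not given" placeholder wherever the month cannot be recovered. A minimal sketch of one way to do that with the dictionary approach is shown here; the sample frame, the month_map name, and the month column are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: two good rows, one day-month row, one missing, one typo.
df = pd.DataFrame({"date": ["Oct", "Jun", "15-Oct", None, "Ocyt"]})

# Same mapping idea as the answer: month abbreviation -> zero-padded month.
month_map = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
             "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
             "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}

# Pull out the first run of letters; missing values stay missing.
abbr = df["date"].str.extract(r"([A-Za-z]+)", expand=False)

# Unknown abbreviations map to NaN, which is then replaced by the placeholder.
df["month"] = abbr.map(month_map).fillna("Date not given")

print(df)
```

Because the placeholder is a string, the resulting column holds strings rather than integers. If true integers are required, keeping the Int64 column with <NA> for unknown rows is probably the cleaner choice, and the placeholder can be applied only when displaying the data.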

Answer 2

Score: 1

You can split the column values on '-' and keep the last part:

>>> pd.to_datetime(df['date'].str.split('-').str[-1], format='%b', errors='coerce').dt.month
0    10
1     6
2    10
3    11
4     9
Name: date, dtype: int32

If your locale is not English, %b will not match the English month abbreviations, so set LC_TIME to the C locale first:

import locale

locale.setlocale(locale.LC_TIME, 'C')
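Changing the locale affects the whole process, so it may be worth restoring the previous setting after parsing. The following is a rough sketch under that assumption; the sample frame and the previous and months names are illustrative only.

```python
import locale

import pandas as pd

# Hypothetical sample matching the values shown in the question.
df = pd.DataFrame({"date": ["Oct", "Jun", "15-Oct", "27-Nov", "26-Sep"]})

# Calling setlocale with only a category returns the current setting.
previous = locale.setlocale(locale.LC_TIME)

# Switch to the C locale so %b matches English abbreviations.
locale.setlocale(locale.LC_TIME, "C")
try:
    # Keep the part after the last '-', then parse it as a month abbreviation.
    months = pd.to_datetime(
        df["date"].str.split("-").str[-1], format="%b", errors="coerce"
    ).dt.month
finally:
    # Put the original LC_TIME setting back, even if parsing raised.
    locale.setlocale(locale.LC_TIME, previous)

print(months)
```

The try/finally keeps the locale change scoped to the parsing step, which matters if other code in the same program formats dates for users.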

huangapple
  • Published on June 26, 2023 at 14:07:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76553911.html