pd.to_datetime converts mixed object column values to NaT when I intend to extract the month

Question

I've been trying to process the date column of a DataFrame with pd.to_datetime to obtain the month as an int.

This is the code, in Python using pandas:

    print(df["date"].head())

    0       Oct
    1       Jun
    2    15-Oct
    3    27-Nov
    4    26-Sep
    Name: date, dtype: object

After attempting the conversion to datetime, every value came out as NaT. How do I fix this?

    df["date"] = pd.to_datetime(df["date"], errors='coerce')
    print(df["date"].head())

I get:

    0   NaT
    1   NaT
    2   NaT
    3   NaT
    4   NaT
    Name: date, dtype: datetime64[ns]

Counting the missing values with isna() confirms that every value is NaT:

    print(df["date"].isna().sum())

    1000

I plan to obtain:

    0    10
    1    06
    2    10
    3    11
    4    09

For values that can't be converted to datetime and then to int (because they are missing or unrecognizable), I plan to substitute "Date not given".

What do I need to do?

Answer 1

Score: 1

Use Series.str.extract with Series.map:

    d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04',
         'May':'05', 'Jun':'06', 'Jul':'07', 'Aug':'08',
         'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
    # Pull out the alphabetic month abbreviation, then map it to its number
    df["date1"] = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(d)

Or convert the extracted values to datetimes, using %b to match abbreviated month names, and then convert them to strings with Series.dt.strftime:

    df["date2"] = pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
                                 format='%b', errors='coerce').dt.strftime('%m')
    print(df)

         date  date1  date2
    0     Oct     10     10
    1     Jun     06     06
    2  15-Oct     10     10
    3  27-Nov     11     11
    4  26-Sep     09     09

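If you need integers (in this example the first value is misspelled as Ocyt, so it cannot be parsed and becomes <NA>):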

    print(df)

         date
    0    Ocyt
    1     Jun
    2  15-Oct
    3  27-Nov
    4  26-Sep

    df["date2"] = (pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
                                  format='%b', errors='coerce')
                     .dt.month.astype('Int64'))
    print(df)

         date  date2
    0    Ocyt   <NA>
    1     Jun      6
    2  15-Oct     10
    3  27-Nov     11
    4  26-Sep      9
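
The question also asks for a "Date not given" placeholder, which the snippets above leave as missing values. A minimal sketch of one way to add it, building on the extract-and-map idea: the sample frame, the last two rows, and the column names month and month_label are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Sample data modelled on the question; the last two rows ("Ocyt" and a
# missing value) are added here only to exercise the fallback.
df = pd.DataFrame({"date": ["Oct", "Jun", "15-Oct", "27-Nov", "26-Sep", "Ocyt", None]})

month_codes = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
               'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
               'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

# Extract the alphabetic part and look it up; unknown or missing values give NaN.
codes = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(month_codes)

# Nullable integer column: unparseable entries stay as <NA>.
df["month"] = pd.to_numeric(codes).astype("Int64")

# Text column: zero-padded month, or the placeholder from the question.
df["month_label"] = codes.fillna("Date not given")

print(df)
```

Keeping the integer column and the label column separate avoids mixing ints and strings in one column, which would otherwise force the whole column to object dtype.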

Answer 2

Score: 1

You can split the column on '-' and keep the last part:

    >>> pd.to_datetime(df['date'].str.split('-').str[-1], format='%b', errors='coerce').dt.month
    0    10
    1     6
    2    10
    3    11
    4     9
    Name: date, dtype: int32
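
One caveat: the integer dtype shown above only holds when every value parses. If any entry is coerced to NaT, .dt.month contains NaN and the result becomes float. A small sketch of the same split approach with a nullable integer dtype, assuming a sample frame in which the "unknown" row is invented for illustration:

```python
import pandas as pd

# Sample data modelled on the question; "unknown" is a made-up bad value.
df = pd.DataFrame({"date": ["Oct", "Jun", "15-Oct", "27-Nov", "26-Sep", "unknown"]})

# Keep whatever follows the last '-' (or the whole string if there is none).
last_part = df["date"].str.split("-").str[-1]

# Parse the abbreviated month name; anything that does not match becomes NaT.
parsed = pd.to_datetime(last_part, format="%b", errors="coerce")

# Int64 (capital I) is pandas' nullable integer dtype, so NaT maps to <NA>
# instead of turning the whole column into floats.
df["month"] = parsed.dt.month.astype("Int64")

print(df)
```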

If your locale is not English, %b will not match English month abbreviations; you can set the LC_TIME locale to 'C' first:

    import locale
    locale.setlocale(locale.LC_TIME, 'C')
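
Because setlocale changes the setting for the whole process, it may be preferable to switch only for the duration of the parse and restore the previous value afterwards. A rough sketch of that pattern follows; the helper name c_time_locale is hypothetical, and the standard-library datetime.strptime stands in for the pd.to_datetime call, which could be placed inside the with block in the same way.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def c_time_locale():
    """Temporarily switch LC_TIME to 'C' (hypothetical helper, not a pandas API)."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_TIME, "C")
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the previous setting


with c_time_locale():
    # Inside the block, %b matches English abbreviations such as "Oct".
    month = datetime.strptime("Oct", "%b").month

print(month)
```

Note that setlocale is not thread-safe, so this is best suited to single-threaded scripts.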
