Want to detect date columns and convert them into datetime64 dtype format

Question

I have a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Date1': ['20200422', '20200422.0', '20200501', '0'],
    'Date2': ['15/05/2023', '0', '20230520', '25/06/2023'],
    'Other1': ['Text1', 'Text2', 'Text3', 'Text4'],
    'Other2': [10, 20, 30, 0]
})
```

I want code that detects only Date1 and Date2, converts these two columns to datetime64 dtype, and shows them in dd/mm/yyyy format. The 0s in Date1 and Date2 can be converted to NaN.
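Note that a datetime64 column does not store a display format: pandas always prints it as YYYY-MM-DD, and "dd/mm/yyyy" only exists as text. A minimal sketch, assuming you want dd/mm/yyyy strings for display or export while keeping the datetime64 values for computation, uses `Series.dt.strftime` (the names `dates` and `date1_str` are illustrative only):

```python
import pandas as pd

# Parse YYYYMMDD strings into datetime64; '0' cannot be parsed and becomes NaT
dates = pd.to_datetime(pd.Series(['20200422', '20200501', '0']),
                       format='%Y%m%d', errors='coerce')
print(dates.dtype)

# dd/mm/yyyy exists only as text: strftime returns strings,
# and NaT entries come back as missing values
date1_str = dates.dt.strftime('%d/%m/%Y')
print(date1_str.tolist())  # first two values: '22/04/2020', '01/05/2020'
```

Keeping both forms (datetime64 for sorting and arithmetic, strings only at output time) avoids losing the datetime dtype the question asks for.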

Answer 1

Score: 1

If you want to convert the Date1 and Date2 columns to datetimes, use to_datetime together with Series.replace:

```python
# strip the ".0" suffix (raw string so "\." is passed to the regex engine), then parse
df['Date1'] = pd.to_datetime(df['Date1'].replace(r'\.0', '', regex=True), errors='coerce')
# dayfirst=True reads 15/05/2023 as 15 May 2023
df['Date2'] = pd.to_datetime(df['Date2'], errors='coerce', dayfirst=True)
print(df)
```

```
       Date1      Date2 Other1 Other2
0 2020-04-22 2023-05-15  Text1     10
1 2020-04-22        NaT  Text2     20
2 2020-05-01 2023-05-20  Text3     30
3        NaT 2023-06-25  Text4      0
```
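The `.0` cleanup step can be checked on its own. This minimal sketch uses `Series.str.replace` with the pattern `r'\.0$'`, anchored to the end of the value on the assumption that the stray suffix only ever appears there (the anchor is a choice made here, not part of the answer). Writing the pattern as a raw string keeps the backslash as a regex escape instead of triggering Python's invalid-escape warning:

```python
import pandas as pd

raw = pd.Series(['20200422', '20200422.0', '20200501', '0'])

# Remove a trailing ".0" left over from a float-to-string conversion
cleaned = raw.str.replace(r'\.0$', '', regex=True)
print(cleaned.tolist())  # ['20200422', '20200422', '20200501', '0']

# With an explicit format, '0' cannot match, so errors='coerce' turns it into NaT
parsed = pd.to_datetime(cleaned, format='%Y%m%d', errors='coerce')
print(parsed.isna().tolist())  # [False, False, False, True]
```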

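If Date2 contains two different formats, you can parse each format separately and combine the results with Series.fillna. This is also the safer choice in pandas 2.0 and later, where to_datetime infers a single format from the first non-null value, so the dayfirst=True call above can return NaT for 20230520: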

```python
# starting again from the original string columns
df['Date1'] = pd.to_datetime(df['Date1'].str.replace(r'\.0', '', regex=True), errors='coerce')
# parse each Date2 format separately; non-matching values become NaT
d1 = pd.to_datetime(df['Date2'], errors='coerce', format='%d/%m/%Y')
d2 = pd.to_datetime(df['Date2'], errors='coerce', format='%Y%m%d')
# take d2 wherever d1 could not parse the value
df['Date2'] = d1.fillna(d2)
print(df)
```

```
       Date1      Date2 Other1 Other2
0 2020-04-22 2023-05-15  Text1     10
1 2020-04-22        NaT  Text2     20
2 2020-05-01 2023-05-20  Text3     30
3        NaT 2023-06-25  Text4      0
```
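A self-contained version of the two-format approach, using the sample Date2 values from the question, makes it easy to confirm what `fillna` does. It assumes every real date uses one of the two formats shown:

```python
import pandas as pd

date2 = pd.Series(['15/05/2023', '0', '20230520', '25/06/2023'])

# Each pass understands one format; values in the other format become NaT
d1 = pd.to_datetime(date2, format='%d/%m/%Y', errors='coerce')
d2 = pd.to_datetime(date2, format='%Y%m%d', errors='coerce')

# fillna takes d2's value wherever d1 is NaT; '0' matches neither format, so it stays NaT
combined = d1.fillna(d2)
print(combined)
```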

Answer 2

Score: 0

If your goal is to detect the date columns automatically, you can use a custom function:

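```python
from functools import reduce

import pandas as pd


def date_detect(s, formats=None):
    try:
        # string columns: strip a "." and any zeros after it, e.g. "20200422.0" -> "20200422"
        s1 = s.str.replace(r'\.0*', '', regex=True)
    except AttributeError:
        # non-string columns (e.g. int64) have no .str accessor; use them as-is
        s1 = s
    if formats is None:
        # let pandas infer the format
        s2 = pd.to_datetime(s1, errors='coerce')
    else:
        # parse with each format and fill the gaps left by the previous ones
        s2 = reduce(lambda a, b: a.fillna(b),
                    [pd.to_datetime(s1, format=f, errors='coerce')
                     for f in formats])
    # treat the column as a date column if at least one value parsed
    if s2.notna().any():
        return s2
    return s


out = df.apply(date_detect, formats=['%d/%m/%Y', '%Y%m%d'])
```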

Output:

```
       Date1      Date2 Other1 Other2
0 2020-04-22 2023-05-15  Text1     10
1 2020-04-22        NaT  Text2     20
2 2020-05-01 2023-05-20  Text3     30
3        NaT 2023-06-25  Text4      0
```

dtypes:

```
Date1     datetime64[ns]
Date2     datetime64[ns]
Other1            object
Other2             int64
dtype: object
```
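`date_detect` converts a column as soon as a single value parses (`s2.notna().any()`), so a text column that happens to contain one date-like string would also become datetime64, with all its other values turned into NaT. If that is a concern, a stricter rule is one option. The sketch below is a hypothetical variant (`date_detect_strict`, its `min_ratio` parameter and the extra `Note` column are illustrations, not part of the answer): it only considers string columns, uses the anchored pattern `r'\.0$'`, converts a column only when at least half of its values parse with one of the formats, and then lists the detected columns with `select_dtypes`:

```python
from functools import reduce

import pandas as pd


def date_detect_strict(s, formats, min_ratio=0.5):
    """Hypothetical variant of date_detect: convert only if enough values parse."""
    # Only text columns are candidates; int64 columns such as Other2 are returned unchanged
    if not pd.api.types.is_string_dtype(s):
        return s
    # Drop a trailing ".0", as in "20200422.0"
    cleaned = s.str.replace(r'\.0$', '', regex=True)
    # Try each format and fill the gaps left by the earlier ones
    parsed = reduce(lambda a, b: a.fillna(b),
                    [pd.to_datetime(cleaned, format=f, errors='coerce') for f in formats])
    # The fraction of values that became real dates decides whether to convert
    if parsed.notna().mean() >= min_ratio:
        return parsed
    return s


df = pd.DataFrame({
    'Date1': ['20200422', '20200422.0', '20200501', '0'],
    'Date2': ['15/05/2023', '0', '20230520', '25/06/2023'],
    'Other1': ['Text1', 'Text2', 'Text3', 'Text4'],
    'Other2': [10, 20, 30, 0],
    # Hypothetical extra column: mostly text with a single date-like value
    'Note': ['Text1', '20200101', 'Text3', 'Text4'],
})

out = df.apply(date_detect_strict, formats=['%d/%m/%Y', '%Y%m%d'])
print(out.dtypes)
print(out.select_dtypes(include='datetime64').columns.tolist())  # ['Date1', 'Date2']
```

With the original `date_detect`, the `Note` column would be converted because one of its values parses; here only 1 of 4 values parses, so it stays text. The threshold of 0.5 is arbitrary and would need tuning for real data.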

huangapple
  • Published on June 8, 2023 at 14:52:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76429286.html