Want to detect date columns and convert them into datetime64 dtype format

Question


I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Date1': ['20200422', '20200422.0', '20200501', '0'],
    'Date2': ['15/05/2023', '0', '20230520', '25/06/2023'],
    'Other1': ['Text1', 'Text2', 'Text3', 'Text4'],
    'Other2': [10, 20, 30, 0]
})

I want code that detects only Date1 and Date2, converts these two columns to the datetime64 dtype, and shows them in dd/mm/yyyy format. The 0 values in Date1 and Date2 can be converted to NaN.

Answer 1

Score: 1


To convert the Date1 and Date2 columns to datetimes, use to_datetime together with Series.replace:

# Strip a trailing ".0" before parsing; values that cannot be parsed become NaT
df['Date1'] = pd.to_datetime(df['Date1'].replace(r'\.0$', '', regex=True), errors='coerce')
df['Date2'] = pd.to_datetime(df['Date2'], errors='coerce', dayfirst=True)
print(df)
       Date1      Date2 Other1  Other2
0 2020-04-22 2023-05-15  Text1      10
1 2020-04-22        NaT  Text2      20
2 2020-05-01 2023-05-20  Text3      30
3        NaT 2023-06-25  Text4       0

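If Date2 mixes two formats, parse it once per format and combine the results with Series.fillna. This is also the more predictable option on pandas 2.0 and later, where to_datetime infers a single format from the first value and, with errors='coerce', may turn rows in the other format into NaT:

# Strip a trailing ".0" before parsing
df['Date1'] = pd.to_datetime(df['Date1'].str.replace(r'\.0$', '', regex=True), errors='coerce')

# Parse Date2 once per expected format, then fill the gaps of the first result with the second
d1 = pd.to_datetime(df['Date2'], errors='coerce', format='%d/%m/%Y')
d2 = pd.to_datetime(df['Date2'], errors='coerce', format='%Y%m%d')
df['Date2'] = d1.fillna(d2)
print(df)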
       Date1      Date2 Other1  Other2
0 2020-04-22 2023-05-15  Text1      10
1 2020-04-22        NaT  Text2      20
2 2020-05-01 2023-05-20  Text3      30
3        NaT 2023-06-25  Text4       0
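
The question also asks for dd/mm/yyyy output. A datetime64 column stores timestamps and has no display format of its own, so one possible approach is to keep the datetime64 columns for computation and derive string versions with Series.dt.strftime only for display. The sketch below assumes the sample data from the question; the names date1_text and date2_text are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Date1': ['20200422', '20200422.0', '20200501', '0'],
    'Date2': ['15/05/2023', '0', '20230520', '25/06/2023'],
})

# Strip a trailing ".0" and parse with an explicit format; anything else becomes NaT.
date1 = pd.to_datetime(df['Date1'].str.replace(r'\.0$', '', regex=True),
                       format='%Y%m%d', errors='coerce')

# Try each known format and keep whichever one parsed for each row.
d1 = pd.to_datetime(df['Date2'], format='%d/%m/%Y', errors='coerce')
d2 = pd.to_datetime(df['Date2'], format='%Y%m%d', errors='coerce')
date2 = d1.fillna(d2)

# strftime returns strings, not datetime64; NaT rows come back as missing values.
date1_text = date1.dt.strftime('%d/%m/%Y')
date2_text = date2.dt.strftime('%d/%m/%Y')

print(date1.dtype, date2.dtype)
print(date1_text.tolist())
print(date2_text.tolist())
```

Keeping both forms avoids losing the datetime64 dtype: the formatted columns are plain strings and no longer support date arithmetic or sorting by date.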

Answer 2

Score: 0


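If your goal is to detect the date columns automatically, you can use a custom function:

from functools import reduce

import pandas as pd

def date_detect(s, formats=None):
    # Strip a "." followed by any zeros; non-string columns have no .str accessor
    try:
        s1 = s.str.replace(r'\.0*', '', regex=True)
    except AttributeError:
        s1 = s
    if formats is None:
        # No formats given: let pandas infer one
        s2 = pd.to_datetime(s1, errors='coerce')
    else:
        # Parse once per format and keep the first successful result for each row
        s2 = reduce(lambda a, b: a.fillna(b),
                    [pd.to_datetime(s1, format=f, errors='coerce')
                     for f in formats])
    # Treat the column as dates if at least one value parsed
    if s2.notna().any():
        return s2
    return s

out = df.apply(date_detect, formats=['%d/%m/%Y', '%Y%m%d'])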

Output:

       Date1      Date2 Other1  Other2
0 2020-04-22 2023-05-15  Text1      10
1 2020-04-22        NaT  Text2      20
2 2020-05-01 2023-05-20  Text3      30
3        NaT 2023-06-25  Text4       0

dtypes:

Date1     datetime64[ns]
Date2     datetime64[ns]
Other1            object
Other2             int64
dtype: object
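
One caveat with the function above: when formats is None, to_datetime reads integers as offsets from the epoch, so a numeric column such as Other2 would parse successfully and be converted. A possible variant, sketched below, skips non-string columns and only converts a column when a minimum share of its values parses. The function name detect_date_columns and the min_fraction threshold are hypothetical choices for illustration, not part of the original answer.

```python
import pandas as pd

def detect_date_columns(df, formats, min_fraction=0.5):
    """Return a copy of df with date-like string columns converted to datetime64.

    min_fraction is an assumed threshold: a column is converted only if at
    least that share of its values parses with one of the given formats.
    """
    out = df.copy()
    for col in df.columns:
        # Skip non-string columns so integers are never read as epoch offsets.
        if not pd.api.types.is_string_dtype(df[col]):
            continue
        # Strip a trailing ".0" left over from float-like strings.
        cleaned = df[col].astype(str).str.replace(r'\.0$', '', regex=True)
        parsed = None
        for fmt in formats:
            attempt = pd.to_datetime(cleaned, format=fmt, errors='coerce')
            # Keep earlier successes and fill the remaining gaps with this format.
            parsed = attempt if parsed is None else parsed.fillna(attempt)
        if parsed is not None and parsed.notna().mean() >= min_fraction:
            out[col] = parsed
    return out

df = pd.DataFrame({
    'Date1': ['20200422', '20200422.0', '20200501', '0'],
    'Date2': ['15/05/2023', '0', '20230520', '25/06/2023'],
    'Other1': ['Text1', 'Text2', 'Text3', 'Text4'],
    'Other2': [10, 20, 30, 0]
})

out = detect_date_columns(df, formats=['%d/%m/%Y', '%Y%m%d'])
print(out.dtypes)
```

With the sample data, three of the four values in Date1 and Date2 parse, so both columns pass the 0.5 threshold, while Other1 and Other2 are left unchanged. Raising min_fraction makes the detection stricter for columns that contain only an occasional date-like value.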

huangapple
  • Published on June 8, 2023, 14:52:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76429286.html