Date conversion error in a pandas DataFrame: why is this happening and how do I fix it?

Question

I am trying to convert two date columns in my DataFrame. However, some of the dates are parsed as "%d/%m/%Y" and others as "%m/%d/%Y". The issue occurs only for dates from 01-May-2023 to 12-May-2023; from 13 May onward, parsing reverts to "%d/%m/%Y".

I am using the following to convert my data:

    import re
    import pandas as pd

    columns_to_convert_to_date = ['Date', 'Value Dt']
    regex_pattern = r'\d{2}/\d{2}/\d{4}'
    for column in columns_to_convert_to_date:
        full_bank_df_hdfc[column] = full_bank_df_hdfc[column].apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y', errors='coerce') if re.match(regex_pattern, str(x)) else pd.to_datetime(x, errors='coerce'))

Data before transformation: [screenshot]

Data after transformation: [screenshot]

I have tried to force it to use the format, to no avail. When I use only

    pd.to_datetime(full_bank_df_hdfc['Date'], format='%d/%m/%y', errors='coerce')

I get the desired output in the column; however, the rows that already hold datetime values become NaT.

1043 2023-04-30
1044 2023-05-01
1045 2023-05-01
1046 2023-05-01
1047 2023-05-02
1048 2023-05-02
1049 2023-05-03
1050 2023-05-03
1051 2023-05-03
1052 2023-05-04
1053 2023-05-04
1054 2023-05-06
1055 2023-05-06
1056 2023-05-07
1057 2023-05-08

Answer 1

Score: 0

You are matching a 4-digit year, but your screenshot shows only 2 digits for the year.
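A minimal sketch of why the swap probably happens, assuming the affected rows hold strings such as '01/05/23' (the sample values below are hypothetical, not taken from your data). The 4-digit pattern does not match them, so they fall through to the `pd.to_datetime(x, errors='coerce')` branch, which has no format and reads the first number as the month whenever that is possible:

```python
import re
import warnings

import pandas as pd

regex_pattern = r'\d{2}/\d{2}/\d{4}'  # pattern from the question: expects a 4-digit year

# A 2-digit year does not match, so such rows take the format-less branch.
matches_pattern = re.match(regex_pattern, '01/05/23') is not None

with warnings.catch_warnings():
    # Newer pandas versions may warn when they fall back to day-first parsing.
    warnings.simplefilter('ignore')
    ambiguous = pd.to_datetime('01/05/23', errors='coerce')    # 01 is a valid month
    unambiguous = pd.to_datetime('13/05/23', errors='coerce')  # 13 cannot be a month
    day_first = pd.to_datetime('01/05/23', dayfirst=True)      # explicit day-first hint

print(matches_pattern, ambiguous.date(), unambiguous.date(), day_first.date())
```

Days 1 to 12 are also valid month numbers, which would explain why only 01-May to 12-May come out swapped and 13-May onward looks correct.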

Try:

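    import pandas as pd

    columns_to_convert_to_date = ['Date', 'Value Dt']
    # Capture day, month and 2-digit year from strings like '24/05/23'
    regex_pattern = r'(\d{2})/(\d{2})/(\d{2})'
    for column in columns_to_convert_to_date:
        # Rewrite 'dd/mm/yy' as '20yy-mm-dd 00:00:00' so every row has the same layout
        dt = df[column].str.replace(regex_pattern, r'20\3-\2-\1 00:00:00', regex=True)
        df[column] = pd.to_datetime(dt)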
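To see what the substitution does to individual values, here is a small sketch using `re.sub` with the same pattern and replacement on three hypothetical inputs. The last one is an assumption about what could go wrong if a 4-digit year ever appears, since the pattern is not anchored:

```python
import re

regex_pattern = r'(\d{2})/(\d{2})/(\d{2})'
replacement = r'20\3-\2-\1 00:00:00'

# 'dd/mm/yy' is reordered into an ISO-like string.
short_year = re.sub(regex_pattern, replacement, '24/05/23')

# A value without slashes is left untouched.
already_iso = re.sub(regex_pattern, replacement, '2022-04-01 00:00:00')

# Caveat: with a 4-digit year only the first two year digits are captured,
# so the result is not a valid timestamp string.
long_year = re.sub(regex_pattern, replacement, '24/05/2023')

print(short_year)
print(already_iso)
print(long_year)
```

If your data can contain 4-digit years, anchoring the pattern with `^` and `$` would be one way to avoid touching those rows.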

Output:

    # Before
    >>> df
                         Date             Value Dt
    0     2022-04-01 00:00:00  2022-04-01 00:00:00
    1     2022-04-01 00:00:00  2022-04-01 00:00:00
    2     2022-04-02 00:00:00  2022-04-02 00:00:00
    3     2022-04-02 00:00:00  2022-04-02 00:00:00
    4     2022-04-02 00:00:00  2022-04-02 00:00:00
    1084             24/05/23             24/05/23
    1085             24/05/23             24/05/23

    # After
    >>> df
               Date   Value Dt
    0    2022-04-01 2022-04-01
    1    2022-04-01 2022-04-01
    2    2022-04-02 2022-04-02
    3    2022-04-02 2022-04-02
    4    2022-04-02 2022-04-02
    1084 2023-05-24 2023-05-24
    1085 2023-05-24 2023-05-24
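As an alternative to rewriting the strings, you could try each explicit format in turn and keep the first one that parses, which also avoids the NaT rows you got from a single `format=`. This is only a sketch: `parse_mixed_dates`, the list of formats and the sample frame are my own assumptions about what your columns contain, so adjust them to your data.

```python
import pandas as pd


def parse_mixed_dates(series, formats=('%d/%m/%y', '%d/%m/%Y', '%Y-%m-%d %H:%M:%S')):
    """Parse a column that mixes several known date layouts (hypothetical helper)."""
    # Existing Timestamp objects become strings like '2022-04-01 00:00:00'.
    text = series.astype(str).str.strip()
    result = pd.Series(pd.NaT, index=series.index, dtype='datetime64[ns]')
    for fmt in formats:
        # Rows that do not fit this format become NaT and are left for the next one.
        parsed = pd.to_datetime(text, format=fmt, errors='coerce')
        result = result.fillna(parsed)
    return result


# Hypothetical sample mixing an existing Timestamp with day-first strings.
df = pd.DataFrame({'Date': [pd.Timestamp('2022-04-01'), '01/05/23', '24/05/23', '02/05/2023']})
df['Date'] = parse_mixed_dates(df['Date'])
print(df)
```

Because every format is explicit, 01-May is never guessed as 5-January, and rows that fit none of the formats stay NaT so they are easy to inspect.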

Posted by huangapple on 2023-06-16 04:49:28
Source: https://go.coder-hub.com/76485419.html