Polars read_excel 将日期转换为字符串

huangapple go评论125阅读模式
英文:

Polars read_excel converting dates to strings

问题

我正在使用polars的read_excel函数,从Excel文件中读取一些日期。然而,当我读取它们时,它们以"mm-dd-yy"的格式转换为字符串。这会在后续的操作中引发问题,因为在Excel文件中我可能有一个日期为01/01/1950(它被转换为'01-01-50'),但当我使用polars引入的日期时,我的代码会认为这个日期是01/01/2050,因为我没有带入完整的年份。

您可以在下面的print()语句中看到,尽管我提取了从2050年和1950年的日期,但在polars中引入时,它们都显示为DF中的相同日期。那么,有没有办法带入完整的年份值以防止这种情况并区分实际的日期?

import polars as pl

extracted = pl.read_excel('file_name.xlsx')
print(extracted)

file_name.xlsx:

Polars read_excel 将日期转换为字符串

print(extracted) =

Polars read_excel 将日期转换为字符串

英文:

So I'm using the polars read_excel function and I'm reading in some dates from an excel file. However, when I read them in, they get formatted as a string with the format "mm-dd-yy". This is causing problems down the line as I'll have a date of 01/01/1950 in the excel file (which gets converted to '01-01-50'), but then when I go to use the date brought in by polars, my code thinks the date is 01/01/2050 instead since I don't have the full year being brought in.

You can see in the print() statement below that even though I'm pulling in dates from 2050 and 1950, when brought in with polars, they both appear to be the same date in the DF. So is there a way to bring in the full-year value to prevent this and distinguish between the actual dates?

Code:

import polars as pl
    
extracted = pl.read_excel('file_name.xlsx')
print(extracted)

file_name.xlsx:

Polars read_excel 将日期转换为字符串

print(extracted) =

Polars read_excel 将日期转换为字符串

答案1

得分: 5

xlsx2csv_option中使用dateformat指定一个格式为4位数字的年份:

extracted = pl.read_excel('testdate.xlsx', xlsx2csv_options={"dateformat": "%Y-%m-%d"})
print(extracted)
┌────────────┬─────────────┬─────────────┐
│ Hire Date  ┆ Hire Date 2 ┆ Hire Date 3 │
│ ---        ┆ ---         ┆ ---         │
│ str        ┆ str         ┆ str         │
╞════════════╪═════════════╪═════════════╡
│ 2005-02-05 ┆ 1950-01-02  ┆ 2050-01-02  │
│ 2005-02-05 ┆ 1950-01-03  ┆ 2050-01-03  │
│ 2020-04-06 ┆ 1950-01-04  ┆ 2050-01-04  │
│ 2008-12-20 ┆ 1950-01-05  ┆ 2050-01-05  │
│ 2009-03-12 ┆ 1950-01-06  ┆ 2050-01-06  │
│ 2018-05-26 ┆ 1950-01-07  ┆ 2050-01-07  │
│ 2018-05-26 ┆ 1950-01-08  ┆ 2050-01-08  │
└────────────┴─────────────┴─────────────┘
英文:

Specify a 4-digit year in the format, using the dateformat in xlsx2csv_option:

extracted = pl.read_excel('testdate.xlsx', xlsx2csv_options={"dateformat": "%Y-%m-%d"})
print(extracted)
┌────────────┬─────────────┬─────────────┐
│ Hire Date  ┆ Hire Date 2 ┆ Hire Date 3 │
│ ---        ┆ ---         ┆ ---         │
│ str        ┆ str         ┆ str         │
╞════════════╪═════════════╪═════════════╡
│ 2005-02-05 ┆ 1950-01-02  ┆ 2050-01-02  │
│ 2005-02-05 ┆ 1950-01-03  ┆ 2050-01-03  │
│ 2020-04-06 ┆ 1950-01-04  ┆ 2050-01-04  │
│ 2008-12-20 ┆ 1950-01-05  ┆ 2050-01-05  │
│ 2009-03-12 ┆ 1950-01-06  ┆ 2050-01-06  │
│ 2018-05-26 ┆ 1950-01-07  ┆ 2050-01-07  │
│ 2018-05-26 ┆ 1950-01-08  ┆ 2050-01-08  │
└────────────┴─────────────┴─────────────┘

huangapple
  • 本文由 发表于 2023年2月9日 01:09:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/75389306.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定