英文:
Polars read_excel converting dates to strings
问题
我正在使用polars的read_excel函数,从Excel文件中读取一些日期。然而,当我读取它们时,它们以"mm-dd-yy"的格式转换为字符串。这会在后续的操作中引发问题,因为在Excel文件中我可能有一个日期为01/01/1950(它被转换为'01-01-50'),但当我使用polars引入的日期时,我的代码会认为这个日期是01/01/2050,因为我没有带入完整的年份。
您可以在下面的print()语句中看到,尽管我提取了从2050年和1950年的日期,但在polars中引入时,它们都显示为DF中的相同日期。那么,有没有办法带入完整的年份值以防止这种情况并区分实际的日期?
import polars as pl
extracted = pl.read_excel('file_name.xlsx')
print(extracted)
file_name.xlsx:
print(extracted) =
英文:
So I'm using the polars read_excel function and I'm reading in some dates from an excel file. However, when I read them in, they get formatted as a string with the format "mm-dd-yy". This is causing problems down the line as I'll have a date of 01/01/1950 in the excel file (which gets converted to '01-01-50'), but then when I go to use the date brought in by polars, my code thinks the date is 01/01/2050 instead since I don't have the full year being brought in.
You can see in the print() statement below that even though I'm pulling in dates from 2050 and 1950, when brought in with polars, they both appear to be the same date in the DF. So is there a way to bring in the full-year value to prevent this and distinguish between the actual dates?
Code:
import polars as pl
extracted = pl.read_excel('file_name.xlsx')
print(extracted)
file_name.xlsx:
print(extracted) =
答案1
得分: 5
在xlsx2csv_option
中使用dateformat
指定一个格式为4位数字的年份:
extracted = pl.read_excel('testdate.xlsx', xlsx2csv_options={"dateformat": "%Y-%m-%d"})
print(extracted)
┌────────────┬─────────────┬─────────────┐
│ Hire Date ┆ Hire Date 2 ┆ Hire Date 3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════════╪═════════════╪═════════════╡
│ 2005-02-05 ┆ 1950-01-02 ┆ 2050-01-02 │
│ 2005-02-05 ┆ 1950-01-03 ┆ 2050-01-03 │
│ 2020-04-06 ┆ 1950-01-04 ┆ 2050-01-04 │
│ 2008-12-20 ┆ 1950-01-05 ┆ 2050-01-05 │
│ 2009-03-12 ┆ 1950-01-06 ┆ 2050-01-06 │
│ 2018-05-26 ┆ 1950-01-07 ┆ 2050-01-07 │
│ 2018-05-26 ┆ 1950-01-08 ┆ 2050-01-08 │
└────────────┴─────────────┴─────────────┘
英文:
Specify a 4-digit year in the format, using the dateformat
in xlsx2csv_option
:
extracted = pl.read_excel('testdate.xlsx', xlsx2csv_options={"dateformat": "%Y-%m-%d"})
print(extracted)
┌────────────┬─────────────┬─────────────┐
│ Hire Date ┆ Hire Date 2 ┆ Hire Date 3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════════╪═════════════╪═════════════╡
│ 2005-02-05 ┆ 1950-01-02 ┆ 2050-01-02 │
│ 2005-02-05 ┆ 1950-01-03 ┆ 2050-01-03 │
│ 2020-04-06 ┆ 1950-01-04 ┆ 2050-01-04 │
│ 2008-12-20 ┆ 1950-01-05 ┆ 2050-01-05 │
│ 2009-03-12 ┆ 1950-01-06 ┆ 2050-01-06 │
│ 2018-05-26 ┆ 1950-01-07 ┆ 2050-01-07 │
│ 2018-05-26 ┆ 1950-01-08 ┆ 2050-01-08 │
└────────────┴─────────────┴─────────────┘
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论