How to represent a data type string as PolarsDataType

Question

According to the documentation and examples for read_csv(), the dtypes mapping only accepts PolarsDataType values:

dtypes: Mapping[str, PolarsDataType] | Sequence[PolarsDataType] | None = None,

I have a JSON config where I have a map of the columns and their datatypes but as strings like so:

    "columns_dtypes_polars": {
        "pcd": "pl.Utf8",
        "streg": "pl.Int64",
        "oac11": "pl.Utf8",
        "lat": "pl.Float64",
        "long": "pl.Float64",
        "imd": "pl.Int64"
    }

When I use this mapping after reading it into Python, the dtype values are still strings and Polars throws an error. I can't put the raw values (an unquoted pl.Int64) in the JSON, because that is not valid JSON. I have a lot of fields, so I do need to use the dtypes parameter.

So my main question is: how do I convert the string representation "pl.Int64" to the actual PolarsDataType pl.Int64 so I can use it in the dtypes parameter of read_csv()?

Answer 1

Score: 2

I don't know of a built-in Polars way to do this. My solution uses the built-in getattr function to fetch the attribute from the module object.

>>> import polars as pl
>>> 
>>> 
>>> def convert_string_to_polars_dtype(
...     mapping: dict[str, str]
... ) -> dict[str, pl.PolarsDataType]:
...     return {key: getattr(pl, value.split(".")[1]) for key, value in mapping.items()}
... 
>>> 
>>> columns_dtypes_polars = {
...     "pcd": "pl.Utf8",
...     "streg": "pl.Int64",
...     "oac11": "pl.Utf8",
...     "lat": "pl.Float64",
...     "long": "pl.Float64",
...     "imd": "pl.Int64",
... }
>>> 
>>> print(convert_string_to_polars_dtype(columns_dtypes_polars))
{'pcd': Utf8, 'streg': Int64, 'oac11': Utf8, 'lat': Float64, 'long': Float64, 'imd': Int64}
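
A slightly more defensive variant of the same getattr idea is sketched below, in case the JSON config may contain typos or unexpected strings. The helper name resolve_dtypes, its namespace and prefix parameters, and the fake_pl stand-in are all made up for illustration and are not part of Polars. The stand-in only exists so the sketch runs without Polars installed; in real use you would pass the imported pl module instead.

```python
import json
from types import SimpleNamespace


def resolve_dtypes(mapping, namespace, prefix="pl."):
    """Turn {"col": "pl.Name"} strings into attributes looked up on `namespace`.

    `namespace` is assumed to be the imported polars module (or any object
    exposing the dtype names as attributes).
    """
    resolved = {}
    for column, dtype_string in mapping.items():
        if not dtype_string.startswith(prefix):
            raise ValueError(f"{column}: expected prefix {prefix!r}, got {dtype_string!r}")
        name = dtype_string[len(prefix):]
        # Reject dotted paths and private names so the config cannot reach
        # arbitrary attributes of the module.
        if not name.isidentifier() or name.startswith("_"):
            raise ValueError(f"{column}: invalid dtype name {dtype_string!r}")
        try:
            resolved[column] = getattr(namespace, name)
        except AttributeError:
            raise ValueError(f"{column}: unknown dtype {dtype_string!r}") from None
    return resolved


# Hypothetical stand-in for the polars module, used only to keep this runnable.
fake_pl = SimpleNamespace(Utf8="Utf8", Int64="Int64", Float64="Float64")

config = json.loads(
    '{"columns_dtypes_polars": {"pcd": "pl.Utf8", "streg": "pl.Int64", "lat": "pl.Float64"}}'
)
dtypes = resolve_dtypes(config["columns_dtypes_polars"], fake_pl)
print(dtypes)
```

With Polars installed, the call would presumably look like resolve_dtypes(config["columns_dtypes_polars"], pl), and the resulting dict is what you would hand to the dtypes parameter of read_csv(). Raising ValueError with the column name makes a bad config entry easier to locate than a bare IndexError or AttributeError.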

huangapple
  • Posted on July 10, 2023, 22:15:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654631.html