How to represent a data type string as PolarsDataType
Question
According to the documentation and the examples for read_csv(), only PolarsDataType values can be used as the values in the dtypes mapping:
dtypes: Mapping[str, PolarsDataType] | Sequence[PolarsDataType] | None = None,
I have a JSON config that maps columns to their data types, but the types are stored as strings, like so:
"columns_dtypes_polars": {
    "pcd": "pl.Utf8",
    "streg": "pl.Int64",
    "oac11": "pl.Utf8",
    "lat": "pl.Float64",
    "long": "pl.Float64",
    "imd": "pl.Int64"
}
When I use this mapping after reading it into Python, the values are still strings rather than PolarsDataType objects, and Polars throws an error. I can't store the raw values in JSON, as that would throw an error. I have a ton of fields, so I do need to apply the dtypes parameter.
So my main question is: how do I convert the string representation "pl.Int64" to the actual PolarsDataType pl.Int64 so that I can use it in the dtypes parameter of read_csv()?
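To make the problem concrete, here is a minimal sketch of what loading such a config yields. The inline config_text is a hypothetical stand-in for the real JSON file, and only the standard-library json module is used.

```python
import json

# Hypothetical inline config; in practice this would be read from the JSON file.
config_text = """
{
  "columns_dtypes_polars": {
    "pcd": "pl.Utf8",
    "streg": "pl.Int64",
    "lat": "pl.Float64"
  }
}
"""

config = json.loads(config_text)
dtypes_as_strings = config["columns_dtypes_polars"]

# JSON has no notion of Python objects, so every value comes back as a plain str.
print(type(dtypes_as_strings["streg"]).__name__)  # prints: str
```

Each value is the text "pl.Int64" and so on, not the Polars type itself, which is why the mapping cannot be handed to read_csv() directly.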
Answer 1
Score: 2
I don't know whether Polars has a built-in way to achieve this. My solution uses the built-in getattr function to fetch the attribute from the module object.
>>> import polars as pl
>>>
>>>
>>> def convert_string_to_polars_dtype(
...     mapping: dict[str, str]
... ) -> dict[str, pl.PolarsDataType]:
...     return {key: getattr(pl, value.split(".")[1]) for key, value in mapping.items()}
...
>>>
>>> columns_dtypes_polars = {
...     "pcd": "pl.Utf8",
...     "streg": "pl.Int64",
...     "oac11": "pl.Utf8",
...     "lat": "pl.Float64",
...     "long": "pl.Float64",
...     "imd": "pl.Int64",
... }
>>>
>>> print(convert_string_to_polars_dtype(columns_dtypes_polars))
{'pcd': Utf8, 'streg': Int64, 'oac11': Utf8, 'lat': Float64, 'long': Float64, 'imd': Int64}