Polars: Parsing nested fields when reading a csv file
Question
I have the following csv structure:
Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here
Is there any way to parse the nested list using Polars?
Expected output:
{
"column_1": "Some",
"column_2": "Csv",
"column_3": "Data",
"column_4": [
"Nested",
"Field",
"Containing",
"An",
"Array"
],
"column_5": "And",
"column_6": "Some",
"column_7": "More",
"column_8": "Fields",
"column_9": "Here"
}
I'm trying to parse the CSV file using something like:
pl.read_csv(
source="somefile.csv",
has_header=False,
separator=";"
)
The documentation mentions the List type, but I don't understand how to use it in the dtypes parameter.
Thanks in advance!
Answer 1
Score: 1
You could .str.split the column after reading:
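import polars as pl

csv = b"""
Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here
""".strip()
# Read everything as plain strings, then split the "|"-delimited column into list[str]
(pl.read_csv(csv, separator=";", has_header=False)
 .with_columns(pl.col("column_4").str.split("|")))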
shape: (1, 9)
┌──────────┬──────────┬──────────┬────────────────────────────────┬───┬──────────┬──────────┬──────────┬──────────┐
│ column_1 ┆ column_2 ┆ column_3 ┆ column_4 ┆ … ┆ column_6 ┆ column_7 ┆ column_8 ┆ column_9 │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ list[str] ┆ ┆ str ┆ str ┆ str ┆ str │
╞══════════╪══════════╪══════════╪════════════════════════════════╪═══╪══════════╪══════════╪══════════╪══════════╡
│ Some ┆ Csv ┆ Data ┆ ["Nested", "Field", … "Array"] ┆ … ┆ Some ┆ More ┆ Fields ┆ Here │
└──────────┴──────────┴──────────┴────────────────────────────────┴───┴──────────┴──────────┴──────────┴──────────┘