Polars: Parsing nested fields when reading a csv file
Question
I have the following csv structure:
Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here
Is there any way to parse the nested list using Polars?
The expected output is:
{
   "column_1": "Some",
   "column_2": "Csv",
   "column_3": "Data",
   "column_4": [
       "Nested",
       "Field",
       "Containing",
       "An",
       "Array"
   ],
   "column_5": "And",
   "column_6": "Some",
   "column_7": "More",
   "column_8": "Fields",
   "column_9": "Here"
}
I'm trying to parse the csv file using something like:
pl.read_csv(
 source="somefile.csv",
 has_header=False,
 separator=";"
)
The documentation mentions the List type, but I don't understand how to use it in the dtypes parameter.
Thanks in advance!
Answer 1
Score: 1
You could read the file first and then .str.split the column:
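import polars as pl

# read_csv accepts raw bytes as well as a file path
csv = b"""
Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here
""".strip()
# Split the pipe-delimited column into a list[str] column
(pl.read_csv(csv, separator=";", has_header=False)
   .with_columns(pl.col("column_4").str.split("|")))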
shape: (1, 9)
┌──────────┬──────────┬──────────┬────────────────────────────────┬───┬──────────┬──────────┬──────────┬──────────┐
│ column_1 ┆ column_2 ┆ column_3 ┆ column_4                       ┆ … ┆ column_6 ┆ column_7 ┆ column_8 ┆ column_9 │
│ ---      ┆ ---      ┆ ---      ┆ ---                            ┆   ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str      ┆ str      ┆ str      ┆ list[str]                      ┆   ┆ str      ┆ str      ┆ str      ┆ str      │
╞══════════╪══════════╪══════════╪════════════════════════════════╪═══╪══════════╪══════════╪══════════╪══════════╡
│ Some     ┆ Csv      ┆ Data     ┆ ["Nested", "Field", … "Array"] ┆ … ┆ Some     ┆ More     ┆ Fields   ┆ Here     │
└──────────┴──────────┴──────────┴────────────────────────────────┴───┴──────────┴──────────┴──────────┴──────────┘