Polars: Parsing nested fields when reading a CSV file

Question

I have the following csv structure:

Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here

Is there any way to parse the nested list using Polars?

The expected output is:

{
   "column_1": "Some",
   "column_2": "Csv",
   "column_3": "Data",
   "column_4": [
       "Nested",
       "Field",
       "Containing",
       "An",
       "Array"
   ],
   "column_5": "And",
   "column_6": "Some",
   "column_7": "More",
   "column_8": "Fields",
   "column_9": "Here"
}

I'm trying to parse the CSV file using something like:

import polars as pl

pl.read_csv(
 source="somefile.csv",
 has_header=False,
 separator=";"
)

The documentation mentions the List type, but I don't understand how to use it in the dtypes parameter.

Thanks in advance!

Answer 1

Score: 1

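You could .str.split the column after reading:

import polars as pl

# Sample row from the question, passed to read_csv as bytes
csv = b"""
Some;Csv;Data;Nested|Field|Containing|An|Array;And;Some;More;Fields;Here
""".strip()

# Read with ";" as the field separator, then split column_4 on "|" into a list
(pl.read_csv(csv, separator=";", has_header=False)
   .with_columns(pl.col("column_4").str.split("|")))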
shape: (1, 9)
┌──────────┬──────────┬──────────┬────────────────────────────────┬───┬──────────┬──────────┬──────────┬──────────┐
│ column_1 ┆ column_2 ┆ column_3 ┆ column_4                       ┆ … ┆ column_6 ┆ column_7 ┆ column_8 ┆ column_9 │
│ ---      ┆ ---      ┆ ---      ┆ ---                            ┆   ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str      ┆ str      ┆ str      ┆ list[str]                      ┆   ┆ str      ┆ str      ┆ str      ┆ str      │
╞══════════╪══════════╪══════════╪════════════════════════════════╪═══╪══════════╪══════════╪══════════╪══════════╡
│ Some     ┆ Csv      ┆ Data     ┆ ["Nested", "Field", … "Array"] ┆ … ┆ Some     ┆ More     ┆ Fields   ┆ Here     │
└──────────┴──────────┴──────────┴────────────────────────────────┴───┴──────────┴──────────┴──────────┴──────────┘

huangapple
  • Posted on May 17, 2023 at 19:53:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76271807.html