Why does reaching nested meta give NaN when normalizing JSON with pandas?

Question

My input is a Python dictionary (JSON-like):

d = {
    "type": "type1",
    "details": {
        "name": "foo",
        "date": {
            "timestamp": "01/02/2023 21:42:44",
            "components": {
                "day": 2,
                "month": 1,
                "year": 2023,
                "time": "21:42:44"
            }
        }
    },
    "infos": {
        "records": [
            {
                "field1": "qux",
                "field2": "baz",
            }
        ],
        "class": "P"
    }
}

I'm using the code below:

import pandas as pd

df = pd.json_normalize(
    d,
    record_path=["infos", "records"],
    meta=[
        "type",
        ["details", "date", "timestamp"],
        ["details", "date", "components", "year"],
        ["infos", "class"]
    ],
    errors="ignore"
)

This gives me the following output:

  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1                    NaN                          NaN           P

But I expected this one:

  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1    01/02/2023 21:42:44                         2023           P

To be honest, the `meta` parameter is driving me crazy! I don't know what I'm doing wrong.

Can you explain its logic, please?

Answer 1

Score: 2

I think you should put an extra pair of `[]` in `record_path=`:

```py
df = pd.json_normalize(
    d,
    record_path=[["infos", "records"]],  # <-- put [] here
    meta=[
        "type",
        ["details", "date", "timestamp"],
        ["details", "date", "components", "year"],
        ["infos", "class"],
    ],
    errors="ignore",
)

print(df)
```

Prints:

  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1    01/02/2023 21:42:44                         2023           P
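
To see why the extra brackets matter, the two call styles can be compared side by side. The sketch below is self-contained and reuses the question's data; the names `meta`, `flat` and `nested` are introduced here for illustration only. A plausible reading of the behavior (an interpretation, not something stated in the answer above) is that a flat `record_path` list is walked one key per level, and a multi-key `meta` path is then looked up relative to the level already descended into. Under that reading, `["details", "date", "timestamp"]` is searched inside `infos` and silently becomes NaN because of `errors="ignore"`, while `["infos", "class"]` shares the `infos` prefix and still resolves. Wrapping the path in one more list makes it a single step, so every `meta` path is resolved from the top of the dictionary.

```python
import pandas as pd

# Same data as in the question.
d = {
    "type": "type1",
    "details": {
        "name": "foo",
        "date": {
            "timestamp": "01/02/2023 21:42:44",
            "components": {"day": 2, "month": 1, "year": 2023, "time": "21:42:44"},
        },
    },
    "infos": {
        "records": [{"field1": "qux", "field2": "baz"}],
        "class": "P",
    },
}

# Shared meta specification (the variable name is illustrative).
meta = [
    "type",
    ["details", "date", "timestamp"],
    ["details", "date", "components", "year"],
    ["infos", "class"],
]

# Flat record_path, as in the question: two separate levels to descend.
flat = pd.json_normalize(
    d, record_path=["infos", "records"], meta=meta, errors="ignore"
)

# Nested record_path, as in the answer: one step that spans two keys.
nested = pd.json_normalize(
    d, record_path=[["infos", "records"]], meta=meta, errors="ignore"
)

# Compare the two results for the meta columns that differ.
cols = ["details.date.timestamp", "details.date.components.year"]
print(flat[cols])
print(nested[cols])
```

Dropping `errors="ignore"` while debugging is also worth considering, since a missing `meta` key should then raise a `KeyError` instead of being filled with NaN.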

huangapple
  • Posted on July 24, 2023, 15:05:22
  • Source link: https://go.coder-hub.com/76752107.html