Python: parse a JSON with no keys

Question

I have a very odd JSON file that I need to parse and insert into a DataFrame.
This is the JSON file:

{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}

The DataFrame should end up looking like this:

id                    time           author  new          old   property          sender
01mSeHpsjSTHuHSGhpCj  1675348581375  sync    11991903358  null  hs_object_id      hs_sync
01mSeHpsjSTHuHSGhpCj  1675348610656  sync    daily              cohort__child_1_  hs_sync

I tried the json_normalize function with no success: it did not flatten the JSON into the rows I need.
When I load the JSON into a DataFrame, the keys under data (01mSeHpsjSTHuHSGhpCj and 02b85apv47W1PRHFCXDM) end up in the first column, but the rest of the JSON is inserted as a single string into the next column.

Answer 1

Score: -1

This is an easy transformation. Just iterate over the two levels of keys: the outer key becomes the id and the inner key becomes the time.

import pandas as pd
import json 

data = """
{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}"""

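# Parse the JSON text into nested dicts.
data = json.loads(data)

rows = []
# Outer keys are the record ids; inner keys are the timestamps.
for key1, val1 in data['data'].items():
    for key2, val2 in val1.items():
        # Skip bookkeeping entries such as "__collections__".
        if key2.startswith('_'):
            continue
        # Start each row with the two keys, then add the event's own fields.
        p = {'id': key1, 'time': key2}
        p.update(val2)
        rows.append(p)

df = pd.DataFrame(rows)
print(df)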

Output:

                     id           time author          new   old                       property   sender
0  01mSeHpsjSTHuHSGhpCj  1675348581375   sync  11991903358  None                   hs_object_id  hs_sync
1  01mSeHpsjSTHuHSGhpCj  1675348610656   sync        daily                     cohort__child_1_  hs_sync
2  02b85apv47W1PRHFCXDM  1662788673128   sync         None  None  app_content_category_child_1_  hs_sync
3  02b85apv47W1PRHFCXDM  1662788673129   sync         None  None  app_content_category_child_2_  hs_sync
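
The same loop can be wrapped in a reusable function. The following is only a sketch using the standard library: the name flatten_events and the extra time_utc field are hypothetical additions, and it assumes the inner keys are Unix timestamps in milliseconds, which the question does not state. The sample is a shortened copy of the JSON above.

```python
import json
from datetime import datetime, timezone


def flatten_events(payload):
    """Return one flat dict per (id, time) pair found under payload["data"]."""
    rows = []
    for record_id, events in payload["data"].items():
        for time_key, fields in events.items():
            # Skip bookkeeping entries such as "__collections__".
            if time_key.startswith("_") or not isinstance(fields, dict):
                continue
            row = {"id": record_id, "time": time_key}
            # Assumption: the key is a Unix timestamp in milliseconds.
            row["time_utc"] = datetime.fromtimestamp(
                int(time_key) / 1000, tz=timezone.utc
            )
            row.update(fields)
            rows.append(row)
    return rows


# Shortened copy of the JSON from the question.
sample = json.loads("""
{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}
""")

rows = flatten_events(sample)
for row in rows:
    print(row["id"], row["time"], row["time_utc"].isoformat(), row["property"])
```

The returned list can be passed to pd.DataFrame(rows) exactly as in the answer; drop the time_utc line if the keys turn out not to be timestamps.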

huangapple
  • Posted on 2023-02-14 09:21:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75442653.html