Question

I have a very unusual JSON file that I need to parse and insert into a DataFrame.
This is the JSON file:

{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}

The DataFrame should end up looking like this:

id	time	author	new	old	property	sender
01mSeHpsjSTHuHSGhpCj	1675348581375	sync	11991903358	null	hs_object_id	hs_sync
01mSeHpsjSTHuHSGhpCj	1675348610656	sync	daily		cohort__child_1_	hs_sync

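I tried using the json_normalize function with no success, as it did not flatten the JSON into the rows I need.
When I load the JSON directly into a DataFrame, the keys under data (01mSeHpsjSTHuHSGhpCj and 02b85apv47W1PRHFCXDM) end up in the first column, but the rest of the JSON is inserted as a single unparsed value into the next column.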
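
The behaviour described above can be reproduced with a shortened payload. This is only a sketch: the ids id_a and id_b and the trimmed fields are illustrative stand-ins for the real data, not values from the file.

```python
import pandas as pd

# Shortened stand-in for the real file (ids and fields are illustrative).
payload = {
    "data": {
        "id_a": {"1675348581375": {"author": "sync", "new": "1"}, "__collections__": {}},
        "id_b": {"1662788673128": {"author": "sync", "new": None}, "__collections__": {}},
    }
}

# Passing the nested dict straight to DataFrame: the outer key "data" becomes
# the only column, the ids become the index, and every cell holds the
# remaining nested dict unparsed.
naive = pd.DataFrame(payload)
print(naive.shape)
print(type(naive.loc["id_a", "data"]))
```

Because the timestamps are dictionary keys rather than values, pandas has no way to know they should become rows, so the nesting has to be unrolled by hand.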

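Answer 1

Score: -1

This is a simple transformation: iterate over both levels of keys. The outer key becomes id, the inner key becomes time, and keys starting with an underscore (such as __collections__) are skipped.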

import pandas as pd
import json 

data = """
{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}
"""

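# Parse the JSON text into nested dictionaries.
records = json.loads(data)

rows = []
for record_id, changes in records['data'].items():
    for timestamp, change in changes.items():
        # Skip bookkeeping keys such as "__collections__".
        if timestamp.startswith('_'):
            continue
        # One row per change: outer key is the id, inner key is the time.
        rows.append({'id': record_id, 'time': timestamp, **change})

df = pd.DataFrame(rows)
print(df)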

Output:

                     id           time author          new   old                       property   sender
0  01mSeHpsjSTHuHSGhpCj  1675348581375   sync  11991903358  None                   hs_object_id  hs_sync
1  01mSeHpsjSTHuHSGhpCj  1675348610656   sync        daily                     cohort__child_1_  hs_sync
2  02b85apv47W1PRHFCXDM  1662788673128   sync         None  None  app_content_category_child_1_  hs_sync
3  02b85apv47W1PRHFCXDM  1662788673129   sync         None  None  app_content_category_child_2_  hs_sync
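
The loop above can also be wrapped in a small helper so it can be reused on any payload with the same shape. This is a minimal sketch rather than part of the original answer: the names flatten_history and skip_prefix are hypothetical, and it assumes every inner key that does not start with an underscore maps to a flat dictionary of fields.

```python
import json

import pandas as pd


def flatten_history(payload, skip_prefix="_"):
    """Return one row per (id, time) pair from {"data": {id: {time: {fields}}}}.

    flatten_history and skip_prefix are hypothetical names for illustration.
    """
    rows = []
    for record_id, changes in payload["data"].items():
        for timestamp, change in changes.items():
            # Skip bookkeeping keys such as "__collections__".
            if timestamp.startswith(skip_prefix):
                continue
            rows.append({"id": record_id, "time": timestamp, **change})
    return pd.DataFrame(rows)


# Tiny illustrative payload with the same nesting as the question's file.
sample = json.loads("""
{"data": {"a1": {"1675348581375": {"author": "sync", "new": "x", "old": null},
                 "__collections__": {}}}}
""")
df = flatten_history(sample)
print(df)
```

For a file on disk, the payload would come from json.load on an open file handle instead of an inline string. If the time keys are millisecond Unix timestamps (an assumption; the question does not say so), pd.to_datetime(df['time'].astype('int64'), unit='ms') would convert them to datetimes.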
