Python: parse a JSON file with no keys
Question
I have a very odd JSON file that I need to parse and insert into a DataFrame.
This is the JSON file:

```json
{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}
```
The DataFrame should end up looking like this:

id | time | author | new | old | property | sender |
---|---|---|---|---|---|---|
01mSeHpsjSTHuHSGhpCj | 1675348581375 | sync | 11991903358 | null | hs_object_id | hs_sync |
01mSeHpsjSTHuHSGhpCj | 1675348610656 | sync | daily |  | cohort__child_1_ | hs_sync |
I tried using the `json_normalize` function without success, as it didn't parse the JSON.

When I load the JSON into a DataFrame directly, the keys under `data` (`01mSeHpsjSTHuHSGhpCj` and `02b85apv47W1PRHFCXDM`) are inserted into the first column, but the rest of each record is inserted as a single string into the next column.
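For context on why `json_normalize` alone doesn't yield one row per change: the ids and timestamps here are dictionary keys rather than values, so it flattens every nested path into a dotted column name and returns a single wide row. A minimal sketch on a trimmed copy of the sample, assuming pandas 1.0 or later (where the function is available as `pd.json_normalize`):

```python
import pandas as pd

# Trimmed copy of the sample: one id with one timestamped change (same shape as the file)
data = {
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": None,
                "property": "hs_object_id",
                "sender": "hs_sync",
            },
            "__collections__": {},
        }
    }
}

# With no record_path, every nested key path becomes one dotted column name
flat = pd.json_normalize(data)
print(flat.shape)
print(list(flat.columns))
```

The id and timestamp end up embedded in column names such as `data.01mSeHpsjSTHuHSGhpCj.1675348581375.author` (the empty `__collections__` object contributes no column), which is why the keys have to be walked explicitly before building the DataFrame.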
Answer 1
Score: -1
This is a straightforward transformation: iterate over the nested keys, skip the `__collections__` entries, and build one row per timestamp.
```python
import json

import pandas as pd

data = """
{
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {
                "author": "sync",
                "new": "11991903358",
                "old": null,
                "property": "hs_object_id",
                "sender": "hs_sync"
            },
            "1675348610656": {
                "author": "sync",
                "new": "daily",
                "old": "",
                "property": "cohort__child_1_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_1_",
                "sender": "hs_sync"
            },
            "1662788673129": {
                "author": "sync",
                "new": null,
                "old": null,
                "property": "app_content_category_child_2_",
                "sender": "hs_sync"
            },
            "__collections__": {}
        }
    }
}
"""

data = json.loads(data)

rows = []
for key1, val1 in data['data'].items():    # key1: record id, e.g. "01mSeHpsjSTHuHSGhpCj"
    for key2, val2 in val1.items():        # key2: timestamp key, e.g. "1675348581375"
        if key2.startswith('_'):           # skip metadata entries such as "__collections__"
            continue
        p = {'id': key1, 'time': key2}
        p.update(val2)                     # adds author, new, old, property, sender
        rows.append(p)

df = pd.DataFrame(rows)
print(df)
```
Output:
```
                     id           time  author          new   old                       property   sender
0  01mSeHpsjSTHuHSGhpCj  1675348581375    sync  11991903358  None                   hs_object_id  hs_sync
1  01mSeHpsjSTHuHSGhpCj  1675348610656    sync        daily                     cohort__child_1_  hs_sync
2  02b85apv47W1PRHFCXDM  1662788673128    sync         None  None  app_content_category_child_1_  hs_sync
3  02b85apv47W1PRHFCXDM  1662788673129    sync         None  None  app_content_category_child_2_  hs_sync
```
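If you would rather let pandas do the reshaping, here is an alternative sketch (not part of the original answer). It builds one small frame per id with `pd.DataFrame.from_dict(..., orient="index")`, so the timestamp keys become the row index, then stacks the frames with `pd.concat`. The `names` argument labels the id and time index levels, and `reset_index()` turns them into columns. The sample is trimmed to three records for brevity; with a real file you would pass the full `data["data"]` dict instead.

```python
import pandas as pd

# Trimmed copy of the sample: same shape, fewer records
data = {
    "data": {
        "01mSeHpsjSTHuHSGhpCj": {
            "1675348581375": {"author": "sync", "new": "11991903358", "old": None,
                              "property": "hs_object_id", "sender": "hs_sync"},
            "1675348610656": {"author": "sync", "new": "daily", "old": "",
                              "property": "cohort__child_1_", "sender": "hs_sync"},
            "__collections__": {},
        },
        "02b85apv47W1PRHFCXDM": {
            "1662788673128": {"author": "sync", "new": None, "old": None,
                              "property": "app_content_category_child_1_",
                              "sender": "hs_sync"},
            "__collections__": {},
        },
    }
}

# One frame per id: rows are the timestamp keys, columns are the record fields.
# Keys starting with "_" (e.g. "__collections__") are skipped.
frames = {
    rec_id: pd.DataFrame.from_dict(
        {ts: rec for ts, rec in changes.items() if not ts.startswith("_")},
        orient="index",
    )
    for rec_id, changes in data["data"].items()
}

# Passing a dict to concat uses its keys as the outer index level;
# names= labels both levels so reset_index() turns them into "id" and "time" columns.
df = pd.concat(frames, names=["id", "time"]).reset_index()
print(df)
```

The explicit loop is easier to read and to extend (for example, to skip other metadata keys); the `concat` version keeps the row-building inside pandas. Both produce the columns `id`, `time`, `author`, `new`, `old`, `property`, `sender` in that order.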