英文:
Create normalized dataframe from a nested json
问题
Sure, here's the translated code:
with open("employee.json") as file:
data = json.load(file)
data_df = pd.json_normalize(data, 'addresses', ['rec_id', 'timestamp', 'edited_timestamp', 'user.id', 'user.name'])
data_df.columns = ['rec_id', 'addresses.address_type', 'addresses.street1', 'addresses.street2', 'addresses.city', 'timestamp', 'edited_timestamp', 'user.id', 'user.name']
display(data_df)
英文:
I am trying to create a dataframe from nested json file but running into trouble.
[
{
"rec_id": "1",
"user": {
"id": "12414",
"name": "Steve"
},
"addresses": [
{
"address_type": "Home",
"street1": "100 Main St",
"street2": null,
"city": "Chicago"
},
{
"address_type": "Work",
"street1": "100 Main St",
"street2": null,
"city": "Chicago"
}
],
"timestamp": "2023-07-28T20:05:14.859000+00:00",
"edited_timestamp": null
},
{
"rec_id": "2",
"user": {
"id": "214521",
"name": "Tim"
},
"addresses": [
{
"address_type": "Home",
"street1": "100 Main St",
"street2": null,
"city": "Boston"
},
{
"address_type": "Work",
"street1": "100 Main St",
"street2": null,
"city": "Boston"
}
],
"timestamp": "2023-07-28T20:05:14.859000+00:00",
"edited_timestamp": null
},
{
"rec_id": "3",
"user": {
"id": "12121",
"name": "Jack"
},
"addresses": [
{
"address_type": "Home",
"street1": "100 Main St",
"street2": null,
"city": "Las Vegas"
} ]
"timestamp": "2023-07-28T20:05:14.859000+00:00",
"edited_timestamp": null
}
]
I tried below:
with open("employee.json") as file:
data = json.load(file)
data_df = pd.json_normalize(data)
data_df.columns.values.tolist()
['rec_id',
'addresses',
'timestamp',
'edited_timestamp',
'user.id',
'user.name']
display(data_df)
rec_id addresses timestamp edited_timestamp user.id user.name
0 1 [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}] 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
1 2 [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}] 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
2 3 [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Las Vegas'}] 2023-07-28T20:05:14.859000+00:00 None 12121 Jack
How do I get the output as below -
rec_id addresses.address_type addresses.street1 addresses.street2 addresses.city timestamp edited_timestamp user.id user.name
0 1 Home 100 Main St None Chicago 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
1 1 Work 100 Main St None Chicago 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
2 2 Home 100 Main St None Boston 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
3 2 Work 100 Main St None Boston 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
4 3 Home 100 Main St None Las Vegas 2023-07-28T20:05:14.859000+00:00 None 12121 Jack
答案1
得分: 1
这是您要翻译的内容:
尝试这个,更多关于 .json_normalize
df = pd.json_normalize(data,
record_path='addresses',
meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])
输出:
address_type | street1 | street2 | city | rec_id | user.id | user.name | timestamp | edited_timestamp | |
---|---|---|---|---|---|---|---|---|---|
0 | Home | 100 Main St | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | ||
1 | Work | 100 Main St | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | ||
2 | Home | 100 Main St | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | ||
3 | Work | 100 Main St | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | ||
4 | Home | 100 Main St | Las Vegas | 3 | 12121 | Jack | 2023-07-28T20:05:14.859000+00:00 |
英文:
Try this, more about .json_normalize
df = pd.json_normalize(data,
record_path='addresses',
meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])
Output:
address_type | street1 | street2 | city | rec_id | user.id | user.name | timestamp | edited_timestamp | |
---|---|---|---|---|---|---|---|---|---|
0 | Home | 100 Main St | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | ||
1 | Work | 100 Main St | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | ||
2 | Home | 100 Main St | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | ||
3 | Work | 100 Main St | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | ||
4 | Home | 100 Main St | Las Vegas | 3 | 12121 | Jack | 2023-07-28T20:05:14.859000+00:00 |
答案2
得分: 0
尝试:
data_df = pd.json_normalize(
data,
meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
record_path=["addresses"],
record_prefix="addresses."
)
print(data_df)
打印:
addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id timestamp edited_timestamp user.id user.name
0 Home 100 Main St None Chicago 1 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
1 Work 100 Main St None Chicago 1 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
2 Home 100 Main St None Boston 2 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
3 Work 100 Main St None Boston 2 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
4 Home 100 Main St None Las Vegas 3 2023-07-28T20:05:14.859000+00:00 None 12121 Jack
英文:
Try:
data_df = pd.json_normalize(
data,
meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
record_path=["addresses"],
record_prefix="addresses.",
)
print(data_df)
Prints:
addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id timestamp edited_timestamp user.id user.name
0 Home 100 Main St None Chicago 1 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
1 Work 100 Main St None Chicago 1 2023-07-28T20:05:14.859000+00:00 None 12414 Steve
2 Home 100 Main St None Boston 2 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
3 Work 100 Main St None Boston 2 2023-07-28T20:05:14.859000+00:00 None 214521 Tim
4 Home 100 Main St None Las Vegas 3 2023-07-28T20:05:14.859000+00:00 None 12121 Jack
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论