Python: Create a DataFrame from a nested dict with lists
Question
I am trying to create a DataFrame / CSV that looks like this:
| App     |   id | stages  |   requestCpu |   requestMemory |
|:--------|-----:|:--------|-------------:|----------------:|
| appName |  123 | dev     |         1000 |            1024 |
| appName |  123 | staging |         3200 |            1024 |
The dict looks like this. It contains quite a lot of apps, but every app follows the same layout:
test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2"...}
I used something like this before:
import pandas as pd

df = pd.DataFrame.from_dict(test_data, orient='index')
# Expand each 'stages' dict into its own columns
df = pd.concat([df.drop(['stages'], axis=1), (df['stages'].apply(pd.Series))], axis=1)
df.index.name = "App"
However, this doesn't split up the lists, and the stages end up as columns instead of rows, which is not how I want it to look.
Any help is much appreciated, thanks!
Answer 1
Score: 0
The easiest solution is to flatten the dict into one row per app and stage before loading it into pandas:
import pandas as pd

test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2": {"id": "456", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}}

rows = []
for app, app_data in test_data.items():
    for stage, stage_data in app_data["stages"].items():
        # One row per (app, stage) pair
        row = {
            "App": app,
            "id": app_data["id"],
            "stages": stage
        }
        # Each metric is a single-key dict, e.g. {"request.cpu": 1000}
        for metric in stage_data:
            metric_name, metric_value = list(metric.items())[0]
            row[metric_name] = metric_value
        rows.append(row)

df = pd.json_normalize(rows)
# Reorder columns
df = df[["App", "id", "stages", "request.cpu", "request.memory"]]
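The inner `for metric in stage_data` loop can also be written as a plain dict merge. A minimal pure-Python sketch, where `merge_metrics` and `flatten_apps` are hypothetical helper names (not part of pandas); unlike `list(metric.items())[0]`, the merge keeps every key even if one of the dicts holds more than one metric:

```python
def merge_metrics(stage_data):
    """Merge a list of small dicts such as
    [{"request.cpu": 1000}, {"request.memory": 1024}] into one dict.
    Every key of every dict is kept; later keys overwrite earlier ones.
    """
    merged = {}
    for metric in stage_data:
        merged.update(metric)
    return merged


def flatten_apps(data):
    """Yield one flat row per (app, stage) pair (hypothetical helper)."""
    for app, app_data in data.items():
        for stage, stage_data in app_data["stages"].items():
            yield {"App": app, "id": app_data["id"], "stages": stage,
                   **merge_metrics(stage_data)}


# One-app sample taken from the question's data
test_data = {"appName": {"id": "123", "stages": {
    "dev": [{"request.cpu": 1000}, {"request.memory": 1024}],
    "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}}

rows = list(flatten_apps(test_data))
print(rows[1])
# {'App': 'appName', 'id': '123', 'stages': 'staging', 'request.cpu': 3200, 'request.memory': 1024}
```

Because each row is already flat, the resulting list can be passed to `pd.json_normalize(rows)` or `pd.DataFrame(rows)` with the same result.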
Output:
|    | App      |   id | stages   |   request.cpu |   request.memory |
|---:|:---------|-----:|:---------|--------------:|-----------------:|
|  0 | appName  |  123 | dev      |          1000 |             1024 |
|  1 | appName  |  123 | staging  |          3200 |             1024 |
|  2 | appName2 |  456 | dev      |          1000 |             1024 |
|  3 | appName2 |  456 | staging  |          3200 |             1024 |
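The question asks for `requestCpu` / `requestMemory` column names and a CSV rather than a printed table. A minimal sketch of that last step, assuming the two-app sample data from this answer; the rename mapping is an assumption read off the question's target table, and `pd.DataFrame(rows)` stands in for `pd.json_normalize(rows)` because each row is already flat:

```python
import pandas as pd

# Sample data from the answer: two apps, two stages each
test_data = {
    "appName": {"id": "123", "stages": {
        "dev": [{"request.cpu": 1000}, {"request.memory": 1024}],
        "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}},
    "appName2": {"id": "456", "stages": {
        "dev": [{"request.cpu": 1000}, {"request.memory": 1024}],
        "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}},
}

# One flat row per (app, stage); the list of metric dicts is merged into the row
rows = [
    {"App": app, "id": app_data["id"], "stages": stage,
     **{k: v for metric in metrics for k, v in metric.items()}}
    for app, app_data in test_data.items()
    for stage, metrics in app_data["stages"].items()
]
df = pd.DataFrame(rows)

# Assumed mapping to the column names shown in the question's desired table
df = df.rename(columns={"request.cpu": "requestCpu",
                        "request.memory": "requestMemory"})

# index=False drops the 0..n-1 index so the CSV starts with the App column;
# with no path argument, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a path instead, e.g. `df.to_csv("apps.csv", index=False)` (a hypothetical file name), writes the same content to a file.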