February 18, 2023, 20:28:46

Python Create dataframe from nested dict with lists

Question

I am trying to create a DataFrame / CSV that looks like this:

App	id	stages	requestCpu	requestMemory
appName	123	dev	1000	1024
appName	123	staging	3200	1024

The dict looks like this and contains quite a lot of apps; the data inside every app follows the same layout:

test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2"...}

I used something like this before:

df = pd.DataFrame.from_dict(test_data, orient='index')
df = pd.concat([df.drop(['stages'], axis=1), (df['stages'].apply(pd.Series))], axis=1)
df.index.name = "App"

However, this wasn't able to split up the list part, and the stages ended up as columns, which is not how I wanted it to look.

Any help is much appreciated, thanks.

Answer 1

Score: 0

The easiest solution is to build the rows yourself before loading them into pandas:

import pandas as pd

test_data = {
    "appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}},
    "appName2": {"id": "456", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}},
}

rows = []
# One output row per (app, stage) pair
for app, app_data in test_data.items():
    for stage, stage_data in app_data["stages"].items():
        row = {"App": app, "id": app_data["id"], "stages": stage}
        # Each metric is a single-key dict such as {"request.cpu": 1000}
        for metric in stage_data:
            row.update(metric)
        rows.append(row)

df = pd.json_normalize(rows)
# Reorder columns
df = df[["App", "id", "stages", "request.cpu", "request.memory"]]

Output:

	App	id	stages	request.cpu	request.memory
0	appName	123	dev	1000	1024
1	appName	123	staging	3200	1024
2	appName2	456	dev	1000	1024
3	appName2	456	staging	3200	1024
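The question asks for camelCase headers (requestCpu, requestMemory) and mentions a CSV, neither of which appears in the output above. A minimal sketch of that last step, assuming the frame has the columns shown in the output; `to_camel` is a hypothetical helper written for this example, not part of pandas, and the file name in the final comment is made up:

```python
import pandas as pd

# Rows equivalent to the output table above
rows = [
    {"App": "appName", "id": "123", "stages": "dev", "request.cpu": 1000, "request.memory": 1024},
    {"App": "appName", "id": "123", "stages": "staging", "request.cpu": 3200, "request.memory": 1024},
    {"App": "appName2", "id": "456", "stages": "dev", "request.cpu": 1000, "request.memory": 1024},
    {"App": "appName2", "id": "456", "stages": "staging", "request.cpu": 3200, "request.memory": 1024},
]
df = pd.DataFrame(rows)


def to_camel(name: str) -> str:
    """Hypothetical helper: 'request.cpu' -> 'requestCpu'; names without dots are unchanged."""
    head, *tail = name.split(".")
    return head + "".join(part.capitalize() for part in tail)


# rename() accepts a callable and applies it to every column label
df = df.rename(columns=to_camel)

# With no path argument, to_csv() returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead (file name is an assumption):
# df.to_csv("apps.csv", index=False)
```

Passing `index=False` drops the 0..3 row index, so the CSV starts directly with the App column as in the layout requested in the question.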


