July 6, 2023 18:24:58

Store DataFrame into JSON format with group by

Question

I have a DataFrame in the following format:

id	type	application	number
1	test	spark	2
1	test	kafka	1
1	test1	spark	2
2	test2	kafka	1
2	test2	kafka	1
3	test	spark	2

The output I am looking for is:

id	type
1	{"test":{"spark":2,"kafka":1},"test1":{"spark":2}}
2	{"test2":{"kafka":1}}
3	{"test":{"spark":2}}

I have tried several approaches, but none of them returned the expected format.

Answer 1

Score: 1

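If your input DataFrame is not too large, you can loop over every unique id, then every type, then every application in turn, as in the following code:
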
import pandas as pd
df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])
data = []
# id_ and type_ avoid shadowing the built-ins id() and type()
for id_ in df['id'].unique():
    id_ = int(id_)
    subdf1 = df[df['id'] == id_]
    row = {"id": id_, "type": {}}
    for type_ in subdf1['type'].unique():
        subdf2 = subdf1[subdf1['type'] == type_]
        row["type"][type_] = {}
        for application in subdf2['application'].unique():
            subdf3 = subdf2[subdf2['application'] == application]
            # Take the "number" from the first row;
            # before that, make sure all input rows with the same id, type and application have the same "number"
            first_row = subdf3.iloc[0]
            number = int(first_row['number'])
            row["type"][type_][application] = number
    data.append(row)
out = pd.DataFrame(data)
print(out)

Output:

   id                                               type
0   1  {'test': {'spark': 2, 'kafka': 1}, 'test1': {'...
1   2                            {'test2': {'kafka': 1}}
2   3                             {'test': {'spark': 2}}
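
The comment in the loop assumes that all rows sharing the same id, type and application carry the same number, because only the first row's value is kept. A minimal sketch for checking that assumption up front on the sample data from the question, using `groupby(...)['number'].nunique()`; the names `distinct_counts`, `conflicts` and `is_consistent` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])

# Count the distinct "number" values per (id, type, application) combination
distinct_counts = df.groupby(['id', 'type', 'application'])['number'].nunique()

# Any combination with more than one distinct number would have all but
# the first value silently dropped by the iloc[0] step in the loop
conflicts = distinct_counts[distinct_counts > 1]
is_consistent = conflicts.empty
print(is_consistent)
```

If `conflicts` is not empty, you would need to decide how to combine the values (for example sum or max) before building the nested dicts.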

Answer 2

Score: 0

I was not able to get exactly what I was looking for, but I was able to reach a close solution:

labelled_df = labelled_df[['id', 'type', 'application', 'number']].drop_duplicates()
group_df = (
    labelled_df.groupby(['id', 'type'])[['application', 'number']]
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='group1')
    .groupby('id')[['type', 'group1']]  # list selection; a bare tuple fails in pandas >= 2.0
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='applications')
)

which resulted in

   id  applications
0   1  [{'test': [{'spark': 2}, {'kafka': 1}]}, {'test1': [{'spark': 2}]}]
1   2  [{'test2': [{'kafka': 1}]}]
2   3  [{'test': [{'spark': 2}]}]
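
The answer above stops at lists of single-key dicts. One way to get exactly the requested shape is to merge each id's rows into a single nested dict instead of building lists. A minimal sketch, assuming the sample data from the question and that the `type` column should hold compact JSON strings; the helper name `to_nested` is hypothetical, and if the same (type, application) pair appears with different numbers, the last row wins:

```python
import json

import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])


def to_nested(group: pd.DataFrame) -> str:
    """Merge one id's rows into {type: {application: number}} and return JSON."""
    nested = {}
    for type_, application, number in zip(group['type'], group['application'], group['number']):
        # setdefault creates the inner dict the first time a type is seen;
        # a repeated (type, application) pair overwrites the earlier number
        nested.setdefault(type_, {})[application] = int(number)
    # separators=(',', ':') drops the spaces so the string matches the requested format
    return json.dumps(nested, separators=(',', ':'))


dedup = df.drop_duplicates()
out = (
    # selecting the non-key columns keeps "id" out of each group passed to apply
    dedup.groupby('id')[['type', 'application', 'number']]
    .apply(to_nested)
    .reset_index(name='type')
)
print(out.to_string(index=False))
```

Because `groupby` keeps the original row order inside each group, the keys in each JSON string follow the order in which they first appear in the input.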
