Store DataFrame into JSON format with group by

Question

I have a DataFrame in the following format:

id type application number
1 test spark 2
1 test kafka 1
1 test1 spark 2
2 test2 kafka 1
2 test2 kafka 1
3 test spark 2

The output I am looking for is:

id type
1 {"test":{"spark":2,"kafka":1},"test1":{"spark":2}}
2 {"test2":{"kafka":1}}
3 {"test":{"spark":2}}

I have tried several approaches, but none of them returned the expected format.

Answer 1

Score: 1

If your input DataFrame is not too large, you can loop over every unique id, then every type, then every application in turn, as in the following code:


import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])

data = []
# id_ and type_ avoid shadowing the built-ins id() and type()
for id_ in df['id'].unique():
    id_ = int(id_)
    subdf1 = df[df['id'] == id_]
    row = {"id": id_, "type": {}}
    for type_ in subdf1['type'].unique():
        subdf2 = subdf1[subdf1['type'] == type_]
        row["type"][type_] = {}
        for application in subdf2['application'].unique():
            subdf3 = subdf2[subdf2['application'] == application]
            # Take the "number" from the first matching row. This assumes every
            # input row with the same id, type and application has the same "number".
            first_row = subdf3.iloc[0]
            number = int(first_row['number'])
            row["type"][type_][application] = number
    data.append(row)

out = pd.DataFrame(data)
print(out)

Output:

   id                                               type
0   1  {'test': {'spark': 2, 'kafka': 1}, 'test1': {'...
1   2                            {'test2': {'kafka': 1}}
2   3                             {'test': {'spark': 2}}
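
The `type` column produced above holds Python dicts rather than JSON text. A minimal sketch, assuming a DataFrame `out` shaped like the one printed above (rebuilt here from literal values so the snippet runs on its own), that serializes each dict with the standard library's `json.dumps`; passing `separators=(",", ":")` drops the spaces so the strings match the compact form shown in the question:

```python
import json

import pandas as pd

# Stand-in for the `out` DataFrame built by the loop above.
out = pd.DataFrame({
    "id": [1, 2, 3],
    "type": [
        {"test": {"spark": 2, "kafka": 1}, "test1": {"spark": 2}},
        {"test2": {"kafka": 1}},
        {"test": {"spark": 2}},
    ],
})

# Replace each nested dict with its compact JSON string (key order is preserved).
out["type"] = out["type"].apply(lambda d: json.dumps(d, separators=(",", ":")))
print(out.to_string(index=False))
```

Each cell of `type` is now a `str`, so the column can be written out as-is, for example with `to_csv`.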

Answer 2

Score: 0

I was not able to get exactly what I was looking for, but I was able to reach a close solution:

# labelled_df holds the input DataFrame from the question
labelled_df = labelled_df[['id', 'type', 'application', 'number']].drop_duplicates()

# Column selection uses a list: the old tuple form ['type', 'group1'] is rejected by pandas 2.x
group_df = (
    labelled_df.groupby(['id', 'type'])[['application', 'number']]
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='group1')
    .groupby('id')[['type', 'group1']]
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='applications')
)

which resulted in

   id  applications
0   1  [{'test': [{'spark': 2}, {'kafka': 1}]}, {'test1': [{'spark': 2}]}]
1   2  [{'test2': [{'kafka': 1}]}]
2   3  [{'test': [{'spark': 2}]}]
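
The near miss comes from building a list of single-key dicts at each level. A minimal sketch, assuming the question's sample data and column names, that instead builds one `{type: {application: number}}` dict per id and serializes it with `json.dumps`; the helper name `nest` is hypothetical. Like the loop in Answer 1, it assumes each (id, type, application) combination has a single `number`, and the assertion checks that first:

```python
import json

import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=["id", "type", "application", "number"])

# Each (id, type, application) must map to exactly one number; otherwise
# the dict built below would silently keep only the last value.
assert (df.groupby(["id", "type", "application"])["number"].nunique() == 1).all()


def nest(group: pd.DataFrame) -> str:
    """Hypothetical helper: build {type: {application: number}} for one id as JSON."""
    nested: dict = {}
    for type_, application, number in group.itertuples(index=False):
        nested.setdefault(type_, {})[application] = int(number)
    return json.dumps(nested, separators=(",", ":"))


result = (
    df.drop_duplicates()
    .groupby("id", sort=False)[["type", "application", "number"]]
    .apply(nest)                # one JSON string per id
    .reset_index(name="type")   # back to the two-column id/type layout
)
print(result.to_string(index=False))
```

Because `nest` returns a plain string, `apply` yields a Series indexed by `id`, and `reset_index(name="type")` turns it into the `id`/`type` layout requested in the question.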
huangapple
  • This article was published on July 6, 2023 at 18:24:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76627849.html