Store dataframe into JSON format with group by

Question
I have a DataFrame in the format below:
id | type | application | number |
---|---|---|---|
1 | test | spark | 2 |
1 | test | kafka | 1 |
1 | test1 | spark | 2 |
2 | test2 | kafka | 1 |
2 | test2 | kafka | 1 |
3 | test | spark | 2 |
The output I am looking for is:
id | type |
---|---|
1 | {"test":{"spark":2,"kafka":1},"test1":{"spark":2}} |
2 | {"test2":{"kafka":1}} |
3 | {"test":{"spark":2}} |
I have tried several approaches, but none returned the expected format.
Answer 1 (score: 1)
If your input dataframe is not too large, you can loop over every unique id, then type, then application in turn, as in the following code:
```python
import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])

data = []
for id in df['id'].unique():
    id = int(id)
    subdf1 = df[df['id'] == id]
    row = {"id": id, "type": {}}
    for type in subdf1['type'].unique():
        subdf2 = subdf1[subdf1['type'] == type]
        row["type"][type] = {}
        for application in subdf2['application'].unique():
            subdf3 = subdf2[subdf2['application'] == application]
            # Take the "number" from the first row; this assumes all input rows
            # with the same id, type and application share the same "number".
            first_row = subdf3.iloc[0]
            number = int(first_row['number'])
            row["type"][type][application] = number
    data.append(row)

out = pd.DataFrame(data)
print(out)
```
Output:

```text
   id                                               type
0   1  {'test': {'spark': 2, 'kafka': 1}, 'test1': {'...
1   2                            {'test2': {'kafka': 1}}
2   3                             {'test': {'spark': 2}}
```
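A loop-free alternative (my own sketch, not part of the original answers) is to build the inner `{application: number}` dicts with a single `groupby` over `id` and `type`, then fold those into one JSON string per `id`. The variable names `inner` and `out` are my own; the `int()` cast is needed because numpy integers are not JSON serializable:

```python
import json

import pandas as pd

df = pd.DataFrame(data=[
    [1, "test", "spark", 2],
    [1, "test", "kafka", 1],
    [1, "test1", "spark", 2],
    [2, "test2", "kafka", 1],
    [2, "test2", "kafka", 1],
    [3, "test", "spark", 2],
], columns=['id', 'type', 'application', 'number'])

# One {application: number} dict per (id, type) pair. Duplicate rows
# (like the two id=2 rows) simply overwrite the same key.
inner = df.groupby(['id', 'type']).apply(
    lambda g: {a: int(n) for a, n in zip(g['application'], g['number'])}
)

# Fold the type level into {type: {application: number}} per id, and
# serialize compactly to match the format asked for in the question.
out = (
    inner.groupby(level='id')
         .apply(lambda s: json.dumps(s.droplevel('id').to_dict(),
                                     separators=(',', ':')))
         .reset_index(name='type')
)
print(out)
```

This avoids the nested filtering of the loop version, at the cost of relying on `groupby.apply` returning a Series of dict objects.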
Answer 2 (score: 0)
I was not able to get exactly what I was looking for, but I was able to reach a nearby solution:
```python
labelled_df = labelled_df[['id', 'type', 'application', 'number']].drop_duplicates()
group_df = (
    labelled_df.groupby(['id', 'type'])[['application', 'number']]
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='group1')
    .groupby('id')[['type', 'group1']]  # list selection; the old ['type', 'group1'] tuple form is removed in current pandas
    .apply(lambda g: [{row[0]: row[1]} for row in g.values.tolist()])
    .reset_index(name='applications')
)
```
which resulted in:

```text
  asset_id  applications
0  1        [{'test': [{'spark': 2}, {'kafka': 1}]}, {'test1': [{'spark': 2}]}]
1  2        [{'test2': [{'kafka': 1}]}]
2  3        [{'test': [{'spark': 2}]}]
```
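The list-of-single-key-dicts shape above can be folded into the nested-dict format from the question with a small post-processing step. This is my own sketch; `flatten_applications` is a hypothetical helper name, and it assumes each list element is a one-key dict exactly as shown in the output above:

```python
def flatten_applications(apps):
    """Collapse [{'test': [{'spark': 2}, {'kafka': 1}]}, ...] into
    {'test': {'spark': 2, 'kafka': 1}, ...}."""
    out = {}
    for type_entry in apps:
        for type_name, app_list in type_entry.items():
            merged = {}
            for app_entry in app_list:
                merged.update(app_entry)  # each entry is a one-key dict
            out[type_name] = merged
    return out

apps = [{'test': [{'spark': 2}, {'kafka': 1}]}, {'test1': [{'spark': 2}]}]
print(flatten_applications(apps))
```

Mapping this over the `applications` column (e.g. `group_df['applications'].map(flatten_applications)`) would give the dict-of-dicts shape, which `json.dumps` can then serialize into the strings the question asks for.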