Create a tree-like data structure in JSON format from pandas DataFrames using Python
Question
I have three DataFrames that represent layers of geographic boundaries for cities, counties, and states, respectively.
Df1 (the geometry column holds the city boundaries)
City ID | City | County ID | County | State ID | State | geometry
12345 123 12 POLYGON (0.0,…..)
Df2 (the geometry column holds the county boundaries)
County ID | County | State ID | State | geometry
475 47 POLYGON (0.0, …..)
Df3 (the geometry column holds the state boundaries)
State ID | State | geometry
25 POLYGON (0.0, …..)
I want to create a tree-like data structure in JSON format that looks like the following:
State 1 (ID, Name, Polygon(…))
├── County 1 (ID, Name, Polygon(…))
│   └── City 1 (ID, Name, Polygon(…))
└── County 2
    ├── City 1
    ├── City 2
    └── City 3
State 2
├── County 1
│   └── City 1
└── County 2
    └── City 1
Etc.
The goal is to be able to search at any level of the hierarchy and display the corresponding polygon, whether it belongs to a city, a county, or a state.
How should I proceed? How can I create nested values and structure the JSON starting from pandas DataFrames?
Is a tree-like data structure the right choice? Should I create an adjacency list instead? If so, how?
Many thanks
Answer 1
Score: 0
A tree-like data structure can be created in JSON format with pandas using the following code:
import json

import pandas as pd

# Example DataFrames
df1 = pd.DataFrame({
    'City ID': [12345],
    'City': ['City 1'],
    'County ID': [123],
    'County': ['County 1'],
    'State ID': [12],
    'State': ['State 1'],
    'geometry': ['POLYGON (0.0, …)']
})
df2 = pd.DataFrame({
    'County ID': [123],
    'County': ['County 1'],
    'State ID': [12],
    'State': ['State 1'],
    'geometry': ['POLYGON (0.0, …)']
})
df3 = pd.DataFrame({
    'State ID': [12],
    'State': ['State 1'],
    'geometry': ['POLYGON (0.0, …)']
})

# Step 1: Merge the DataFrames on the ID columns.
# Each geometry column is renamed first so that every level keeps its own
# polygon, and the name columns already present in df1 are not merged again
# (otherwise pandas would add _x/_y suffixes to the duplicated columns).
cities = df1.rename(columns={'geometry': 'city_geometry'})
counties = df2[['County ID', 'State ID', 'geometry']].rename(columns={'geometry': 'county_geometry'})
states = df3[['State ID', 'geometry']].rename(columns={'geometry': 'state_geometry'})
df_merged = (
    cities
    .merge(counties, on=['County ID', 'State ID'], how='left')
    .merge(states, on='State ID', how='left')
)

# Step 2: Build a nested dictionary representing the tree structure
tree_dict = {}
for _, row in df_merged.iterrows():
    # Cast IDs to plain Python ints so they are valid JSON keys and values
    state_id = int(row['State ID'])
    county_id = int(row['County ID'])
    city_id = int(row['City ID'])

    # Add the state to the tree dictionary if it doesn't exist
    if state_id not in tree_dict:
        tree_dict[state_id] = {
            'ID': state_id,
            'Name': row['State'],
            'Polygon': row['state_geometry'],
            'Counties': {}
        }

    # Add the county to the tree dictionary if it doesn't exist
    counties_dict = tree_dict[state_id]['Counties']
    if county_id not in counties_dict:
        counties_dict[county_id] = {
            'ID': county_id,
            'Name': row['County'],
            'Polygon': row['county_geometry'],
            'Cities': {}
        }

    # Add the city to the tree dictionary
    counties_dict[county_id]['Cities'][city_id] = {
        'ID': city_id,
        'Name': row['City'],
        'Polygon': row['city_geometry']
    }

# Step 3: Convert the nested dictionary to JSON format
json_data = json.dumps(tree_dict, indent=4)

# Print the JSON data
print(json_data)
This code merges the three DataFrames on their shared ID columns (County ID and State ID) to create a unified DataFrame with one row per city. It then builds a nested dictionary representing the tree hierarchy, from the state level down to the city level. Finally, it converts the nested dictionary to JSON format using the json.dumps() function.
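One detail worth keeping in mind when the JSON is read back for searching: json.dumps() writes dictionary keys as strings, so integer IDs used as keys come back as strings after json.loads(). The following is a minimal sketch of that round trip, using a small hypothetical tree (not the asker's real data) in the same shape as the nested dictionary:

```python
import json

# Hypothetical miniature of the nested dictionary, keyed by integer IDs.
tree = {
    12: {
        "ID": 12,
        "Name": "State 1",
        "Counties": {
            123: {"ID": 123, "Name": "County 1", "Cities": {}},
        },
    },
}

# Serialize and load again, as a consumer of the JSON file would.
restored = json.loads(json.dumps(tree))

# Keys are now strings, while the 'ID' values inside each node stay integers.
state = restored["12"]
county = state["Counties"]["123"]
print(county["Name"], type(state["ID"]).__name__)
```

Lookups on the loaded data therefore need str(some_id) as the key, or the 'ID' field stored inside each node can be compared instead.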
The resulting JSON follows the desired tree-like format: states contain nested counties, and counties contain nested cities. Each node in the tree carries its ID, Name, and Polygon.
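On the adjacency-list part of the question: a nested tree is convenient for display, but searching "at any level" is often simpler on a flat table where every node records its parent. The sketch below is one possible way to do that, not the only one. It assumes a tree in the same shape as above, uses placeholder polygon strings, and assumes IDs are unique within each level; the helper names flatten, find_by_name, and path_to_root are made up for this illustration.

```python
# Hypothetical nested tree in the same shape as the answer's output.
tree = {
    12: {
        "ID": 12, "Name": "State 1", "Polygon": "POLYGON (...)",
        "Counties": {
            123: {
                "ID": 123, "Name": "County 1", "Polygon": "POLYGON (...)",
                "Cities": {
                    12345: {"ID": 12345, "Name": "City 1", "Polygon": "POLYGON (...)"},
                },
            },
        },
    },
}


def flatten(tree):
    """Return {(level, id): node}, where each node stores its parent's key."""
    nodes = {}
    for state in tree.values():
        s_key = ("state", state["ID"])
        nodes[s_key] = {"name": state["Name"], "polygon": state["Polygon"], "parent": None}
        for county in state["Counties"].values():
            c_key = ("county", county["ID"])
            nodes[c_key] = {"name": county["Name"], "polygon": county["Polygon"], "parent": s_key}
            for city in county["Cities"].values():
                key = ("city", city["ID"])
                nodes[key] = {"name": city["Name"], "polygon": city["Polygon"], "parent": c_key}
    return nodes


def find_by_name(nodes, name):
    """Return the keys of all nodes, at any level, with the given name."""
    return [key for key, node in nodes.items() if node["name"] == name]


def path_to_root(nodes, key):
    """Follow parent links upward and return the names along the way."""
    path = []
    while key is not None:
        path.append(nodes[key]["name"])
        key = nodes[key]["parent"]
    return path


nodes = flatten(tree)
matches = find_by_name(nodes, "City 1")
print(matches)
print(path_to_root(nodes, matches[0]))
print(nodes[("county", 123)]["polygon"])
```

A direct lookup such as nodes[("county", 123)] returns the polygon for that level without walking the tree, and the parent links still allow the full State > County > City path to be rebuilt when needed.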