Normalizing a Nested JSON in Python and Converting it to a Pandas Dataframe
Question
I have created a simpler version of some JSON data I've been working with here:
[
  {
    "id": 1,
    "city": "Philadelphia",
    "Retaillocations": {
      "subLocation": [
        {
          "address": "1235 Passyunk Ave",
          "district": "South"
        },
        {
          "address": "900 Market St",
          "district": "Center City"
        },
        {
          "address": "2300 Roosevelt Blvd",
          "district": "North"
        }
      ]
    },
    "distributionLocations": {
      "subLocation": [
        {
          "address": "3000 Broad St",
          "district": "North"
        },
        {
          "address": "3000 Essington Blvd",
          "district": "Cargo City"
        },
        {
          "address": "4300 City Ave",
          "district": "West"
        }
      ]
    }
  }
]
My goal is to normalize this into a data frame (yes, the above JSON will only create one row, but I am hoping to get the steps down and then generalize them to a larger set).
First, I loaded the file with jsob_obj = json.loads(inputData), which turns it into a list containing one dictionary. The problem is that some of the dictionaries can contain lists and are nested oddly, as shown above. I tried pd.json_normalize(jsob_obj, record_path='retailLocations'), but I get a TypeError saying that list indices must be integers or slices, not str. How can I handle the above JSON and convert it into a single record in a pandas data frame?
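Before picking a record_path, it can help to check what json.loads actually returned. The snippet below is only a minimal sketch: it assumes inputData holds the JSON text above (shortened here to one address per branch), and the names top_level_type, record_keys and has_requested_key are made up for illustration. It shows that the outer [...] parses to a list, and that the key in the data is spelled "Retaillocations", not "retailLocations".

```python
import json

# Assumption: inputData is the JSON text from the question, shortened here
# to one address per branch.
inputData = """
[
  {
    "id": 1,
    "city": "Philadelphia",
    "Retaillocations": {
      "subLocation": [{"address": "1235 Passyunk Ave", "district": "South"}]
    },
    "distributionLocations": {
      "subLocation": [{"address": "3000 Broad St", "district": "North"}]
    }
  }
]
"""

jsob_obj = json.loads(inputData)

# The outer [...] parses to a Python list, not a dict.
top_level_type = type(jsob_obj).__name__

# Keys of the first (and only) record in the list.
record_keys = sorted(jsob_obj[0].keys())

# Dictionary keys are case-sensitive, so this spelling is not found.
has_requested_key = "retailLocations" in jsob_obj[0]

print(top_level_type)
print(record_keys)
print(has_requested_key)
```

Any record_path passed to pd.json_normalize has to use the keys exactly as they appear in record_keys.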
Answer 1
Score: 1
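Guessing at the desired output, you can use pd.json_normalize() to flatten each branch and then concatenate the results:
import pandas as pd

# jsob_obj is the list returned by json.loads(inputData) in the question.
# One row per retail address; id and city are repeated on every row.
retail = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["Retaillocations", "subLocation"]
).assign(source="retail")

# Same flattening for the distribution addresses.
distribution = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["distributionLocations", "subLocation"]
).assign(source="distribution")

# Stack both frames and renumber the index from 0.
final = pd.concat([retail, distribution]).reset_index(drop=True)
print(final)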
Output:
               address     district  id          city        source
0    1235 Passyunk Ave        South   1  Philadelphia        retail
1        900 Market St  Center City   1  Philadelphia        retail
2  2300 Roosevelt Blvd        North   1  Philadelphia        retail
3        3000 Broad St        North   1  Philadelphia  distribution
4  3000 Essington Blvd   Cargo City   1  Philadelphia  distribution
5        4300 City Ave         West   1  Philadelphia  distribution
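To generalize this to a larger set, the two near-identical calls could be folded into a loop. The following is only a sketch under some assumptions: BRANCHES and flatten_branches are hypothetical names, the sample data is shortened to one address per branch, and every record is assumed to contain both branch keys. It also shows the other reading of "a single record": calling pd.json_normalize without record_path keeps one row per top-level object and leaves the subLocation lists inside cells.

```python
import pandas as pd

# Assumption: jsob_obj is the parsed JSON from the question, shortened here
# to one address per branch.
jsob_obj = [
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {
            "subLocation": [{"address": "1235 Passyunk Ave", "district": "South"}]
        },
        "distributionLocations": {
            "subLocation": [{"address": "3000 Broad St", "district": "North"}]
        },
    }
]

# Hypothetical mapping: top-level key in the data -> label for the "source" column.
BRANCHES = {"Retaillocations": "retail", "distributionLocations": "distribution"}


def flatten_branches(records, branches=BRANCHES):
    """Return one row per address, tagged with the branch it came from."""
    frames = [
        pd.json_normalize(
            data=records,
            record_path=[key, "subLocation"],
            meta=["id", "city"],
        ).assign(source=label)
        for key, label in branches.items()
    ]
    return pd.concat(frames, ignore_index=True)


# Long format: one row per address.
long_df = flatten_branches(jsob_obj)

# Wide format: one row per top-level record. Nested dict keys are joined
# with "." and the subLocation lists stay as Python lists in a single cell.
wide_df = pd.json_normalize(jsob_obj)

print(long_df)
print(wide_df.columns.tolist())
```

Which shape is preferable depends on the later analysis: the long frame is easier to filter and group by district or source, while the wide frame keeps exactly one row per id.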