February 23, 2023, 22:46:13


Normalizing a Nested JSON in Python and Converting it to a Pandas Dataframe

Question

I have created a simpler version of some JSON data I've been working with here:

[
    {
        "id": 1,
        "city": "Philadelphia",
        "Retaillocations": {
            "subLocation": [
                {
                    "address": "1235 Passyunk Ave",
                    "district": "South"
                },
                {
                    "address": "900 Market St",
                    "district": "Center City"
                },
                {
                    "address": "2300 Roosevelt Blvd",
                    "district": "North"
                }
            ]
        },
        "distributionLocations": {
            "subLocation": [
                {
                    "address": "3000 Broad St",
                    "district": "North"
                },
                {
                    "address": "3000 Essington Blvd",
                    "district": "Cargo City"
                },
                {
                    "address": "4300 City Ave",
                    "district": "West"
                }
            ]
        }
    }
]

My goal is to normalize this into a DataFrame (yes, the JSON above will only create one row, but I am hoping to get the steps down and then generalize them to a larger set).

First, I loaded the file with jsob_obj = json.loads(inputData), which turns it into a dictionary. The problem is that some of the dictionaries can contain lists and are nested oddly, as shown above. I tried pd.json_normalize(jsob_obj, record_path='retailLocations'), but I get a TypeError saying that list indices must be integers or slices, not str. How can I handle the JSON above and convert it into a single record in a pandas DataFrame?

Answer 1

Score: 1

Guessing at the desired output, you can flatten the data with pd.json_normalize(), passing the full nested path as record_path:

import pandas as pd

# jsob_obj is the parsed JSON from the question (a list containing one record).
# One row per retail sub-location, carrying id and city along as metadata.
retail = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["Retaillocations", "subLocation"]
).assign(source="retail")
# Same flattening for the distribution sub-locations.
distribution = pd.json_normalize(
    data=jsob_obj,
    meta=["id", "city"],
    record_path=["distributionLocations", "subLocation"]
).assign(source="distribution")
# Stack both frames and renumber the rows.
final = pd.concat([retail, distribution]).reset_index(drop=True)
print(final)

Output:

               address     district id          city        source
0    1235 Passyunk Ave        South  1  Philadelphia        retail
1        900 Market St  Center City  1  Philadelphia        retail
2  2300 Roosevelt Blvd        North  1  Philadelphia        retail
3        3000 Broad St        North  1  Philadelphia  distribution
4  3000 Essington Blvd   Cargo City  1  Philadelphia  distribution
5        4300 City Ave         West  1  Philadelphia  distribution
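
If the larger data set has more location groups, the same idea can be wrapped in a loop. The following is only a sketch under a few assumptions: the sample JSON is pasted inline so the block runs on its own, and the names input_data, LOCATION_KEYS, and flatten_locations are made up for illustration. Note that the keys are case-sensitive, so "Retaillocations" has to be spelled exactly as it appears in the JSON.

```python
import json

import pandas as pd

# Sample input copied from the question (a list holding one record).
input_data = """
[
  {
    "id": 1,
    "city": "Philadelphia",
    "Retaillocations": {"subLocation": [
      {"address": "1235 Passyunk Ave", "district": "South"},
      {"address": "900 Market St", "district": "Center City"},
      {"address": "2300 Roosevelt Blvd", "district": "North"}
    ]},
    "distributionLocations": {"subLocation": [
      {"address": "3000 Broad St", "district": "North"},
      {"address": "3000 Essington Blvd", "district": "Cargo City"},
      {"address": "4300 City Ave", "district": "West"}
    ]}
  }
]
"""
records = json.loads(input_data)  # a list of dicts, not a single dict

# Hypothetical mapping from each JSON key (case-sensitive) to a source label.
LOCATION_KEYS = {
    "Retaillocations": "retail",
    "distributionLocations": "distribution",
}


def flatten_locations(records, location_keys=LOCATION_KEYS):
    """Return one row per sub-location, tagged with the group it came from."""
    frames = [
        pd.json_normalize(
            data=records,
            record_path=[key, "subLocation"],
            meta=["id", "city"],
        ).assign(source=label)
        for key, label in location_keys.items()
    ]
    # ignore_index=True renumbers the rows, like reset_index(drop=True).
    return pd.concat(frames, ignore_index=True)


final = flatten_locations(records)
print(final)
```

This should give the same six rows as above; adding another group would only mean adding one more entry to the mapping, assuming it also nests its records under "subLocation".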
