August 5, 2023 04:37:58


Create a normalized DataFrame from nested JSON

Question

I am trying to create a DataFrame from a nested JSON file but am running into trouble. The file looks like this:

[
	{
		"rec_id": "1",
		"user": {
			"id": "12414",
			"name": "Steve"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Chicago"
			},
			{
				"address_type": "Work",
				"street1": "100 Main St",
				"street2": null,
				"city": "Chicago"
			}
		],
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	},
	{
		"rec_id": "2",
		"user": {
			"id": "214521",
			"name": "Tim"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Boston"
			},
			{
				"address_type": "Work",
				"street1": "100 Main St",
				"street2": null,
				"city": "Boston"
			}
		],
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	},
	{
		"rec_id": "3",
		"user": {
			"id": "12121",
			"name": "Jack"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Las Vegas"
			}
		],
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	}
]

I tried the following:

import json
import pandas as pd

with open("employee.json") as file:
    data = json.load(file)
data_df = pd.json_normalize(data)
data_df.columns.values.tolist()
# ['rec_id',
#  'addresses',
#  'timestamp',
#  'edited_timestamp',
#  'user.id',
#  'user.name']
display(data_df)  # display() is provided by Jupyter/IPython

	rec_id	addresses	timestamp	edited_timestamp	user.id	user.name
0	1	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}]	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
1	2	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}]	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
2	3	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Las Vegas'}]	2023-07-28T20:05:14.859000+00:00	None	12121	Jack
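The output above shows the core issue: with no record_path, json_normalize flattens nested dicts (user becomes user.id and user.name) but leaves lists such as addresses as one cell per record. A minimal sketch that reproduces this, assuming the sample data is inlined (and trimmed to two records) instead of being read from employee.json:

```python
import pandas as pd

# Inline stand-in for json.load(open("employee.json")), trimmed to two records
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Chicago"},
                   {"address_type": "Work", "street1": "100 Main St", "street2": None, "city": "Chicago"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Las Vegas"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
]

df = pd.json_normalize(data)

# The nested dict "user" was flattened into user.id / user.name ...
print(df.columns.tolist())

# ... but "addresses" is still a single cell per record holding a Python list,
# so there is one row per record rather than one row per address
print(type(df.loc[0, "addresses"]), len(df))
```

Turning each list element into its own row is what the record_path and meta arguments used in the answers are for.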

How do I get output like the following?

rec_id	addresses.address_type	addresses.street1	addresses.street2	addresses.city	timestamp	edited_timestamp	user.id	user.name
0	1	Home	100 Main St	None	Chicago	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
1	1	Work	100 Main St	None	Chicago	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
2	2	Home	100 Main St	None	Boston	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
3	2	Work	100 Main St	None	Boston	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
4	3	Home	100 Main St	None	Las Vegas	2023-07-28T20:05:14.859000+00:00	None	12121	Jack

Answer 1

Score: 1

Try this (see the pandas documentation for .json_normalize for more details):

df = pd.json_normalize(data, 
                       record_path='addresses', 
                       meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])

Output (the street2 and edited_timestamp columns, which contain only None, are omitted here):

	address_type	street1	city	rec_id	user.id	user.name	timestamp
0	Home	100 Main St	Chicago	1	12414	Steve	2023-07-28T20:05:14.859000+00:00
1	Work	100 Main St	Chicago	1	12414	Steve	2023-07-28T20:05:14.859000+00:00
2	Home	100 Main St	Boston	2	214521	Tim	2023-07-28T20:05:14.859000+00:00
3	Work	100 Main St	Boston	2	214521	Tim	2023-07-28T20:05:14.859000+00:00
4	Home	100 Main St	Las Vegas	3	12121	Jack	2023-07-28T20:05:14.859000+00:00
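A self-contained sketch of Answer 1's call, assuming the sample data is inlined instead of read from employee.json and trimmed to two records. It prints the full column list, including the street2 and edited_timestamp columns left out of the output above, and shows why meta needs a nested list such as ["user", "id"]: a dotted string like "user.id" is looked up as one literal top-level key and raises KeyError. The flag name dotted_key_failed is purely illustrative.

```python
import pandas as pd

# Inline stand-in for json.load(open("employee.json")), trimmed to two records
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Chicago"},
                   {"address_type": "Work", "street1": "100 Main St", "street2": None, "city": "Chicago"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Las Vegas"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
]

df = pd.json_normalize(
    data,
    record_path="addresses",  # one output row per element of each "addresses" list
    # Per-record fields copied onto every address row; nested keys are given as paths
    meta=["rec_id", ["user", "id"], ["user", "name"], "timestamp", "edited_timestamp"],
)
print(df.columns.tolist())
# ['address_type', 'street1', 'street2', 'city', 'rec_id', 'user.id', 'user.name', 'timestamp', 'edited_timestamp']

# A dotted string is NOT split into a path: json_normalize looks for a literal "user.id" key
try:
    pd.json_normalize(data, record_path="addresses", meta=["user.id"])
    dotted_key_failed = False
except KeyError as exc:
    dotted_key_failed = True
    print("KeyError:", exc)
```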

Answer 2

Score: 0

Try:

data_df = pd.json_normalize(
    data,
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_path=["addresses"],
    record_prefix="addresses."
)
print(data_df)

Prints:

  addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id                         timestamp edited_timestamp user.id user.name
0                   Home       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
1                   Work       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
2                   Home       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
3                   Work       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
4                   Home       100 Main St              None      Las Vegas      3  2023-07-28T20:05:14.859000+00:00             None   12121      Jack
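Answer 2's frame has the requested column names, but in a different order from the desired output in the question. The sketch below inlines the sample data (an assumption, in place of employee.json) and selects the columns in the question's order. It also shows an alternative that is not from either answer: explode the addresses list column and normalize it separately, then checks that both routes give the same rows. It assumes pandas 1.1 or later (for explode(..., ignore_index=True)) and that every record has at least one address; for an empty list, the record_path version would drop that record, while the explode version would keep it as a row with missing address fields.

```python
import pandas as pd

# Inline stand-in for json.load(open("employee.json"))
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Chicago"},
                   {"address_type": "Work", "street1": "100 Main St", "street2": None, "city": "Chicago"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
    {"rec_id": "2", "user": {"id": "214521", "name": "Tim"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Boston"},
                   {"address_type": "Work", "street1": "100 Main St", "street2": None, "city": "Boston"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "street1": "100 Main St", "street2": None, "city": "Las Vegas"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
]

# Answer 2's call: record_prefix adds "addresses." to every column taken from the list items
flat = pd.json_normalize(
    data,
    record_path=["addresses"],
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_prefix="addresses.",
)

# Column layout copied from the question's desired output
desired = [
    "rec_id",
    "addresses.address_type", "addresses.street1", "addresses.street2", "addresses.city",
    "timestamp", "edited_timestamp", "user.id", "user.name",
]
flat = flat[desired]
print(flat)

# Alternative route: keep "addresses" as a list column, give each address its own row,
# then flatten the address dicts and glue them back on
base = pd.json_normalize(data)
exploded = base.explode("addresses", ignore_index=True)
addr = pd.json_normalize(exploded["addresses"].tolist()).add_prefix("addresses.")
alt = pd.concat([exploded.drop(columns="addresses"), addr], axis=1)[desired]

# Compare row contents rather than dtypes, which can differ between the two routes
print(alt.to_dict("records") == flat.to_dict("records"))
```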
