从嵌套的JSON创建规范化的数据框。

huangapple go评论131阅读模式
英文:

Create normalized dataframe from a nested json

问题

Sure, here's the translated code:

with open("employee.json") as file:
    data = json.load(file)  

data_df = pd.json_normalize(data, 'addresses', ['rec_id', 'timestamp', 'edited_timestamp', 'user.id', 'user.name'])
data_df.columns = ['rec_id', 'addresses.address_type', 'addresses.street1', 'addresses.street2', 'addresses.city', 'timestamp', 'edited_timestamp', 'user.id', 'user.name']

display(data_df)
英文:

I am trying to create a dataframe from nested json file but running into trouble.

[
	{
		"rec_id": "1",
		"user": {
			"id": "12414",
			"name": "Steve"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Chicago"
			},
			{
				"address_type": "Work",
				"street1": "100 Main St",
				"street2": null,
				"city": "Chicago"
			}
		],
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	},
	{
		"rec_id": "2",
		"user": {
			"id": "214521",
			"name": "Tim"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Boston"
			},
			{
				"address_type": "Work",
				"street1": "100 Main St",
				"street2": null,
				"city": "Boston"
			}
		],
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	},
	{
		"rec_id": "3",
		"user": {
			"id": "12121",
			"name": "Jack"
		},
		"addresses": [
			{
				"address_type": "Home",
				"street1": "100 Main St",
				"street2": null,
				"city": "Las Vegas"
			} ]
		"timestamp": "2023-07-28T20:05:14.859000+00:00",
		"edited_timestamp": null
	}
]

I tried below:

with open("employee.json") as file:
    data = json.load(file)  

data_df = pd.json_normalize(data)

data_df.columns.values.tolist()

['rec_id',
 'addresses',
 'timestamp',
 'edited_timestamp',
 'user.id',
 'user.name']

display(data_df)
	rec_id	addresses	timestamp	edited_timestamp	user.id	user.name
0	1	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}]	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
1	2	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}]	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
2	3	[{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Las Vegas'}]	2023-07-28T20:05:14.859000+00:00	None	12121	Jack

How do I get the output as below -

rec_id	addresses.address_type	addresses.street1	addresses.street2	addresses.city	timestamp	edited_timestamp	user.id	user.name
0	1	Home	100 Main St	None	Chicago	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
1 	1	Work	100 Main St	None	Chicago	2023-07-28T20:05:14.859000+00:00	None	12414	Steve
2	2	Home	100 Main St	None	Boston	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
3	2	Work	100 Main St	None	Boston	2023-07-28T20:05:14.859000+00:00	None	214521	Tim
4	3	Home	100 Main St	None	Las Vegas	2023-07-28T20:05:14.859000+00:00	None	12121	Jack

答案1

得分: 1

这是您要翻译的内容:

尝试这个,更多关于 .json_normalize

df = pd.json_normalize(data, 
                       record_path='addresses', 
                       meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])

输出:

address_type street1 street2 city rec_id user.id user.name timestamp edited_timestamp
0 Home 100 Main St Chicago 1 12414 Steve 2023-07-28T20:05:14.859000+00:00
1 Work 100 Main St Chicago 1 12414 Steve 2023-07-28T20:05:14.859000+00:00
2 Home 100 Main St Boston 2 214521 Tim 2023-07-28T20:05:14.859000+00:00
3 Work 100 Main St Boston 2 214521 Tim 2023-07-28T20:05:14.859000+00:00
4 Home 100 Main St Las Vegas 3 12121 Jack 2023-07-28T20:05:14.859000+00:00
英文:

Try this, more about .json_normalize

df = pd.json_normalize(data, 
record_path='addresses', 
meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])

Output:

address_type street1 street2 city rec_id user.id user.name timestamp edited_timestamp
0 Home 100 Main St Chicago 1 12414 Steve 2023-07-28T20:05:14.859000+00:00
1 Work 100 Main St Chicago 1 12414 Steve 2023-07-28T20:05:14.859000+00:00
2 Home 100 Main St Boston 2 214521 Tim 2023-07-28T20:05:14.859000+00:00
3 Work 100 Main St Boston 2 214521 Tim 2023-07-28T20:05:14.859000+00:00
4 Home 100 Main St Las Vegas 3 12121 Jack 2023-07-28T20:05:14.859000+00:00

答案2

得分: 0

尝试:

data_df = pd.json_normalize(
    data,
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_path=["addresses"],
    record_prefix="addresses."
)
print(data_df)

打印:

  addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id                         timestamp edited_timestamp user.id user.name
0                   Home       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
1                   Work       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
2                   Home       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
3                   Work       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
4                   Home       100 Main St              None      Las Vegas      3  2023-07-28T20:05:14.859000+00:00             None   12121      Jack
英文:

Try:

data_df = pd.json_normalize(
    data,
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_path=["addresses"],
    record_prefix="addresses.",
)
print(data_df)

Prints:

  addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id                         timestamp edited_timestamp user.id user.name
0                   Home       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
1                   Work       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
2                   Home       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
3                   Work       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
4                   Home       100 Main St              None      Las Vegas      3  2023-07-28T20:05:14.859000+00:00             None   12121      Jack

huangapple
  • 本文由 发表于 2023年8月5日 04:37:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76838992.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定