How to print all duplicate keys, including full paths and optionally values, for nested JSON in Python?
Question
External libraries are allowed but less preferred.
Example input (data.json):
{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "street": "321 Wall St"
    },
    "contacts": [
        {
            "type": "email",
            "value": "john@example.com"
        },
        {
            "type": "phone",
            "value": "555-1234"
        },
        {
            "type": "email",
            "value": "johndoe@example.com"
        }
    ],
    "age": 35
}
Example expected output:
Duplicate keys found:
age (30, 35)
address -> street ("123 Main St", "321 Wall St")
Using json.load or json.loads as is returns a standard Python dictionary, which silently drops duplicate keys (the last value wins), so I think we need a way to "stream" the JSON as it loads, in some depth-first search / visitor pattern way.
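A minimal sketch of that behaviour, using a shortened input rather than the full data.json:

```python
import json

# Both "age" entries parse without error; the later one silently wins.
parsed = json.loads('{"age": 30, "age": 35}')
print(parsed)  # {'age': 35}
```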
I've also tried something similar to what was suggested here: https://stackoverflow.com/a/14902564/8878330 (quoted below)
def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        else:
            d[k] = v
    return d
The only change I made was that, instead of raising, I append the duplicate key to a list so I can print the list of duplicate keys at the end.
The problem is that I don't see a simple way to get the "full path" of the duplicate keys.
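For illustration, the modification described above might look like the sketch below. The names `duplicates` and `dict_collect_duplicates` are made up here, and the input is a shortened version of data.json. It shows the limitation: the hook only ever sees the pairs of one object, so the collected keys carry no path.

```python
import json

duplicates = []  # filled in as a side effect of parsing

def dict_collect_duplicates(ordered_pairs):
    """Record duplicate keys instead of raising (hypothetical variant)."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            duplicates.append(k)
        d[k] = v
    return d

text = '{"age": 30, "address": {"street": "a", "street": "b"}, "age": 35}'
json.loads(text, object_pairs_hook=dict_collect_duplicates)

# Inner objects finish parsing first, so "street" is recorded before "age",
# and nothing says that "street" lives under "address".
print(duplicates)  # ['street', 'age']
```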
Answer 1
Score: 0
We use the object_pairs_hook argument of json.loads to inspect all key/value pairs within the same JSON object and check for duplicate keys. When a duplicate key is found, we modify the key name by prepending '#duplicate_key#' to it (we assume that no original key name begins with those characters). Next, we recursively walk the object that was just parsed from the JSON, compute the full paths of the dictionary keys, and print the paths and values of the duplicates we discovered.
import json
DUPLICATE_MARKER = '#duplicate_key#'
DUPLICATE_MARKER_LENGTH = len(DUPLICATE_MARKER)
s = """{
"name": "John",
"age": 30,
"address": {
"street": "123 Main St",
"city": "New York",
"street": "321 Wall St"
},
"contacts": [
{
"type": "email",
"value": "john@example.com"
},
{
"type": "phone",
"value": "555-1234"
},
{
"type": "email",
"value": "johndoe@example.com"
}
],
"age": 35
}"""
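def my_hook(initial_pairs):
    # Called by json.loads once per JSON object with its (key, value) pairs.
    seen = set()
    pairs = []
    for pair in initial_pairs:
        k, v = pair
        if k in seen:
            # Replace key name so the duplicate survives dict construction.
            # Note: a key repeated more than twice keeps only its last duplicate.
            k = DUPLICATE_MARKER + k
            pairs.append((k, v))
        else:
            seen.add(k)
            pairs.append(pair)
    return dict(pairs)

def get_duplicates_path(o, path):
    # Walk the parsed object, building the path as a Python-style expression.
    if isinstance(o, list):
        for i, v in enumerate(o):
            get_duplicates_path(v, f'{path}[{i}]')
    elif isinstance(o, dict):
        for k, v in o.items():
            if k.startswith(DUPLICATE_MARKER):
                # Strip the marker to recover the original key name.
                print(f'duplicate key at {path}[{k[DUPLICATE_MARKER_LENGTH:]!r}] with value {v!r}')
            else:
                get_duplicates_path(v, f'{path}[{k!r}]')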
print(s)
obj = json.loads(s, object_pairs_hook=my_hook)
get_duplicates_path(obj, 'obj')
print()
# Another test:
s = """[
{
"x": [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "a": 3}]
},
{
"y": "z"
}
]"""
print(s)
obj = json.loads(s, object_pairs_hook=my_hook)
get_duplicates_path(obj, 'obj')
Prints:
{
"name": "John",
"age": 30,
"address": {
"street": "123 Main St",
"city": "New York",
"street": "321 Wall St"
},
"contacts": [
{
"type": "email",
"value": "john@example.com"
},
{
"type": "phone",
"value": "555-1234"
},
{
"type": "email",
"value": "johndoe@example.com"
}
],
"age": 35
}
duplicate key at obj['address']['street'] with value '321 Wall St'
duplicate key at obj['age'] with value 35
[
{
"x": [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "a": 3}]
},
{
"y": "z"
}
]
duplicate key at obj[0]['x'][1]['a'] with value 3
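If the output format from the question is wanted, with every value of a repeated key grouped on one line, one possible variation is sketched below. It is not part of the answer above: the `Dup` wrapper, `group_hook`, `find_duplicates` and `format_duplicates` are all hypothetical names, and the ` -> ` separator and `[i]` list-index notation are assumptions based on the question's example. Instead of renaming keys, the hook stores all values of a repeated key in a wrapper object, which also handles keys that occur more than twice.

```python
import json

class Dup:
    """Hypothetical wrapper holding every value seen for a repeated key."""
    def __init__(self, values):
        self.values = values

def group_hook(pairs):
    # Group values by key; wrap only the keys that occur more than once.
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return {k: (v[0] if len(v) == 1 else Dup(v)) for k, v in grouped.items()}

def find_duplicates(node, path=()):
    # Depth-first walk returning (path, values) for every repeated key.
    found = []
    if isinstance(node, Dup):
        found.append((path, node.values))
        for value in node.values:  # duplicates may nest inside duplicates
            found.extend(find_duplicates(value, path))
    elif isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_duplicates(value, path + (key,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_duplicates(value, path + (f"[{index}]",)))
    return found

def format_duplicates(text):
    root = json.loads(text, object_pairs_hook=group_hook)
    lines = []
    for path, values in find_duplicates(root):
        # default= lets json.dumps render any nested Dup as a list of values.
        rendered = ", ".join(
            json.dumps(v, default=lambda d: d.values) for v in values)
        lines.append(f"{' -> '.join(path)} ({rendered})")
    return lines

sample = '{"age": 30, "address": {"street": "a", "street": "b"}, "age": 35}'
print("Duplicate keys found:")
for line in format_duplicates(sample):
    print(line)
# age (30, 35)
# address -> street ("a", "b")
```

The trade-off is that the parsed structure now contains `Dup` objects, so it is only suitable for reporting, not for use as the loaded data.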