June 8, 2023, 18:24:00

How to print all duplicate keys, including full paths, and optionally with values, for nested JSON in Python?

Question

External libraries are allowed but less preferred.

Example input:

data.json with content:

{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "street": "321 Wall St"
    },
    "contacts": [
        {
            "type": "email",
            "value": "john@example.com"
        },
        {
            "type": "phone",
            "value": "555-1234"
        },
        {
            "type": "email",
            "value": "johndoe@example.com"
        }
    ],
    "age": 35
}

Example expected output:

Duplicate keys found:
  age (30, 35)
  address -> street ("123 Main St", "321 Wall St")

Using json.load/s as is returns a standard Python dictionary, which silently drops duplicate keys (the last value wins), so I think we need a way to "stream" the JSON as it loads, in some depth-first search / visitor pattern way.
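
To make the problem concrete, here is a minimal sketch (the sample string and variable names are made up for illustration). It shows the default behaviour next to what object_pairs_hook receives; the identity lambda is only there to expose the raw pairs, not a suggested solution.

```python
import json

text = '{"age": 30, "address": {"street": "A", "street": "B"}, "age": 35}'

# Default behaviour: a plain dict is built, so the last value for a key wins.
plain = json.loads(text)
print(plain)  # {'age': 35, 'address': {'street': 'B'}}

# With object_pairs_hook, each JSON object is handed over as a list of
# (key, value) pairs before any dict is built, so duplicates are still visible.
pairs = json.loads(text, object_pairs_hook=lambda p: p)
print(pairs)
# [('age', 30), ('address', [('street', 'A'), ('street', 'B')]), ('age', 35)]
```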

I've also tried something similar to what was suggested here: https://stackoverflow.com/a/14902564/8878330 (quoted below)

def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        else:
            d[k] = v
    return d

The only change I made was that, instead of raising, I append the duplicate key to a list so I can print the list of duplicate keys at the end.

The problem is that I don't see a simple way to get the "full path" of the duplicate keys.
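
A small sketch of why the path is hard to get from inside the hook: the hook is called once per JSON object, innermost objects first, and it only receives that object's own pairs. The name recording_hook and the sample input are hypothetical, used only for illustration.

```python
import json

calls = []

def recording_hook(ordered_pairs):
    # Hypothetical helper: record which keys each hook call receives.
    calls.append([k for k, _ in ordered_pairs])
    return dict(ordered_pairs)

json.loads('{"a": 1, "b": {"c": 2, "c": 3}, "a": 4}',
           object_pairs_hook=recording_hook)

# The nested object is reported first, with no hint that it lives under "b".
print(calls)  # [['c', 'c'], ['a', 'b', 'a']]
```

So the path has to be reconstructed afterwards, by walking whatever structure the hook returned.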

Answer 1

Score: 0

We use the object_pairs_hook argument of json.loads to inspect all key/value pairs within the same dictionary and check for duplicate keys. When a duplicate key is found, we modify the key name by prepending '#duplicate_key#' to it (we assume that no original key name begins with those characters). Next, we recursively walk the object that was just parsed from the JSON to compute the full paths of dictionary keys and print the paths and values of the duplicates we discovered.

import json

DUPLICATE_MARKER = '#duplicate_key#'
DUPLICATE_MARKER_LENGTH = len(DUPLICATE_MARKER)

s = """{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "street": "321 Wall St"
    },
    "contacts": [
        {
            "type": "email",
            "value": "john@example.com"
        },
        {
            "type": "phone",
            "value": "555-1234"
        },
        {
            "type": "email",
            "value": "johndoe@example.com"
        }
    ],
    "age": 35
}"""

def my_hook(initial_pairs):
    # Called once per JSON object with its (key, value) pairs in document order.
    seen = set()
    pairs = []
    for pair in initial_pairs:
        k, v = pair
        if k in seen:
            # Replace key name:
            k = DUPLICATE_MARKER + k
            pairs.append((k, v))
        else:
            seen.add(k)
            pairs.append(pair)
    return dict(pairs)

def get_duplicates_path(o, path):
    # Recursively walk the parsed object, extending the path at each level.
    if isinstance(o, list):
        for i, v in enumerate(o):
            get_duplicates_path(v, f'{path}[{i}]')
    elif isinstance(o, dict):
        for k, v in o.items():
            if k[:DUPLICATE_MARKER_LENGTH] == DUPLICATE_MARKER:
                print(f'duplicate key at {path}[{repr(k[DUPLICATE_MARKER_LENGTH:])}] with value {repr(v)}')
            else:
                get_duplicates_path(v, f'{path}[{repr(k)}]')

print(s)
obj = json.loads(s, object_pairs_hook=my_hook)
get_duplicates_path(obj, 'obj')

print()

# Another test:

s = """[
   {
       "x": [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "a": 3}]
   },
   {
       "y": "z"
   }
]"""

print(s)
obj = json.loads(s, object_pairs_hook=my_hook)
get_duplicates_path(obj, 'obj')

Prints:

{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "street": "321 Wall St"
    },
    "contacts": [
        {
            "type": "email",
            "value": "john@example.com"
        },
        {
            "type": "phone",
            "value": "555-1234"
        },
        {
            "type": "email",
            "value": "johndoe@example.com"
        }
    ],
    "age": 35
}
duplicate key at obj['address']['street'] with value '321 Wall St'
duplicate key at obj['age'] with value 35

[
   {
       "x": [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "a": 3}]
   },
   {
       "y": "z"
   }
]
duplicate key at obj[0]['x'][1]['a'] with value 3
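
The marker approach above reports only the later value of each duplicate, and a key that occurs three or more times within one object would map to the same marked key and overwrite itself. As a possible variant (a sketch, not the answer's method), the hook can keep each object as a list of pairs and group the values per key afterwards. The names Pairs and find_duplicates are hypothetical, and the "->" path format is borrowed from the question's expected output.

```python
import json
from collections import defaultdict


class Pairs(list):
    """Hypothetical marker type: a JSON object kept as a list of (key, value) pairs."""


def find_duplicates(node, path=()):
    """Yield (path, values) for every key that occurs more than once in an object."""
    if isinstance(node, Pairs):
        grouped = defaultdict(list)
        for key, value in node:
            grouped[key].append(value)
        for key, values in grouped.items():
            if len(values) > 1:
                yield path + (key,), values
            # Recurse into every value, including the ones under duplicate keys.
            for value in values:
                yield from find_duplicates(value, path + (key,))
    elif isinstance(node, list):
        # A plain list is a JSON array; array indexes become part of the path.
        for index, item in enumerate(node):
            yield from find_duplicates(item, path + (index,))


text = """{
    "age": 30,
    "address": {"street": "123 Main St", "city": "New York", "street": "321 Wall St"},
    "age": 35
}"""

tree = json.loads(text, object_pairs_hook=Pairs)
for path, values in find_duplicates(tree):
    print(" -> ".join(map(str, path)), tuple(values))
# age (30, 35)
# address -> street ('123 Main St', '321 Wall St')
```

Because Pairs subclasses list, the Pairs check has to come before the plain list check. The parsed tree is not a normal dict here, so this variant suits a validation pass rather than regular loading.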
