Python: remove duplicates from JSON based on two key values

Question

I have a JSON file organised like the following, and I would like to remove all duplicates based on two key/value pairs (name and city):

[{'name': 'anna', 'city': 'paris','code': '5'},  
{'name': 'anna', 'city': 'paris','code': '2'},
{'name': 'henry', 'city': 'london','code': '1'},
{'name': 'henry', 'city': 'london','code': '3'},...] 

Expected output:

[{'name': 'anna', 'city': 'paris'},{'name': 'henry', 'city': 'london'}]

I am struggling with this task, any ideas?

Answer 1

Score: 0

You need to build a unique key from (name, city); for records that share the same pair, you just need to apply a condition that decides which one to keep in the final result.

Once that's done, take the values, and that is the answer.

Using the walrus operator (Python 3.8+) and a dict comprehension:

>>> l = [{'name': 'anna', 'city': 'paris', 'code': '5'}, {'name': 'anna', 'city': 'paris', 'code': '2'}, {'name': 'henry', 'city': 'london', 'code': '1'}, {'name': 'henry', 'city': 'london', 'code': '3'}]
>>> result = { (name:= subdict['name'], city:= subdict['city']): dict(name=name, city=city) for subdict in l}
>>> result
{('anna', 'paris'): {'name': 'anna', 'city': 'paris'}, ('henry', 'london'): {'name': 'henry', 'city': 'london'}}
>>> solution = list(result.values())
>>> solution
[{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
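For the expected output above, the "condition of what to keep" does not matter, because every duplicate collapses to the same {'name', 'city'} dict. It only matters if you keep other fields such as 'code'. A minimal sketch, assuming a hypothetical rule of keeping the row with the smallest numeric 'code' (the question does not specify any rule):

```python
rows = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'},
]

# Map each (name, city) pair to the row chosen so far for that pair.
best = {}
for row in rows:
    key = (row['name'], row['city'])
    # Hypothetical rule: prefer the smallest code. Convert to int so that
    # '10' is not treated as smaller than '2' (string comparison would be).
    if key not in best or int(row['code']) < int(best[key]['code']):
        best[key] = row

# Replacing a value keeps the key's original position, so pairs stay in first-seen order.
kept = list(best.values())
print(kept)
# [{'name': 'anna', 'city': 'paris', 'code': '2'}, {'name': 'henry', 'city': 'london', 'code': '1'}]
```

Dropping 'code' from each dict in `kept` afterwards gives the question's expected output.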

Answer 2

Score: 0

In pure Python, you can select the fields you need from each dictionary row, collect them as hashable tuples in a set (written with {}), and then rebuild your rows from what you selected:

items = [
    {'name': 'anna', 'city': 'paris','code': '5'},
    {'name': 'anna', 'city': 'paris','code': '2'},
    {'name': 'henry', 'city': 'london','code': '1'},
    {'name': 'henry', 'city': 'london','code': '3'}
]

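# A set keeps each (name, city) pair only once, but does not preserve the input order
unique = {(item["name"], item["city"]) for item in items}

# Rebuild one dict per unique pair
unique = [{"name": item[0], "city": item[1]} for item in unique]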
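Because a set has no defined iteration order, the rebuilt list may not match the order of the input rows. If that matters, a dict can stand in for the set: its keys are unique in the same way, and since Python 3.7 dicts preserve insertion order. A minimal sketch of that swap, using `dict.fromkeys` in place of the set comprehension:

```python
items = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'},
]

# dict.fromkeys keeps the first occurrence of each (name, city) tuple as a key,
# in the order the tuples are first seen; the values are all None and unused.
pairs = dict.fromkeys((item["name"], item["city"]) for item in items)

# Iterating a dict yields its keys, so each tuple unpacks into name and city.
unique = [{"name": name, "city": city} for name, city in pairs]
print(unique)
# [{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
```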

Answer 3

Score: 0

Here's another approach (one of many) that utilises a set as follows:

input_list = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'}
]

output_list = []
seen_pairs = set()

for d in input_list:
    # Deduplicate on the (name, city) pair rather than on the name alone
    if (pair := (d.get('name'), d.get('city'))) not in seen_pairs:
        output_list.append({k: v for k, v in d.items() if k != 'code'})
        seen_pairs.add(pair)

print(output_list)

Output:

[{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]

Note:

There's at least one benefit to doing it this way. The other answers build the new dictionaries by explicitly including the keys 'name' and 'city', implicitly dropping 'code', which is fine for the data as shown. This approach instead builds the new dictionaries by excluding 'code'. This means the dictionary structure (the input data) can change without altering the functional code; for example, the 'code' key could be absent, and key/value pairs other than 'name' and 'city' could be introduced.
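The same idea can be pushed one step further by making both the deduplication key and the excluded fields parameters. A minimal sketch; the helper name `dedupe` and its parameters `key_fields` and `drop_fields` are hypothetical, chosen for illustration:

```python
from typing import Dict, Iterable, List, Sequence


def dedupe(
    rows: Iterable[Dict],
    key_fields: Sequence[str] = ('name', 'city'),
    drop_fields: Sequence[str] = ('code',),
) -> List[Dict]:
    """Keep the first row for each combination of key_fields, minus drop_fields."""
    seen = set()
    out = []
    for row in rows:
        # .get() tolerates rows that lack one of the key fields (treated as None)
        key = tuple(row.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            # Copy every field except the excluded ones, so extra keys survive
            out.append({k: v for k, v in row.items() if k not in drop_fields})
    return out


rows = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'anna', 'city': 'rome', 'country': 'IT'},  # no 'code', extra key
    {'name': 'henry', 'city': 'london', 'code': '1'},
]
print(dedupe(rows))
# [{'name': 'anna', 'city': 'paris'}, {'name': 'anna', 'city': 'rome', 'country': 'IT'}, {'name': 'henry', 'city': 'london'}]
```

Because the key covers both fields, the Anna who lives in Rome is kept as a distinct pair, while the row without 'code' and with an extra 'country' key passes through unchanged.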

huangapple
  • Posted on 2023-03-04 00:40:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75629706.html