April 11, 2023 03:21:03

Trying to remove duplicates from a list of dictionaries based on a condition

Question

I have this list of dictionaries:

list_dict = [
    {'title':'abc defg hij', 'situation':'other'},
    {'title':'c defg', 'situation':'other'},
    {'title':'defg hij', 'situation':'other'},
    {'title':'defg hij', 'situation':'deleted'}]

I'm trying to remove every dictionary that has some recurring elements in the title AND the same situation, keeping only the one with the longest string in the title key.

The desired output would be as follows:

[{'title':'abc defg hij', 'situation':'other'},
 {'title':'defg hij', 'situation':'deleted'}]

Answer 1

Score: 1

I'm assuming that by "has some recurring elements in the title", you mean "is a substring of any other title" (within a given situation).
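Under this assumption the test is Python's character-level `in` operator, which does not respect word boundaries. In the sample data, 'c defg' counts as contained in 'abc defg hij' only because the 'c' is the last letter of 'abc'. A whole-word comparison, shown only for contrast, would not treat it as contained. A quick check:

```python
titles = ["abc defg hij", "c defg", "defg hij"]  # titles in the 'other' situation

# `a in b` on two strings is a plain character-level substring test
print("c defg" in "abc defg hij")    # True: the 'c' is the tail of 'abc'
print("defg hij" in "abc defg hij")  # True
print("abc defg hij" in "c defg")    # False: a longer string is never contained

# Whole-word containment, for comparison only (word order ignored)
print(set("c defg".split()) <= set("abc defg hij".split()))  # False: 'c' is not a word there
```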

I'm also assuming that you're dealing with relatively small datasets, so a quadratic algorithm for eliminating redundant strings is acceptable. Nothing fancy: just build a set of compatible strings one string at a time, checking each new string against the kept ones for substrings:

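def find_distinct_strs(all_strs):
    """Return the strings not contained in another string of all_strs (duplicates kept once)."""
    distinct_strs = set()

    for new_str in all_strs:
        # Iterate over a snapshot, since we may remove items from the set
        for existing_str in list(distinct_strs):
            if new_str in existing_str:
                # new_str is redundant, go to next
                break
            elif existing_str in new_str:
                # new_str supersedes existing_str
                distinct_strs.remove(existing_str)
        else:
            # no kept string contains new_str, so keep it
            distinct_strs.add(new_str)

    return list(distinct_strs)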
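A variant sketch, not part of the original answer: if the strings are visited longest first, a kept string can never be superseded later (a shorter string cannot contain a longer one, and two different strings of equal length cannot contain each other), so the removal branch disappears. The name `find_distinct_strs_longest_first` is hypothetical; the comparisons are still quadratic in the worst case, plus an O(n log n) sort, and the result comes back longest first instead of in set order.

```python
def find_distinct_strs_longest_first(all_strs):
    """Hypothetical variant: keep strings that are not substrings of a longer kept string."""
    kept = []
    # sorted() accepts any iterable (including a generator) and is stable,
    # so strings of equal length keep their input order
    for s in sorted(all_strs, key=len, reverse=True):
        # A later duplicate is skipped too, because `s in s` is True
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


print(find_distinct_strs_longest_first(["c defg", "abc defg hij", "defg hij"]))
# ['abc defg hij']
```

It can be used as a drop-in replacement for `find_distinct_strs`, since both take an iterable of strings and return a list.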

You can then group all the entries by situation, find the distinct titles, and construct a suitably thinned list:

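from collections import defaultdict

def filter_list_dict(list_dict):
    # Collect titles per situation; dicts keep first-seen order (Python 3.7+),
    # and unlike itertools.groupby this does not require the input to be sorted
    titles_by_situation = defaultdict(list)
    for entry in list_dict:
        titles_by_situation[entry["situation"]].append(entry["title"])

    return [
        dict(title=title, situation=situation)
        for situation, titles in titles_by_situation.items()
        for title in find_distinct_strs(titles)
    ]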
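`filter_list_dict` rebuilds each dictionary from `title` and `situation` only, so any other keys an entry carries are dropped. If your dictionaries hold more fields, one hedged alternative is to test each entry against the others in the same situation and return the original dict objects in input order. The name `filter_list_dict_keep_entries` is hypothetical; it keeps the answer's substring interpretation and is still quadratic.

```python
def filter_list_dict_keep_entries(list_dict):
    """Hypothetical variant: return the original dicts (all keys kept) in input order."""
    kept = []
    for i, entry in enumerate(list_dict):
        title, situation = entry["title"], entry["situation"]
        # Redundant if another entry in the same situation has a longer title
        # containing this one, or an identical title that appears earlier
        redundant = any(
            other["situation"] == situation
            and title in other["title"]
            and (title != other["title"] or j < i)
            for j, other in enumerate(list_dict)
            if j != i
        )
        if not redundant:
            kept.append(entry)
    return kept


list_dict = [
    {'title': 'abc defg hij', 'situation': 'other'},
    {'title': 'c defg', 'situation': 'other'},
    {'title': 'defg hij', 'situation': 'other'},
    {'title': 'defg hij', 'situation': 'deleted'},
]
print(filter_list_dict_keep_entries(list_dict))
# [{'title': 'abc defg hij', 'situation': 'other'}, {'title': 'defg hij', 'situation': 'deleted'}]
```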

Test the output:

>>> list_dict = [
...     {'title':'abc defg hij', 'situation':'other'},
...     {'title':'c defg', 'situation':'other'},
...     {'title':'defg hij', 'situation':'other'},
...     {'title':'defg hij', 'situation':'deleted'}
... ]
>>> print(filter_list_dict(list_dict))
[{'title': 'abc defg hij', 'situation': 'other'}, {'title': 'defg hij', 'situation': 'deleted'}]
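The answer reads "recurring elements" as substrings. If the question instead means titles that share at least one whole word, a greedy longest-first pass gives the same result on this sample. The sketch below is an alternative under that assumption; `filter_by_shared_words` is a hypothetical name. Word overlap is not transitive (A may share a word with B and B with C while A and C share none), so this greedy choice is only one of several reasonable groupings.

```python
from collections import defaultdict


def filter_by_shared_words(list_dict):
    """Hypothetical variant: drop a title that shares any whole word with a
    longer kept title in the same situation. Returns original dicts in input order."""
    # Indices ordered longest title first (sorted() is stable even with reverse=True)
    order = sorted(range(len(list_dict)),
                   key=lambda i: len(list_dict[i]["title"]), reverse=True)
    kept_words = defaultdict(list)  # situation -> word sets of titles kept so far
    kept_idx = []
    for i in order:
        entry = list_dict[i]
        words = set(entry["title"].split())
        if any(words & other for other in kept_words[entry["situation"]]):
            continue  # overlaps a longer (or equally long, earlier) kept title
        kept_words[entry["situation"]].append(words)
        kept_idx.append(i)
    return [list_dict[i] for i in sorted(kept_idx)]


list_dict = [
    {'title': 'abc defg hij', 'situation': 'other'},
    {'title': 'c defg', 'situation': 'other'},
    {'title': 'defg hij', 'situation': 'other'},
    {'title': 'defg hij', 'situation': 'deleted'},
]
print(filter_by_shared_words(list_dict))
# [{'title': 'abc defg hij', 'situation': 'other'}, {'title': 'defg hij', 'situation': 'deleted'}]
```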
