June 22, 2023 16:51:01


Create a hierarchy between categories

Question


I have the following DataFrame:

import pandas as pd

pandas_test = pd.DataFrame(data={'TAGS': [['Category1', 'Category2', 'Category3'],
                                          ['Category2', 'Category4'],
                                          ['Category5', 'Category4'],
                                          ['Category5', 'Category4', 'Category6', 'Category8'],
                                          ['Category1', 'Category2'],
                                          ['Category2', 'Category3']]})

I am trying to find the parent of each category. The rule is: if categoryA appears immediately to the left of categoryB in the same row, categoryA is the parent of categoryB. So for pandas_test, I would like this result:

{'Category1': None, 'Category2': 'Category1', 'Category3': 'Category2', 'Category4': 'Category2', 'Category5': None, 'Category4': 'Category5', 'Category6': 'Category4', 'Category8': 'Category6'}

Here, Category1 has no parent, Category2 has Category1 as its parent, Category3 has Category2, and so on.
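Stated as code, the rule amounts to treating every pair of neighbouring tags in a row as one (parent, child) edge. A minimal sketch, assuming the tag lists are available as plain Python lists (the `tags` variable below is a hand-copied stand-in for `pandas_test['TAGS']`, so pandas is not needed to run it):

```python
# Plain-list copy of pandas_test['TAGS'] (assumed stand-in for the DataFrame column)
tags = [
    ['Category1', 'Category2', 'Category3'],
    ['Category2', 'Category4'],
    ['Category5', 'Category4'],
    ['Category5', 'Category4', 'Category6', 'Category8'],
    ['Category1', 'Category2'],
    ['Category2', 'Category3'],
]

# The closest parent of a tag is the tag immediately to its left in the same row,
# so each (left, right) neighbour pair is one (parent, child) edge.
edges = set()
for row in tags:
    for parent, child in zip(row, row[1:]):
        edges.add((parent, child))

for parent, child in sorted(edges):
    print(f'{child} <- {parent}')
```

Because `edges` is a set, repeated rows such as `['Category1', 'Category2']` contribute only one edge, and Category4 naturally ends up with two parents.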

So far, I have the following code:

category_counts = {}
for index, row in pandas_test.iterrows():
    # categories = row['TAGS'][0].split(',') if row['TAGS'] else []
    categories = row['TAGS']
    for i in range(len(categories)):
        category = categories[i].strip()
        if category not in category_counts:
            category_counts[category] = {'count': 1, 'subcategories': set()}
        else:
            category_counts[category]['count'] += 1
        for j in range(i + 1, len(categories)):
            subcategory = categories[j].strip()
            category_counts[category]['subcategories'].add(subcategory)
# Analyze category_counts dictionary to determine hierarchy
hierarchy = {}
for category, data in category_counts.items():
    subcategories = data['subcategories']
    for subcategory in subcategories:
        if subcategory in category_counts:
            if category not in category_counts[subcategory]['subcategories']:
                hierarchy[subcategory] = category
# Apply hierarchy to categories
for category, parent in hierarchy.items():
    if parent in hierarchy:
        hierarchy[category] = hierarchy[parent]
print(hierarchy)

But this code returns the following result:

{'Category3': 'Category1', 'Category2': 'Category1', 'Category4': 'Category5', 'Category6': 'Category5', 'Category8': 'Category5'}

Category3 should have Category2 as its parent. Of course, Category1 is also an ancestor of Category3, because Category1 is the parent of Category2 and Category2 is the parent of Category3, but I want the closest parent (the same goes for Category6 and Category8, which get Category5 as their parent). Also, I want Category4 to be a child of both Category2 AND Category5.

Can someone help me, please?
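Part of the mismatch comes from the first loop of the question's code: `for j in range(i + 1, len(categories))` records every tag to the right of a category as a subcategory, not only the next one, and the final `# Apply hierarchy to categories` loop then replaces a parent with that parent's own parent. The difference between "every tag to the right" and "the next tag" can be seen on a single row; this is an illustrative sketch, not part of the original code:

```python
row = ['Category1', 'Category2', 'Category3']

# What the question's first loop collects: every tag to the right of position i
all_to_the_right = {row[i]: set(row[i + 1:]) for i in range(len(row))}

# What the desired output needs: only the tag immediately to the right
immediate_child = {row[i]: row[i + 1] for i in range(len(row) - 1)}

print(all_to_the_right['Category1'])  # a set holding both Category2 and Category3
print(immediate_child['Category1'])   # Category2
```

With the transitive version, Category3 looks like a "subcategory" of Category1 just as much as of Category2, so the code has no way to single out the closest parent.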

Answer 1

Score: 2


This is a graph problem. Use a specialized library such as networkx to build a directed graph and get the predecessors of each node:

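# pip install networkx
import networkx as nx
from itertools import pairwise  # Python 3.10+

# Each pair of neighbouring tags in a row becomes a directed edge parent -> child
G = nx.from_edgelist([edge for l in pandas_test['TAGS']
                      for edge in pairwise(l)],
                     create_using=nx.DiGraph)

# The direct parents of a node are its predecessors in the directed graph
out = {n: list(G.predecessors(n)) for n in G}
print(out)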

NB: on Python < 3.10, you can replace pairwise(l) with zip(l, l[1:]).
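One way to keep a single version of the code running on both old and new interpreters is a guarded import with the `zip(l, l[1:])` fallback. A minimal sketch; the fallback function below is a hypothetical helper written for this example and only reproduces `itertools.pairwise` for finite sequences such as the tag lists here:

```python
try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    # Hypothetical fallback for Python < 3.10: consecutive pairs via zip(l, l[1:])
    def pairwise(seq):
        seq = list(seq)  # materialize so slicing works on any iterable
        return zip(seq, seq[1:])

row = ['Category5', 'Category4', 'Category6', 'Category8']
print(list(pairwise(row)))
# [('Category5', 'Category4'), ('Category4', 'Category6'), ('Category6', 'Category8')]
```

Both versions yield nothing for a row with fewer than two tags, so single-tag rows simply add no edges.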

Output:

{'Category1': [],
 'Category2': ['Category1'],
 'Category3': ['Category2'],
 'Category4': ['Category2', 'Category5'],
 'Category5': [],
 'Category6': ['Category4'],
 'Category8': ['Category6']}

As a DataFrame:

df_out = (pd.Series(out).explode()
            .rename_axis('Child').reset_index(name='Parent')
          )

Output:

       Child     Parent
0  Category1        NaN
1  Category2  Category1
2  Category3  Category2
3  Category4  Category2
4  Category4  Category5
5  Category5        NaN
6  Category6  Category4
7  Category8  Category6
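If the indirect parents are ever needed as well (the question notes that Category1 is also an ancestor of Category3), they can be collected by walking the direct-parent mapping upward. A pure-Python sketch, assuming `out` has the shape printed above; the `ancestors` helper is hypothetical, and networkx also offers `nx.ancestors(G, node)` on the graph itself:

```python
# Direct-parent mapping, copied from the output printed above
out = {
    'Category1': [],
    'Category2': ['Category1'],
    'Category3': ['Category2'],
    'Category4': ['Category2', 'Category5'],
    'Category5': [],
    'Category6': ['Category4'],
    'Category8': ['Category6'],
}

def ancestors(node, parents):
    """Return every direct or indirect parent of node (hypothetical helper)."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:  # the seen set also protects against cycles
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

print(sorted(ancestors('Category3', out)))  # ['Category1', 'Category2']
```

Keeping only the direct parents in `out` and deriving ancestors on demand avoids storing the same transitive information twice.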

The graph: [image not available]

Answer 2

Score: 0


Try the following approach to get your output:

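result_dict = {}
for tags in pandas_test['TAGS']:
    for i, category in enumerate(tags):
        if i > 0:
            # The closest parent is the tag immediately to the left.
            # Note: only one parent is kept (the last one seen),
            # so Category4 ends up with Category5 only.
            result_dict[category] = tags[i - 1]
        else:
            # First tag of the row: record it without a parent, but do not
            # overwrite a parent found in another row (e.g. Category2)
            result_dict.setdefault(category, None)

# Make sure every tag that appears as a parent also has an entry
for tags in pandas_test['TAGS']:
    for i, category in enumerate(tags):
        if i > 0 and tags[i - 1] not in result_dict:
            result_dict[tags[i - 1]] = None
print(result_dict)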
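This answer's dictionary stores a single parent per category, so it cannot express the requirement that Category4 is a child of both Category2 and Category5. A minimal pure-Python sketch that keeps a list of direct parents instead, assuming the tag lists are available as plain lists (`tags` below is a hand-copied stand-in for `pandas_test['TAGS']`):

```python
# Plain-list copy of pandas_test['TAGS'] (assumed stand-in for the DataFrame column)
tags = [
    ['Category1', 'Category2', 'Category3'],
    ['Category2', 'Category4'],
    ['Category5', 'Category4'],
    ['Category5', 'Category4', 'Category6', 'Category8'],
    ['Category1', 'Category2'],
    ['Category2', 'Category3'],
]

parents = {}
for row in tags:
    for i, category in enumerate(row):
        # Register every category, including those that only ever start a row
        parents.setdefault(category, [])
        # The closest parent is the tag immediately to the left; skip duplicates
        if i > 0 and row[i - 1] not in parents[category]:
            parents[category].append(row[i - 1])

print(parents)
```

For the sample data this yields the same mapping as the networkx output in Answer 1, with `'Category4': ['Category2', 'Category5']`, without adding a dependency.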
