Create a hierarchy between categories

Question

I have the following DataFrame:

import pandas as pd

pandas_test = pd.DataFrame(data={'TAGS': [['Category1', 'Category2', 'Category3'],
                                          ['Category2', 'Category4'],
                                          ['Category5', 'Category4'],
                                          ['Category5', 'Category4', 'Category6', 'Category8'],
                                          ['Category1', 'Category2'],
                                          ['Category2', 'Category3']]})

I am trying to find the parent of each category. The rule is: if categoryA appears directly to the left of categoryB in the same row, categoryA is a parent of categoryB. For pandas_test, I would like this result:

{'Category1': None, 'Category2': 'Category1', 'Category3': 'Category2', 'Category4': 'Category2', 'Category5': None, 'Category4': 'Category5', 'Category6': 'Category4', 'Category8': 'Category6'}

Here, Category1 has no parent, Category2 has Category1 as its parent, Category3 has Category2, and so on.

So far, I have the following code:

category_counts = {}
for index, row in pandas_test.iterrows():
    #categories = row['TAGS'][0].split(',') if row['TAGS'] else []
    categories = row['TAGS']
    for i in range(len(categories)):
        category = categories[i].strip()
        if category not in category_counts:
            category_counts[category] = {'count': 1, 'subcategories': set()}
        else:
            category_counts[category]['count'] += 1
        for j in range(i + 1, len(categories)):
            subcategory = categories[j].strip()
            category_counts[category]['subcategories'].add(subcategory)

# Analyze category_counts dictionary to determine hierarchy
hierarchy = {}
for category, data in category_counts.items():
    subcategories = data['subcategories']
    for subcategory in subcategories:
        if subcategory in category_counts:
            if category not in category_counts[subcategory]['subcategories']:
                hierarchy[subcategory] = category

# Apply hierarchy to categories
for category, parent in hierarchy.items():
    if parent in hierarchy:
        hierarchy[category] = hierarchy[parent]
print(hierarchy)

But this code returns the following result:

{'Category3': 'Category1', 'Category2': 'Category1', 'Category4': 'Category5', 'Category6': 'Category5', 'Category8': 'Category5'}

Category3 should have Category2 as its parent. Of course, Category1 is also an ancestor of Category3, because Category1 is the parent of Category2 and Category2 is the parent of Category3, but I want the closest parent (the same applies to Category6 and Category8, which currently get Category5 as their parent). Also, I want Category4 to be the child of both Category2 and Category5.

Can someone help me, please?

Answer 1

Score: 2

This is a graph problem: use a specialized library such as networkx to build a directed graph, then get the predecessors of each node:

# pip install networkx
import networkx as nx
from itertools import pairwise

G = nx.from_edgelist([edge for l in pandas_test['TAGS']
                      for edge in pairwise(l)],
                     create_using=nx.DiGraph)

out = {n: list(G.predecessors(n)) for n in G}

print(out)

Note: on Python < 3.10, you can replace pairwise(l) with zip(l, l[1:]).
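
The version fallback mentioned in this note could look like the sketch below. It is only an illustration, not part of the original answer: `rows` copies the TAGS column from the question as a plain list so that the snippet runs without pandas or networkx, and the name `edges` is chosen here for the example.

```python
try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    def pairwise(seq):
        # Fallback for older versions: pair each item with its right-hand neighbour.
        return zip(seq, seq[1:])

# Assumption: the TAGS column from the question, copied as a plain list.
rows = [['Category1', 'Category2', 'Category3'],
        ['Category2', 'Category4'],
        ['Category5', 'Category4'],
        ['Category5', 'Category4', 'Category6', 'Category8'],
        ['Category1', 'Category2'],
        ['Category2', 'Category3']]

# One (parent, child) edge per pair of adjacent tags in a row.
edges = [edge for row in rows for edge in pairwise(row)]
print(edges[:3])
```

Repeated edges such as ('Category1', 'Category2') are harmless when the list is passed to a DiGraph, since adding an existing edge does not create a second one.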

Output:

{'Category1': [],
 'Category2': ['Category1'],
 'Category3': ['Category2'],
 'Category4': ['Category2', 'Category5'],
 'Category5': [],
 'Category6': ['Category4'],
 'Category8': ['Category6']}
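
The question also distinguishes the closest parent from more distant ancestors (Category1 is an ancestor of Category3 through Category2). Given a mapping shaped like the output above, all ancestors can be collected by following the parent links repeatedly. The following is a minimal sketch under stated assumptions: `out` is hard-coded from the output above, and `all_ancestors` is a hypothetical helper name, not something from the original answer. With a networkx graph, `nx.ancestors(G, node)` serves the same purpose.

```python
# Assumption: `out` is copied from the output of the answer above.
out = {'Category1': [],
       'Category2': ['Category1'],
       'Category3': ['Category2'],
       'Category4': ['Category2', 'Category5'],
       'Category5': [],
       'Category6': ['Category4'],
       'Category8': ['Category6']}


def all_ancestors(node, parents):
    """Return every category reachable by repeatedly following parent links."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # also protects against cycles in the data
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(all_ancestors('Category3', out)))
# ['Category1', 'Category2']
print(sorted(all_ancestors('Category8', out)))
# ['Category1', 'Category2', 'Category4', 'Category5', 'Category6']
```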

As a DataFrame:

df_out = (pd.Series(out).explode()
            .rename_axis('Child').reset_index(name='Parent')
          )

Output:

       Child     Parent
0  Category1        NaN
1  Category2  Category1
2  Category3  Category2
3  Category4  Category2
4  Category4  Category5
5  Category5        NaN
6  Category6  Category4
7  Category8  Category6

The graph:

[Figure: graph of the hierarchy between categories]

Answer 2

Score: 0

Please try the following method to achieve your output.

result_dict = {}

for tags in pandas_test['TAGS']:
    for i, category in enumerate(tags):
        if i > 0:
            result_dict[category] = tags[i - 1]
        else:
            result_dict[category] = None

for tags in pandas_test['TAGS']:
    for i, category in enumerate(tags):
        if i > 0 and tags[i - 1] not in result_dict:
            result_dict[tags[i - 1]] = None
print(result_dict)
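
Because result_dict stores a single value per key, a later row overwrites what an earlier row recorded. With the data from the question, the last row ['Category2', 'Category3'] resets Category2 to None, and Category4 keeps only Category5. A possible variant that keeps every distinct immediate parent is sketched below. It is not the answerer's code: `rows` and `parents` are names chosen here, and `rows` copies the TAGS column as a plain list so that the snippet runs without pandas.

```python
# Assumption: the TAGS column from the question, copied as a plain list.
rows = [['Category1', 'Category2', 'Category3'],
        ['Category2', 'Category4'],
        ['Category5', 'Category4'],
        ['Category5', 'Category4', 'Category6', 'Category8'],
        ['Category1', 'Category2'],
        ['Category2', 'Category3']]

parents = {}
for tags in rows:
    for i, category in enumerate(tags):
        # Every category gets an entry, so roots end up with an empty list.
        known = parents.setdefault(category, [])
        # The immediate parent is the tag directly to the left, if there is one.
        if i > 0 and tags[i - 1] not in known:
            known.append(tags[i - 1])

print(parents)
```

For the data above, this gives Category4 both Category2 and Category5 as parents, and Category1 and Category5 an empty list.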

huangapple
  • Posted on June 22, 2023, 16:51:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76530115.html