Python – 从表格构建带有级别编号的树结构

huangapple go评论115阅读模式
英文:

Python - Construct tree structure from table with level numbering

问题

我有一个表格,已经被展开(请参见下面的表1),用来表示一个分层树结构。我需要创建等效的嵌套树结构(请参见下面的图1)。

是否有通用算法或概念可以实现这个?

我尝试过的方法

我正在使用Python 3.11并对anytree库进行子类化。然而,将其转换为嵌套的Python字典(在anytree期望格式中)也是一种解决方案。

我在下面的清单1中展示了我的代码,但它还很粗糙...我只知道我需要弄清楚lvl何时更改,然后使用递归将这些对象添加到它们的父级。

我尝试过递归处理它,但我不知道何时递归,要传递什么数据,以及我的返回条件是什么。请查看下面的清单1中的代码。

我的数据结构类似于邻接表,但是否足够相似以使用该代码?

支持数据/信息

表1

索引 lvl 零件号
1 0 101
2 1 101-1
3 2 101-1A
4 2 101-1B
5 2 101-1C
6 2 101-1D
7 1 101-2
8 2 101-2A
9 2 101-2B
10 2 101-2C

表1的数据排列如下:

  • index: 唯一的行标识符,提供整个表格的全局顺序。
  • lvl: 部件在层次结构中的级别。例如,0是根,1是根的子级,依此类推。
  • Part #: 部件的名称。

图1

Python – 从表格构建带有级别编号的树结构

清单1

"""
对象`TreeNode`是`anytree.AnyNode`的子类。该对象具有以下属性:
  - `index`:索引号(行号)
  - `lvl`:层次级别
  - `parent`:对象的父节点。

参数`tn_list`是TreeNode对象的列表
参数`parent_node`是树的根节点。它是一个TreeNode对象。
参数`current_level`跟踪我们正在处理的树中的当前“区域”。
[??] 我不确定这是否将是父级的级别,当前节点的级别等。
参数`children`是TreeNode对象的列表,它们是`parent_node`的子级。
[??] 我认为我不需要这个。
"""

def link_TreeNodes(tn_list=None, parent_node=None, children=None, 
                   current_level=None):
    temp_list = []
    children_nodes = []

    # 最后一个节点的级别以确定何时需要停止并将所有子节点添加到父节点。
    last_lvl = None

    for node in temp_list:
        lvl = getattr(node, 'lvl')
        if last_lvl is None:
            last_lvl = lvl
        else:
            last_lvl = lvl - 1

        if lvl == current_level:
            node.parent = parent_node
        elif lvl == current_level + 1:
            children_nodes.append(node)
        elif lvl == current_level - 1:
            # [??] 我对这里应该发生什么感到困惑...
            print(f'current_level: {current_level}; lvl: {lvl}')
        
        # 确定我们是否需要停止
        if last_lvl == lvl:
            stop = False
            continue
        elif last_lvl == lvl + 1:
            print('停止处理子节点...')
            stop = True
        
        # 我们检测到表格发生变化
        if stop:
            if children_nodes:
                parent_node = link_TreeNodes(
                    tn_list=children_nodes, parent_node=parent_node, 
                    current_level=parent_node.lvl+1)
            else:
                return parent_node
            stop = False
    
    # [??] 我的终止条件是什么?我从递归中返回什么?
    return parent_node

编辑1A:测试数据作为Python字典:

test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}

编辑1B:将输出数据转换为嵌套的Python字典:

correct_output = {
    '101': {'parent': None, 'children': ['101-1', '101-2']},
    '101-1': {'parent': '101', 'children': ['101-1A', '101-1B', '101-1C', '101-1D']},
    '101-1A': {'parent': '101-1', 'children': None},
    '101-1B': {'parent': '101-1', 'children': None},
    '101-1C': {'parent': '101-1', 'children': None},
    '101-1D': {'parent': '101-1', 'children': None},
    '101-2': {'parent': '101', 'children': ['101-2A', '101-2B

<details>
<summary>英文:</summary>

I have a table which has been flattened (see **Table 1**, below) to represent a hierarchical tree. I need to create the equivalent nested tree (see **Figure 1**, below). 

Is there a general algorithm or concept for doing this? 

### What I&#39;ve Tried

I&#39;m working in Python 3.11 and subclassing the `anytree` library. However, converting to a nested Python dictionary (in `anytree`&#39;s [expected format](https://anytree.readthedocs.io/en/latest/importer/dictimporter.html)) would also be a solution. 

I have my code shown below in **Listing 1**, but its very rough... all I know is I need to figure out when the `lvl` changes and then add those objects to their parent with recursion.

I&#39;ve tried processing it recursively but I&#39;m lost on when to recurse, what data to pass, and what my return condition is. See code below in **Listing 1**.

My data is structured similar to an [Adjacency List](https://stackoverflow.com/questions/21342596/tree-structure-from-adjacency-list), but is it similar enough to use code for that?


### Supporting Data/Info

#### Table 1
| index | lvl | Part # |
| ----- | --- | ------ |
| 1 | 0 | 101 |
| 2 | 1 | 101-1 |
| 3 | 2 | 101-1A |
| 4 | 2 | 101-1B |
| 5 | 2 | 101-1C |
| 6 | 2 | 101-1D |
| 7 | 1 | 101-2 |
| 8 | 2 | 101-2A |
| 9 | 2 | 101-2B |
| 10 | 2 | 101-2C |

**Table 1** data is arranged as follows:
* **_index_**: A unique row identifier which provides the global order of the entire table.
* **_lvl_**: the part&#39;s level in the hierachy. E.g. 0 is the root, 1 is the child of the root, etc.
* **_Part #_**: The name of the part.

#### Figure 1
[![enter image description here](https://i.stack.imgur.com/uAhs8.png)](https://i.stack.imgur.com/uAhs8.png)


#### Listing 1

```lang-py
&quot;&quot;&quot;
Object `TreeNode` is a subclass of `anytree.AnyNode`. The object has 
attributes:
  - `index`: index number (row number)
  - `lvl`: hierarchy level 
  - `parent`: The parent TreeNode of the object.

Parameter `tn_list` is the list of TreeNode objects
Parameter `parent_node` is the root node of the tree. It is a TreeNode object.
Parameter `current_level` keeps up with the current &quot;area&quot; in the tree we are 
           processing. 
		   [??] I&#39;m not sure if this will be the parent&#39;s level, current 
		        node&#39;s level, etc.
Parameter `children` is a list of TreeNode objects which are children of 
          `parent_node`.
		  [??] I don&#39;t think I need this.
&quot;&quot;&quot;

def link_TreeNodes(tn_list=None, parent_node=None, children=None, 
                   current_level=None):
	temp_list = []
	children_nodes = []
	
	# Last node&#39;s lvl to determine when we need to stop and add all the 
	# children to the parent.
	last_lvl = None
	
	for node in temp_list:
		lvl = getattr(node, &#39;lvl&#39;)
		if last_lvl is None:
			last_lvl = lvl
		else:
			last_lvl = lvl - 1
		
		if lvl == current_level:
			node.parent = parent_node
		elif lvl == current_level + 1:
			children_nodes.append(node)
		elif lvl == current_level - 1:
			# [??] I&#39;m a lost as to what should happen here...
			print(f&#39;current_level: {current_level}; lvl: {lvl}&#39;)
		
		# Determine if we need to stop
		if last_lvl == lvl:
			stop = False
			continue
		elif last_lvl == lvl + 1:
			print(&#39;Stopping to process children nodes...&#39;)
			stop = True
		
		# We have detected a change in the table
		if stop:
			if children_nodes:
				parent_node = link_TreeNodes(
					tn_list=children_nodes, parent_node=parent_node, 
					current_level=parent_node.lvl+1)
			else:
				return parent_node
			stop = False
	
	# [??] What is my end condition? What do I return from the recursion?
	return parent_node

Edit 1A: Test data as Python dictionary:

test_data = {&#39;index&#39;: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], &#39;level&#39;: [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], &#39;part_number&#39;: [&#39;101&#39;, &#39;101-1&#39;, &#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;, &#39;101-2&#39;, &#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]}

Edit 1B: Output data as nested Python dictionaries:

correct_output = {
	&#39;101&#39;: {&#39;parent&#39;: None, &#39;children&#39;: [&#39;101-1&#39;, &#39;101-2&#39;]},
	&#39;101-1&#39;: {&#39;parent&#39;: &#39;101&#39;, &#39;children&#39;: [&#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;]},
	&#39;101-1A&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
	&#39;101-1B&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
	&#39;101-1C&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
	&#39;101-1D&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
	&#39;101-2&#39;: {&#39;parent&#39;: &#39;101&#39;, &#39;children&#39;: [&#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]},
	&#39;101-2A&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None},
	&#39;101-2B&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None},
	&#39;101-2C&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None}
}

Edit 1C: Calling Code

Assume the above input data is in the file fn as plain .csv data.

def data_reader(fn):
	data = {}
	with open(fn, newline=&#39;&#39;) as csvfile:
		creader = csv.reader(csvfile, delimiter=&#39;\t&#39;, quotechar=&#39;&quot;&#39;)
		for i, row in enumerate(creader):
			if i == 0:
				continue
			data[int(row[0])] = row[1:]
	return data

def convert2TreeNode(data_dict):
	nodes = []
	for k, v in data_dict.items():
		nodes.append(TreeNode.list2TreeNode(dict_key=k, dict_vals=v))
	return nodes

# if __name__ == &#39;__main__&#39;:
nodes = convert2TreeNode(data_reader(fn))

答案1

得分: 1

以下是使用pandas的解决方案,无需递归,只使用part_number列(我假设级别始终在-或后面跟随大写字母时拆分):

import pandas as pd
import re
from pprint import pprint

test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}

df = pd.DataFrame(test_data)[['part_number']]

def find_parent(pn):
    pn_rev = pn[::-1] # no rsplit in re module
    parent_rev = re.split(r'(?<=[A-Z])|-', pn_rev, maxsplit=1)[-1]
    parent = parent_rev[::-1]
    if parent == pn:
        return None
    else:
        return parent

def find_children(x):
    if y:= df.loc[df['parent']==x, 'part_number'].to_list():
        return y
    else:
        return None

df['parent'] = df['part_number'].apply(find_parent)
df['children'] = df['part_number'].apply(find_children)

pprint(df.set_index('part_number').to_dict(orient='index'))

输出:

{'101': {'children': ['101-1', '101-2'], 'parent': None},
 '101-1': {'children': ['101-1A', '101-1B', '101-1C', '101-1D'], 'parent': '101'},
 '101-1A': {'children': None, 'parent': '101-1'},
 '101-1B': {'children': None, 'parent': '101-1'},
 '101-1C': {'children': None, 'parent': '101-1'},
 '101-1D': {'children': None, 'parent': '101-1'},
 '101-2': {'children': ['101-2A', '101-2B', '101-2C'], 'parent': '101'},
 '101-2A': {'children': None, 'parent': '101-2'},
 '101-2B': {'children': None, 'parent': '101-2'},
 '101-2C': {'children': None, 'parent': '101-2'}}
英文:

Here a solution using pandas, without recursion and using only the part_number column (I assumed that the levels always split at - or when followed by an uppercase letter):

import pandas as pd
import re
from pprint import pprint

test_data = {&#39;index&#39;: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], &#39;level&#39;: [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], &#39;part_number&#39;: [&#39;101&#39;, &#39;101-1&#39;, &#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;, &#39;101-2&#39;, &#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]}

df = pd.DataFrame(test_data)[[&#39;part_number&#39;]]

def find_parent(pn):
    pn_rev = pn[::-1] # no rsplit in re module
    parent_rev = re.split(r&#39;(?&lt;=[A-Z])|-&#39;, pn_rev, maxsplit=1)[-1]
    parent = parent_rev[::-1]
    if parent == pn:
        return None
    else:
        return parent

def find_children(x):
    if y:= df.loc[df[&#39;parent&#39;]==x, &#39;part_number&#39;].to_list():
        return y
    else:
        return None

df[&#39;parent&#39;] = df[&#39;part_number&#39;].apply(find_parent)
df[&#39;children&#39;] = df[&#39;part_number&#39;].apply(find_children)

pprint(df.set_index(&#39;part_number&#39;).to_dict(orient=&#39;index&#39;))

Output:

{&#39;101&#39;: {&#39;children&#39;: [&#39;101-1&#39;, &#39;101-2&#39;], &#39;parent&#39;: None},
 &#39;101-1&#39;: {&#39;children&#39;: [&#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;], &#39;parent&#39;: &#39;101&#39;},
 &#39;101-1A&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
 &#39;101-1B&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
 &#39;101-1C&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
 &#39;101-1D&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
 &#39;101-2&#39;: {&#39;children&#39;: [&#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;], &#39;parent&#39;: &#39;101&#39;},
 &#39;101-2A&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;},
 &#39;101-2B&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;},
 &#39;101-2C&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;}}

huangapple
  • 本文由 发表于 2023年7月27日 21:51:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76780439.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定