英文:
Python - Construct tree structure from table with level numbering
问题
我有一个表格,已经被展开(请参见下面的表1),用来表示一个分层树结构。我需要创建等效的嵌套树结构(请参见下面的图1)。
是否有通用算法或概念可以实现这个?
我尝试过的方法
我正在使用Python 3.11并对anytree
库进行子类化。然而,将其转换为嵌套的Python字典(在anytree
的期望格式中)也是一种解决方案。
我在下面的清单1中展示了我的代码,但它还很粗糙...我只知道我需要弄清楚lvl
何时更改,然后使用递归将这些对象添加到它们的父级。
我尝试过递归处理它,但我不知道何时递归,要传递什么数据,以及我的返回条件是什么。请查看下面的清单1中的代码。
我的数据结构类似于邻接表,但是否足够相似以使用该代码?
支持数据/信息
表1
索引 | lvl | 零件号 |
---|---|---|
1 | 0 | 101 |
2 | 1 | 101-1 |
3 | 2 | 101-1A |
4 | 2 | 101-1B |
5 | 2 | 101-1C |
6 | 2 | 101-1D |
7 | 1 | 101-2 |
8 | 2 | 101-2A |
9 | 2 | 101-2B |
10 | 2 | 101-2C |
表1的数据排列如下:
- index: 唯一的行标识符,提供整个表格的全局顺序。
- lvl: 部件在层次结构中的级别。例如,0是根,1是根的子级,依此类推。
- Part #: 部件的名称。
图1
清单1
"""
对象`TreeNode`是`anytree.AnyNode`的子类。该对象具有以下属性:
- `index`:索引号(行号)
- `lvl`:层次级别
- `parent`:对象的父节点。
参数`tn_list`是TreeNode对象的列表
参数`parent_node`是树的根节点。它是一个TreeNode对象。
参数`current_level`跟踪我们正在处理的树中的当前“区域”。
[??] 我不确定这是否将是父级的级别,当前节点的级别等。
参数`children`是TreeNode对象的列表,它们是`parent_node`的子级。
[??] 我认为我不需要这个。
"""
def link_TreeNodes(tn_list=None, parent_node=None, children=None,
current_level=None):
temp_list = []
children_nodes = []
# 最后一个节点的级别以确定何时需要停止并将所有子节点添加到父节点。
last_lvl = None
for node in temp_list:
lvl = getattr(node, 'lvl')
if last_lvl is None:
last_lvl = lvl
else:
last_lvl = lvl - 1
if lvl == current_level:
node.parent = parent_node
elif lvl == current_level + 1:
children_nodes.append(node)
elif lvl == current_level - 1:
# [??] 我对这里应该发生什么感到困惑...
print(f'current_level: {current_level}; lvl: {lvl}')
# 确定我们是否需要停止
if last_lvl == lvl:
stop = False
continue
elif last_lvl == lvl + 1:
print('停止处理子节点...')
stop = True
# 我们检测到表格发生变化
if stop:
if children_nodes:
parent_node = link_TreeNodes(
tn_list=children_nodes, parent_node=parent_node,
current_level=parent_node.lvl+1)
else:
return parent_node
stop = False
# [??] 我的终止条件是什么?我从递归中返回什么?
return parent_node
编辑1A:测试数据作为Python字典:
test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
编辑1B:将输出数据转换为嵌套的Python字典:
correct_output = {
'101': {'parent': None, 'children': ['101-1', '101-2']},
'101-1': {'parent': '101', 'children': ['101-1A', '101-1B', '101-1C', '101-1D']},
'101-1A': {'parent': '101-1', 'children': None},
'101-1B': {'parent': '101-1', 'children': None},
'101-1C': {'parent': '101-1', 'children': None},
'101-1D': {'parent': '101-1', 'children': None},
'101-2': {'parent': '101', 'children': ['101-2A', '101-2B
<details>
<summary>英文:</summary>
I have a table which has been flattened (see **Table 1**, below) to represent a hierarchical tree. I need to create the equivalent nested tree (see **Figure 1**, below).
Is there a general algorithm or concept for doing this?
### What I've Tried
I'm working in Python 3.11 and subclassing the `anytree` library. However, converting to a nested Python dictionary (in `anytree`'s [expected format](https://anytree.readthedocs.io/en/latest/importer/dictimporter.html)) would also be a solution.
I have my code shown below in **Listing 1**, but its very rough... all I know is I need to figure out when the `lvl` changes and then add those objects to their parent with recursion.
I've tried processing it recursively but I'm lost on when to recurse, what data to pass, and what my return condition is. See code below in **Listing 1**.
My data is structured similar to an [Adjacency List](https://stackoverflow.com/questions/21342596/tree-structure-from-adjacency-list), but is it similar enough to use code for that?
### Supporting Data/Info
#### Table 1
| index | lvl | Part # |
| ----- | --- | ------ |
| 1 | 0 | 101 |
| 2 | 1 | 101-1 |
| 3 | 2 | 101-1A |
| 4 | 2 | 101-1B |
| 5 | 2 | 101-1C |
| 6 | 2 | 101-1D |
| 7 | 1 | 101-2 |
| 8 | 2 | 101-2A |
| 9 | 2 | 101-2B |
| 10 | 2 | 101-2C |
**Table 1** data is arranged as follows:
* **_index_**: A unique row identifier which provides the global order of the entire table.
* **_lvl_**: the part's level in the hierachy. E.g. 0 is the root, 1 is the child of the root, etc.
* **_Part #_**: The name of the part.
#### Figure 1
[![enter image description here](https://i.stack.imgur.com/uAhs8.png)](https://i.stack.imgur.com/uAhs8.png)
#### Listing 1
```lang-py
"""
Object `TreeNode` is a subclass of `anytree.AnyNode`. The object has
attributes:
- `index`: index number (row number)
- `lvl`: hierarchy level
- `parent`: The parent TreeNode of the object.
Parameter `tn_list` is the list of TreeNode objects
Parameter `parent_node` is the root node of the tree. It is a TreeNode object.
Parameter `current_level` keeps up with the current "area" in the tree we are
processing.
[??] I'm not sure if this will be the parent's level, current
node's level, etc.
Parameter `children` is a list of TreeNode objects which are children of
`parent_node`.
[??] I don't think I need this.
"""
def link_TreeNodes(tn_list=None, parent_node=None, children=None,
current_level=None):
temp_list = []
children_nodes = []
# Last node's lvl to determine when we need to stop and add all the
# children to the parent.
last_lvl = None
for node in temp_list:
lvl = getattr(node, 'lvl')
if last_lvl is None:
last_lvl = lvl
else:
last_lvl = lvl - 1
if lvl == current_level:
node.parent = parent_node
elif lvl == current_level + 1:
children_nodes.append(node)
elif lvl == current_level - 1:
# [??] I'm a lost as to what should happen here...
print(f'current_level: {current_level}; lvl: {lvl}')
# Determine if we need to stop
if last_lvl == lvl:
stop = False
continue
elif last_lvl == lvl + 1:
print('Stopping to process children nodes...')
stop = True
# We have detected a change in the table
if stop:
if children_nodes:
parent_node = link_TreeNodes(
tn_list=children_nodes, parent_node=parent_node,
current_level=parent_node.lvl+1)
else:
return parent_node
stop = False
# [??] What is my end condition? What do I return from the recursion?
return parent_node
Edit 1A: Test data as Python dictionary:
test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
Edit 1B: Output data as nested Python dictionaries:
correct_output = {
'101': {'parent': None, 'children': ['101-1', '101-2']},
'101-1': {'parent': '101', 'children': ['101-1A', '101-1B', '101-1C', '101-1D']},
'101-1A': {'parent': '101-1', 'children': None},
'101-1B': {'parent': '101-1', 'children': None},
'101-1C': {'parent': '101-1', 'children': None},
'101-1D': {'parent': '101-1', 'children': None},
'101-2': {'parent': '101', 'children': ['101-2A', '101-2B', '101-2C']},
'101-2A': {'parent': '101-2', 'children': None},
'101-2B': {'parent': '101-2', 'children': None},
'101-2C': {'parent': '101-2', 'children': None}
}
Edit 1C: Calling Code
Assume the above input data is in the file fn
as plain .csv data.
def data_reader(fn):
data = {}
with open(fn, newline='') as csvfile:
creader = csv.reader(csvfile, delimiter='\t', quotechar='"')
for i, row in enumerate(creader):
if i == 0:
continue
data[int(row[0])] = row[1:]
return data
def convert2TreeNode(data_dict):
nodes = []
for k, v in data_dict.items():
nodes.append(TreeNode.list2TreeNode(dict_key=k, dict_vals=v))
return nodes
# if __name__ == '__main__':
nodes = convert2TreeNode(data_reader(fn))
答案1
得分: 1
以下是使用pandas的解决方案,无需递归,只使用part_number
列(我假设级别始终在-
或后面跟随大写字母时拆分):
import pandas as pd
import re
from pprint import pprint
test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
df = pd.DataFrame(test_data)[['part_number']]
def find_parent(pn):
pn_rev = pn[::-1] # no rsplit in re module
parent_rev = re.split(r'(?<=[A-Z])|-', pn_rev, maxsplit=1)[-1]
parent = parent_rev[::-1]
if parent == pn:
return None
else:
return parent
def find_children(x):
if y:= df.loc[df['parent']==x, 'part_number'].to_list():
return y
else:
return None
df['parent'] = df['part_number'].apply(find_parent)
df['children'] = df['part_number'].apply(find_children)
pprint(df.set_index('part_number').to_dict(orient='index'))
输出:
{'101': {'children': ['101-1', '101-2'], 'parent': None},
'101-1': {'children': ['101-1A', '101-1B', '101-1C', '101-1D'], 'parent': '101'},
'101-1A': {'children': None, 'parent': '101-1'},
'101-1B': {'children': None, 'parent': '101-1'},
'101-1C': {'children': None, 'parent': '101-1'},
'101-1D': {'children': None, 'parent': '101-1'},
'101-2': {'children': ['101-2A', '101-2B', '101-2C'], 'parent': '101'},
'101-2A': {'children': None, 'parent': '101-2'},
'101-2B': {'children': None, 'parent': '101-2'},
'101-2C': {'children': None, 'parent': '101-2'}}
英文:
Here a solution using pandas, without recursion and using only the part_number
column (I assumed that the levels always split at -
or when followed by an uppercase letter):
import pandas as pd
import re
from pprint import pprint
test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
df = pd.DataFrame(test_data)[['part_number']]
def find_parent(pn):
pn_rev = pn[::-1] # no rsplit in re module
parent_rev = re.split(r'(?<=[A-Z])|-', pn_rev, maxsplit=1)[-1]
parent = parent_rev[::-1]
if parent == pn:
return None
else:
return parent
def find_children(x):
if y:= df.loc[df['parent']==x, 'part_number'].to_list():
return y
else:
return None
df['parent'] = df['part_number'].apply(find_parent)
df['children'] = df['part_number'].apply(find_children)
pprint(df.set_index('part_number').to_dict(orient='index'))
Output:
{'101': {'children': ['101-1', '101-2'], 'parent': None},
'101-1': {'children': ['101-1A', '101-1B', '101-1C', '101-1D'], 'parent': '101'},
'101-1A': {'children': None, 'parent': '101-1'},
'101-1B': {'children': None, 'parent': '101-1'},
'101-1C': {'children': None, 'parent': '101-1'},
'101-1D': {'children': None, 'parent': '101-1'},
'101-2': {'children': ['101-2A', '101-2B', '101-2C'], 'parent': '101'},
'101-2A': {'children': None, 'parent': '101-2'},
'101-2B': {'children': None, 'parent': '101-2'},
'101-2C': {'children': None, 'parent': '101-2'}}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论