Python – 从表格构建带有级别编号的树结构

huangapple go评论149阅读模式
英文:

Python - Construct tree structure from table with level numbering

问题

我有一个表格,已经被展开(请参见下面的表1),用来表示一个分层树结构。我需要创建等效的嵌套树结构(请参见下面的图1)。

是否有通用算法或概念可以实现这个?

我尝试过的方法

我正在使用Python 3.11并对anytree库进行子类化。然而,将其转换为嵌套的Python字典(在anytree期望格式中)也是一种解决方案。

我在下面的清单1中展示了我的代码,但它还很粗糙...我只知道我需要弄清楚lvl何时更改,然后使用递归将这些对象添加到它们的父级。

我尝试过递归处理它,但我不知道何时递归,要传递什么数据,以及我的返回条件是什么。请查看下面的清单1中的代码。

我的数据结构类似于邻接表,但是否足够相似以使用该代码?

支持数据/信息

表1

索引 lvl 零件号
1 0 101
2 1 101-1
3 2 101-1A
4 2 101-1B
5 2 101-1C
6 2 101-1D
7 1 101-2
8 2 101-2A
9 2 101-2B
10 2 101-2C

表1的数据排列如下:

  • index: 唯一的行标识符,提供整个表格的全局顺序。
  • lvl: 部件在层次结构中的级别。例如,0是根,1是根的子级,依此类推。
  • Part #: 部件的名称。

图1

Python – 从表格构建带有级别编号的树结构

清单1

  1. """
  2. 对象`TreeNode`是`anytree.AnyNode`的子类。该对象具有以下属性:
  3. - `index`:索引号(行号)
  4. - `lvl`:层次级别
  5. - `parent`:对象的父节点。
  6. 参数`tn_list`是TreeNode对象的列表
  7. 参数`parent_node`是树的根节点。它是一个TreeNode对象。
  8. 参数`current_level`跟踪我们正在处理的树中的当前“区域”。
  9. [??] 我不确定这是否将是父级的级别,当前节点的级别等。
  10. 参数`children`是TreeNode对象的列表,它们是`parent_node`的子级。
  11. [??] 我认为我不需要这个。
  12. """
  13. def link_TreeNodes(tn_list=None, parent_node=None, children=None,
  14. current_level=None):
  15. temp_list = []
  16. children_nodes = []
  17. # 最后一个节点的级别以确定何时需要停止并将所有子节点添加到父节点。
  18. last_lvl = None
  19. for node in temp_list:
  20. lvl = getattr(node, 'lvl')
  21. if last_lvl is None:
  22. last_lvl = lvl
  23. else:
  24. last_lvl = lvl - 1
  25. if lvl == current_level:
  26. node.parent = parent_node
  27. elif lvl == current_level + 1:
  28. children_nodes.append(node)
  29. elif lvl == current_level - 1:
  30. # [??] 我对这里应该发生什么感到困惑...
  31. print(f'current_level: {current_level}; lvl: {lvl}')
  32. # 确定我们是否需要停止
  33. if last_lvl == lvl:
  34. stop = False
  35. continue
  36. elif last_lvl == lvl + 1:
  37. print('停止处理子节点...')
  38. stop = True
  39. # 我们检测到表格发生变化
  40. if stop:
  41. if children_nodes:
  42. parent_node = link_TreeNodes(
  43. tn_list=children_nodes, parent_node=parent_node,
  44. current_level=parent_node.lvl+1)
  45. else:
  46. return parent_node
  47. stop = False
  48. # [??] 我的终止条件是什么?我从递归中返回什么?
  49. return parent_node

编辑1A:测试数据作为Python字典:

  1. test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}

编辑1B:将输出数据转换为嵌套的Python字典:

  1. correct_output = {
  2. '101': {'parent': None, 'children': ['101-1', '101-2']},
  3. '101-1': {'parent': '101', 'children': ['101-1A', '101-1B', '101-1C', '101-1D']},
  4. '101-1A': {'parent': '101-1', 'children': None},
  5. '101-1B': {'parent': '101-1', 'children': None},
  6. '101-1C': {'parent': '101-1', 'children': None},
  7. '101-1D': {'parent': '101-1', 'children': None},
  8. '101-2': {'parent': '101', 'children': ['101-2A', '101-2B
  9. <details>
  10. <summary>英文:</summary>
  11. I have a table which has been flattened (see **Table 1**, below) to represent a hierarchical tree. I need to create the equivalent nested tree (see **Figure 1**, below).
  12. Is there a general algorithm or concept for doing this?
  13. ### What I&#39;ve Tried
  14. I&#39;m working in Python 3.11 and subclassing the `anytree` library. However, converting to a nested Python dictionary (in `anytree`&#39;s [expected format](https://anytree.readthedocs.io/en/latest/importer/dictimporter.html)) would also be a solution.
  15. I have my code shown below in **Listing 1**, but its very rough... all I know is I need to figure out when the `lvl` changes and then add those objects to their parent with recursion.
  16. I&#39;ve tried processing it recursively but I&#39;m lost on when to recurse, what data to pass, and what my return condition is. See code below in **Listing 1**.
  17. My data is structured similar to an [Adjacency List](https://stackoverflow.com/questions/21342596/tree-structure-from-adjacency-list), but is it similar enough to use code for that?
  18. ### Supporting Data/Info
  19. #### Table 1
  20. | index | lvl | Part # |
  21. | ----- | --- | ------ |
  22. | 1 | 0 | 101 |
  23. | 2 | 1 | 101-1 |
  24. | 3 | 2 | 101-1A |
  25. | 4 | 2 | 101-1B |
  26. | 5 | 2 | 101-1C |
  27. | 6 | 2 | 101-1D |
  28. | 7 | 1 | 101-2 |
  29. | 8 | 2 | 101-2A |
  30. | 9 | 2 | 101-2B |
  31. | 10 | 2 | 101-2C |
  32. **Table 1** data is arranged as follows:
  33. * **_index_**: A unique row identifier which provides the global order of the entire table.
  34. * **_lvl_**: the part&#39;s level in the hierachy. E.g. 0 is the root, 1 is the child of the root, etc.
  35. * **_Part #_**: The name of the part.
  36. #### Figure 1
  37. [![enter image description here](https://i.stack.imgur.com/uAhs8.png)](https://i.stack.imgur.com/uAhs8.png)
  38. #### Listing 1
  39. ```lang-py
  40. &quot;&quot;&quot;
  41. Object `TreeNode` is a subclass of `anytree.AnyNode`. The object has
  42. attributes:
  43. - `index`: index number (row number)
  44. - `lvl`: hierarchy level
  45. - `parent`: The parent TreeNode of the object.
  46. Parameter `tn_list` is the list of TreeNode objects
  47. Parameter `parent_node` is the root node of the tree. It is a TreeNode object.
  48. Parameter `current_level` keeps up with the current &quot;area&quot; in the tree we are
  49. processing.
  50. [??] I&#39;m not sure if this will be the parent&#39;s level, current
  51. node&#39;s level, etc.
  52. Parameter `children` is a list of TreeNode objects which are children of
  53. `parent_node`.
  54. [??] I don&#39;t think I need this.
  55. &quot;&quot;&quot;
  56. def link_TreeNodes(tn_list=None, parent_node=None, children=None,
  57. current_level=None):
  58. temp_list = []
  59. children_nodes = []
  60. # Last node&#39;s lvl to determine when we need to stop and add all the
  61. # children to the parent.
  62. last_lvl = None
  63. for node in temp_list:
  64. lvl = getattr(node, &#39;lvl&#39;)
  65. if last_lvl is None:
  66. last_lvl = lvl
  67. else:
  68. last_lvl = lvl - 1
  69. if lvl == current_level:
  70. node.parent = parent_node
  71. elif lvl == current_level + 1:
  72. children_nodes.append(node)
  73. elif lvl == current_level - 1:
  74. # [??] I&#39;m a lost as to what should happen here...
  75. print(f&#39;current_level: {current_level}; lvl: {lvl}&#39;)
  76. # Determine if we need to stop
  77. if last_lvl == lvl:
  78. stop = False
  79. continue
  80. elif last_lvl == lvl + 1:
  81. print(&#39;Stopping to process children nodes...&#39;)
  82. stop = True
  83. # We have detected a change in the table
  84. if stop:
  85. if children_nodes:
  86. parent_node = link_TreeNodes(
  87. tn_list=children_nodes, parent_node=parent_node,
  88. current_level=parent_node.lvl+1)
  89. else:
  90. return parent_node
  91. stop = False
  92. # [??] What is my end condition? What do I return from the recursion?
  93. return parent_node

Edit 1A: Test data as Python dictionary:

  1. test_data = {&#39;index&#39;: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], &#39;level&#39;: [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], &#39;part_number&#39;: [&#39;101&#39;, &#39;101-1&#39;, &#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;, &#39;101-2&#39;, &#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]}

Edit 1B: Output data as nested Python dictionaries:

  1. correct_output = {
  2. &#39;101&#39;: {&#39;parent&#39;: None, &#39;children&#39;: [&#39;101-1&#39;, &#39;101-2&#39;]},
  3. &#39;101-1&#39;: {&#39;parent&#39;: &#39;101&#39;, &#39;children&#39;: [&#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;]},
  4. &#39;101-1A&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
  5. &#39;101-1B&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
  6. &#39;101-1C&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
  7. &#39;101-1D&#39;: {&#39;parent&#39;: &#39;101-1&#39;, &#39;children&#39;: None},
  8. &#39;101-2&#39;: {&#39;parent&#39;: &#39;101&#39;, &#39;children&#39;: [&#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]},
  9. &#39;101-2A&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None},
  10. &#39;101-2B&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None},
  11. &#39;101-2C&#39;: {&#39;parent&#39;: &#39;101-2&#39;, &#39;children&#39;: None}
  12. }

Edit 1C: Calling Code

Assume the above input data is in the file fn as plain .csv data.

  1. def data_reader(fn):
  2. data = {}
  3. with open(fn, newline=&#39;&#39;) as csvfile:
  4. creader = csv.reader(csvfile, delimiter=&#39;\t&#39;, quotechar=&#39;&quot;&#39;)
  5. for i, row in enumerate(creader):
  6. if i == 0:
  7. continue
  8. data[int(row[0])] = row[1:]
  9. return data
  10. def convert2TreeNode(data_dict):
  11. nodes = []
  12. for k, v in data_dict.items():
  13. nodes.append(TreeNode.list2TreeNode(dict_key=k, dict_vals=v))
  14. return nodes
  15. # if __name__ == &#39;__main__&#39;:
  16. nodes = convert2TreeNode(data_reader(fn))

答案1

得分: 1

以下是使用pandas的解决方案,无需递归,只使用part_number列(我假设级别始终在-或后面跟随大写字母时拆分):

  1. import pandas as pd
  2. import re
  3. from pprint import pprint
  4. test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
  5. df = pd.DataFrame(test_data)[['part_number']]
  6. def find_parent(pn):
  7. pn_rev = pn[::-1] # no rsplit in re module
  8. parent_rev = re.split(r'(?<=[A-Z])|-', pn_rev, maxsplit=1)[-1]
  9. parent = parent_rev[::-1]
  10. if parent == pn:
  11. return None
  12. else:
  13. return parent
  14. def find_children(x):
  15. if y:= df.loc[df['parent']==x, 'part_number'].to_list():
  16. return y
  17. else:
  18. return None
  19. df['parent'] = df['part_number'].apply(find_parent)
  20. df['children'] = df['part_number'].apply(find_children)
  21. pprint(df.set_index('part_number').to_dict(orient='index'))

输出:

  1. {'101': {'children': ['101-1', '101-2'], 'parent': None},
  2. '101-1': {'children': ['101-1A', '101-1B', '101-1C', '101-1D'], 'parent': '101'},
  3. '101-1A': {'children': None, 'parent': '101-1'},
  4. '101-1B': {'children': None, 'parent': '101-1'},
  5. '101-1C': {'children': None, 'parent': '101-1'},
  6. '101-1D': {'children': None, 'parent': '101-1'},
  7. '101-2': {'children': ['101-2A', '101-2B', '101-2C'], 'parent': '101'},
  8. '101-2A': {'children': None, 'parent': '101-2'},
  9. '101-2B': {'children': None, 'parent': '101-2'},
  10. '101-2C': {'children': None, 'parent': '101-2'}}
英文:

Here a solution using pandas, without recursion and using only the part_number column (I assumed that the levels always split at - or when followed by an uppercase letter):

  1. import pandas as pd
  2. import re
  3. from pprint import pprint
  4. test_data = {&#39;index&#39;: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], &#39;level&#39;: [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], &#39;part_number&#39;: [&#39;101&#39;, &#39;101-1&#39;, &#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;, &#39;101-2&#39;, &#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;]}
  5. df = pd.DataFrame(test_data)[[&#39;part_number&#39;]]
  6. def find_parent(pn):
  7. pn_rev = pn[::-1] # no rsplit in re module
  8. parent_rev = re.split(r&#39;(?&lt;=[A-Z])|-&#39;, pn_rev, maxsplit=1)[-1]
  9. parent = parent_rev[::-1]
  10. if parent == pn:
  11. return None
  12. else:
  13. return parent
  14. def find_children(x):
  15. if y:= df.loc[df[&#39;parent&#39;]==x, &#39;part_number&#39;].to_list():
  16. return y
  17. else:
  18. return None
  19. df[&#39;parent&#39;] = df[&#39;part_number&#39;].apply(find_parent)
  20. df[&#39;children&#39;] = df[&#39;part_number&#39;].apply(find_children)
  21. pprint(df.set_index(&#39;part_number&#39;).to_dict(orient=&#39;index&#39;))

Output:

  1. {&#39;101&#39;: {&#39;children&#39;: [&#39;101-1&#39;, &#39;101-2&#39;], &#39;parent&#39;: None},
  2. &#39;101-1&#39;: {&#39;children&#39;: [&#39;101-1A&#39;, &#39;101-1B&#39;, &#39;101-1C&#39;, &#39;101-1D&#39;], &#39;parent&#39;: &#39;101&#39;},
  3. &#39;101-1A&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
  4. &#39;101-1B&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
  5. &#39;101-1C&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
  6. &#39;101-1D&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-1&#39;},
  7. &#39;101-2&#39;: {&#39;children&#39;: [&#39;101-2A&#39;, &#39;101-2B&#39;, &#39;101-2C&#39;], &#39;parent&#39;: &#39;101&#39;},
  8. &#39;101-2A&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;},
  9. &#39;101-2B&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;},
  10. &#39;101-2C&#39;: {&#39;children&#39;: None, &#39;parent&#39;: &#39;101-2&#39;}}

huangapple
  • 本文由 发表于 2023年7月27日 21:51:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76780439.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定