2023-05-17 16:30:11

Create table instance not connected to a document with python-docx?

Question

This is a cross-post related to Issue #1190 in the upstream python-docx repository.

This is how I currently create a table in a docx document.

import docx

doc = docx.Document()
tab = doc.add_table(rows=300, cols=5)

So the table object is "connected" to its parent document.

Is there a way to create a table object that has no connection to a parent object and add it to the document later? Something like this?

doc = docx.Document()
tab_one = docx.Table(rows=300, cols=5)
tab_two = docx.Table(rows=100, cols=3)

doc.add_table(tab_two)
doc.add_table(tab_one)

Or (as a workaround) can I move a table object from one document instance to another like this?

doc_temp = docx.Document()
tab = doc_temp.add_table(rows=300, cols=5)

doc_main = docx.Document()
doc_main.add_table(tab)

The background to my question is that I create multiple tables with 100-300 rows each and apply formatting to every cell. That means a lot of row and cell iteration, which costs a lot of time.

Doing this with multiprocessing, where each worker has its own table object, would speed up the process. I would like to create multiple tables in parallel and add them to the document in a later step.

I am aware that multiprocessing is not the complete or best solution to a performance problem. Such a problem isn't solved just by throwing more CPU resources at it; the algorithm itself should be optimized. For me, multiprocessing is just one step on the way to a better solution.

EDIT: As a real-world example, here you can see how I create docx tables based on pandas.DataFrame objects.

Answer 1

Score: 1

No, not at the python-docx API level. The document-element objects you manipulate via the API maintain a reference into the overall connected graph of lxml objects representing a package part (like document, header, etc.) and allow you to edit that part, in situ. They are not free-standing components that you can compose and re-compose.

That said, lxml does allow what you're talking about, inserting a copy of a pre-formed XML subtree into an arbitrary position in an existing XML tree. So if you're willing to dig down, you could use python-docx to form the initial table subtree, then use lxml to copy it and insert it into other places in the XML.

python-docx can also help you locate where those go, at least getting you close. For example, you could find a marker paragraph and get its <p> element using paragraph._p, then use lxml to insert a <tbl> subtree as a sibling before or after the <p> element.

This is not for the faint of heart, as it will require digging into the code and understanding lxml, but it's certainly a viable approach that has worked for others and is a lot easier than accomplishing the same thing starting from a raw .docx file. A big part of the challenge is handling the unmarshalling of the .docx file into parts, each with its own XML tree, and python-docx can handle all that for you, as well as re-marshalling the package (saving) once you've made your changes.
