Find all text that is in the 'Normal' style and whose font size is NOT 11 in a .docx file using python-docx

Question

My implementation so far:

from docx import Document
from docx.shared import Pt

texts = []
document = Document('new.docx')
for p in document.paragraphs:
    for run in p.runs:
        # Keep runs in "Normal"-style paragraphs whose font size is not 11 pt
        if p.style.name.startswith("Normal") and run.font.size != Pt(11):
            texts.append(run.text)
print(texts)

This seems to work, but some of the results are incorrect: I am also getting text that is in the Normal style and has a font size of 11. Is this the correct implementation, or is there another way to achieve this? TIA!

Answer 1

Score: 2


Explanation

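What I learned is that, by default, style settings are stored in a separate part of the .docx file (the styles part), not with the text. A setting is stored with the text, and can be read from the run, in only one case: when it differs from the settings of the style applied to the paragraph (e.g., Normal, No Spacing, Heading 1, Title, etc.). Otherwise python-docx reports the run's value as None, which is why run.font.size != Pt(11) is also true for text that inherits 11 pt from the Normal style.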

Another StackOverflow question thread for a better understanding:
link

Example

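For example, if the "Heading 1" style in your document defines a 20 pt font size and your text is also 20 pt, the size is not stored with the text, so you won't be able to extract it from the run. If the text uses any other size, it is stored with the text and your code will return it.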

huangapple
  • Posted on July 10, 2023 13:25:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76650844.html