Find all text in the 'Normal' style whose font size is NOT 11 in a .docx file using python-docx
Question
My implementation so far:
from docx.api import Document
import pandas as pd
from docx.shared import Pt

texts = []
sizes = []
document = Document('new.docx')
for p in document.paragraphs:
    for run in p.runs:
        if p.style.name.startswith("Normal") and run.font.size != Pt(11):
            texts.append(run.text)
print(texts)
This produces output, but some of it is incorrect: I also get text that is in the Normal style and has a font size of 11. Is this the correct implementation, or is there another way to achieve this? TIA!
Answer 1
Score: 2
Explanation
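What I learned is that, by default, style definitions are stored in a separate part of the .docx file. A formatting setting can be read from the text itself only under one condition: it differs from the setting defined by the style applied to the paragraph (e.g., Normal, No Spacing, Heading 1, Title). In that case, Word stores the setting with the text. When nothing is stored with the text, run.font.size is None, and None != Pt(11) evaluates to True, which is why runs that display as 11pt still pass the check in the question.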
Another StackOverflow question thread for a better understanding:
link
Example
For example, if the "Heading 1" style in your document defines a font size of 20pt and your text is also 20pt, you won't be able to extract the size from the run. If the text has any other size, your code will return it.
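One possible workaround, given the explanation above, is to fall back to the style chain whenever the run itself carries no size. The following is only a sketch under some assumptions: the helper names (first_explicit_size, effective_size, non_target_normal_runs, find_in_file) are made up for illustration, the lookup order (direct formatting, then character style, then paragraph style) is a simplification of how Word resolves formatting, and sizes that come only from the document-wide defaults are not resolved at all, so those runs are skipped.

```python
TARGET_PT = 11  # the size the question wants to exclude


def first_explicit_size(style):
    """Walk a style and its base styles; return the first explicit size, else None."""
    while style is not None:
        if style.font.size is not None:
            return style.font.size
        style = style.base_style
    return None


def effective_size(run, paragraph):
    """Resolve a run's size: direct formatting, character style, paragraph style."""
    if run.font.size is not None:
        return run.font.size
    size = first_explicit_size(run.style)
    if size is None:
        size = first_explicit_size(paragraph.style)
    return size


def non_target_normal_runs(paragraphs, target_pt=TARGET_PT):
    """Collect run text from 'Normal' paragraphs whose resolved size is not target_pt."""
    texts = []
    for p in paragraphs:
        if not p.style.name.startswith("Normal"):
            continue
        for run in p.runs:
            size = effective_size(run, p)
            # size stays None when no style in the chain sets it (document
            # defaults apply); this sketch cannot judge such runs, so it skips them.
            if size is not None and size.pt != target_pt:
                texts.append(run.text)
    return texts


def find_in_file(path):
    """Hypothetical entry point: open a .docx with python-docx and run the filter."""
    from docx import Document  # imported here so the helpers above work on any objects
    return non_target_normal_runs(Document(path).paragraphs)
```

Calling find_in_file('new.docx') would then return only the runs whose size can be resolved to something other than 11pt. Whether skipping unresolved runs is the right choice depends on the document; if Normal gets its size from the document defaults, those runs would need separate handling.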