Is PyPDF2 the best way to find where text occurs on a PDF page?
Question
I want to write a program that searches for 'my_text' on a PDF page and returns the coordinates where 'my_text' occurs. I'd also like to do the reverse: find all text within given coordinates on a specific PDF page. Is PyPDF2 the best way to do this? If so, how? If not, what is a better module?
ChatGPT seems to be using an old version of PyPDF2 and recommends x, y, width, height = page.searchFor(my_text)[0], but I get an AttributeError: 'PageObject' object has no attribute 'searchFor'.
Answer 1
Score: 1
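We migrated from PyPDF2 to PyMuPDF, which has a pretty good text-search function. There has been some recent activity around PyPDF2, but it had not seen consistent development since around 2016, whereas PyMuPDF has been updated more recently.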
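A minimal sketch of the PyMuPDF search, assuming a current release (pip install pymupdf, imported here under its long-standing name fitz). In PyMuPDF the method is Page.search_for; older PyMuPDF releases spelled it searchFor, which is probably where the snippet in the question came from, since PyPDF2's PageObject does not have it. search_for returns a list of Rect objects holding corner coordinates (x0, y0, x1, y1), not (x, y, width, height), so the hypothetical helper rect_to_xywh converts them. Both function names are illustrative, not part of any library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def rect_to_xywh(x0: float, y0: float, x1: float, y1: float) -> Box:
    """Hypothetical helper: turn corner coordinates into (x, y, width, height)."""
    return (x0, y0, x1 - x0, y1 - y0)


def find_text_boxes(pdf_path: str, page_number: int, my_text: str) -> List[Box]:
    """Hypothetical helper: one (x, y, width, height) box per hit of my_text.

    Assumes PyMuPDF is installed. page_number is 0-based. Coordinates are in
    PDF points, with PyMuPDF's origin at the top-left corner of the page.
    """
    import fitz  # PyMuPDF; imported lazily so rect_to_xywh works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # search_for returns a (possibly empty) list of fitz.Rect objects.
        hits = page.search_for(my_text)
        return [rect_to_xywh(r.x0, r.y0, r.x1, r.y1) for r in hits]
```

For example, find_text_boxes("report.pdf", 0, "my_text") (file name hypothetical) returns an empty list when the text does not occur on the first page.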
Link: https://pymupdf.readthedocs.io/en/latest/recipes-text.html
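The question also asks for the reverse direction: all text inside a given rectangle. A minimal sketch, assuming PyMuPDF's page.get_text("words"), which returns tuples laid out as (x0, y0, x1, y1, word, block_no, line_no, word_no). The filtering is kept in a pure-Python helper so it can be tried without a PDF; words_inside and text_in_region are hypothetical names. PyMuPDF's get_text also accepts a clip rectangle, which may be a simpler alternative, though it can include words that only partially overlap the area.

```python
from typing import Iterable, List, Sequence, Tuple

Area = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def words_inside(words: Iterable[Sequence], area: Area) -> List[str]:
    """Hypothetical helper: keep words whose box lies entirely inside area.

    Each item is expected to start with (x0, y0, x1, y1, text, ...), the
    layout of the tuples returned by PyMuPDF's page.get_text("words").
    """
    ax0, ay0, ax1, ay1 = area
    picked = []
    for w in words:
        x0, y0, x1, y1, text = w[:5]
        if x0 >= ax0 and y0 >= ay0 and x1 <= ax1 and y1 <= ay1:
            picked.append(text)
    return picked


def text_in_region(pdf_path: str, page_number: int, area: Area) -> str:
    """Hypothetical helper: words fully inside area, joined by spaces.

    Assumes PyMuPDF is installed; page_number is 0-based. Words keep the
    order in which get_text("words") returns them.
    """
    import fitz  # PyMuPDF; imported lazily so words_inside works without it

    with fitz.open(pdf_path) as doc:
        words = doc[page_number].get_text("words")
        return " ".join(words_inside(words, area))
```

Requiring full containment avoids picking up words that merely touch the edge of the area; loosen the comparison if partially covered words should be included.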