Is PyPDF2 the best way to find where text occurs on a page?

Question

I want to write a program that searches for 'my_text' on a PDF page and returns the coordinates where my_text occurs. I'd also like to do the reverse: find all text within given coordinates on a specific PDF page. Is PyPDF2 the best way to do this? If so, how? If not, what is a better module?

ChatGPT seems to be relying on an old version of PyPDF2 and recommends x, y, width, height = page.searchFor(my_text)[0], but that raises AttributeError: 'PageObject' object has no attribute 'searchFor'.


Answer 1

Score: 1

We migrated from PyPDF2 to PyMuPDF, which has a pretty good text search function. There has been some recent activity around PyPDF2, but it has not had consistent development since 2016 or so, whereas PyMuPDF has been updated more recently.

https://pymupdf.readthedocs.io/en/latest/recipes-text.html

Posted on April 20, 2023 at 00:54:48