How to extract plain text from a PDF in Go
Question
I want to extract text from a PDF file using Go.
I tried the ledongthuc/pdf package, which implements a GetPlainText() method that should return the plain-text content without formatting.
However, I don't get readable plain text. This is the result:
W
S
D
V
Y R
O
R
Q
W
D
L
U
H
P
H
Q
W
......
Go code
package main

import (
	"bytes"
	"fmt"

	"github.com/ledongthuc/pdf"
)

func main() {
	content, err := readPdf("test.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}

// readPdf concatenates the plain text of every page in the PDF at path.
func readPdf(path string) (string, error) {
	r, err := pdf.Open(path)
	if err != nil {
		return "", err
	}
	totalPage := r.NumPage()

	var textBuilder bytes.Buffer
	for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
		p := r.Page(pageIndex)
		if p.V.IsNull() {
			// Skip page numbers that have no page object.
			continue
		}
		textBuilder.WriteString(p.GetPlainText("\n"))
	}
	return textBuilder.String(), nil
}
Answer 1
Score: 2
You can get output such as "Example of a pdf document." instead of
Ex
a
m
pl
e
of
a
pd
f
doc
u
m
e
nt
.
To do that, change
textBuilder.WriteString(p.GetPlainText("\n"))
to
textBuilder.WriteString(p.GetPlainText(""))
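Why this change helps can be illustrated with a minimal sketch that uses only the standard library. It assumes the argument of GetPlainText() acts as a separator placed between the text fragments found on a page, which is what the two outputs above suggest. The joinFragments helper and the fragment list are hypothetical stand-ins for illustration, not part of ledongthuc/pdf, and the fragments are assumed to carry their own spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// joinFragments is a hypothetical helper (not a ledongthuc/pdf function).
// It places sep between consecutive text fragments, which is how the
// separator argument is assumed to behave in this sketch.
func joinFragments(fragments []string, sep string) string {
	return strings.Join(fragments, sep)
}

func main() {
	// Hypothetical fragments, assumed to contain their own spaces.
	fragments := []string{
		"Ex", "a", "m", "pl", "e ", "of ", "a ",
		"pd", "f ", "doc", "u", "m", "e", "nt", ".",
	}

	// With "\n" as separator, every fragment lands on its own line.
	fmt.Println(joinFragments(fragments, "\n"))

	// With an empty separator, the fragments form one sentence.
	fmt.Println(joinFragments(fragments, ""))
}
```

If the fragments in a real PDF do not include spaces, an empty separator would run the words together, so the result is worth checking against the actual document.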
I hope this helps.