Error when a long string is passed back to Go from a Python script with cmd.Output()
Question
I am parsing a PDF file with Python and sending the text string back to a Golang server. With smaller PDF files the code works properly, but with large PDF files it returns exit status 1.
Here is the code I am using:
func parsePdf(path string) string {
	cmd := exec.Command("python", "pdf_parser.py", path)
	output, err := cmd.Output() // this line returns the error
	if err != nil {
		fmt.Println(err)
	}
	f, _ := os.Create("go-pdf-output.txt")
	_, err2 := f.WriteString(string(output))
	if err2 != nil {
		fmt.Println(err2)
	}
	return string(output)
}
This is the error I get from cmd.Err:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0xfc00e6]
This is my Python script, which prints the string after parsing:
import fitz
import sys

path = sys.argv[1]
doc = fitz.open(path)
pages = []
for page in doc:
    text = page.get_text("text")
    pages.append(text)
outputString = ' '.join(pages)
print(outputString)
If I run the Python script separately, it works perfectly. The error occurs at the line output, err := cmd.Output(). If the PDF file is small it works fine, but if the PDF file is larger (for example, a book), it fails.
I think the problem is the number of bytes that cmd.Output() can return. Is there a better way to transfer the data from the Python script to Golang?
Answer 1
Score: 0
I solved it on my own. Instead of printing outputString directly, print the result of json.dumps() and decode the JSON on the Go side. The complete code is below:
main.go file
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/exec"
)

// ParseText mirrors the JSON object printed by pdf_parser.py.
type ParseText struct {
	Text string `json:"text"`
}

func main() {
	fmt.Println("Running...")
	pdfPath := "./Y2V7 Full With SS-2.pdf"
	_, err := parsePdf(pdfPath)
	if err != nil {
		fmt.Println(err)
	}
}

// parsePdf runs the Python parser and decodes the JSON it prints to stdout.
func parsePdf(path string) (string, error) {
	cmd := exec.Command("python", "pdf_parser.py", path)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		log.Printf("Error when executing python: %s\n", stderr.Bytes())
		return "", fmt.Errorf("error executing python: %w", err)
	}
	res := ParseText{}
	// Check the decode error before using the result.
	if err := json.Unmarshal(stdout.Bytes(), &res); err != nil {
		return "", fmt.Errorf("error decoding python output: %w", err)
	}
	writeToFile("go-pdf.txt", res.Text)
	return res.Text, nil
}

// writeToFile writes text to fileName and exits the program on failure.
func writeToFile(fileName, text string) {
	f, err := os.Create(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if _, err := f.WriteString(text); err != nil {
		log.Fatal(err)
	}
}
pdf_parser.py file
import fitz
import sys
import json

path = sys.argv[1]
doc = fitz.open(path)
pages = []
for page in doc:
    text = page.get_text("text")
    pages.append(text)
outputString = ' '.join(pages)
# json.dumps escapes non-ASCII characters by default, so the output is plain ASCII
print(json.dumps({"text": outputString}))
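Why this helps is not stated in the answer, so the following is only a guess. cmd.Output() collects stdout in a growable buffer, so output size alone seems an unlikely cause. A more plausible one is that larger PDFs contain characters that Python cannot encode when stdout is a pipe (on Windows, a legacy code page such as cp1252 is commonly the default), so print() raises UnicodeEncodeError and the script exits with status 1. Because json.dumps() escapes non-ASCII characters by default, its output can be written under any such encoding. A minimal sketch of that effect, using a hypothetical sample string and the helper name try_write (neither is from the original post):

```python
import io
import json

# Hypothetical sample text; the CJK characters cannot be encoded in cp1252.
sample = "caf\u00e9 \u4f60\u597d"


def try_write(text, encoding):
    """Write text through a strict text stream; return True if encoding succeeds."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


# Raw text fails under cp1252, which is an assumed stand-in for the pipe encoding.
raw_ok = try_write(sample, "cp1252")

# json.dumps uses ensure_ascii=True by default, so the payload is pure ASCII.
payload = json.dumps({"text": sample})
json_ok = try_write(payload, "cp1252")

# Decoding the JSON restores the original characters.
restored = json.loads(payload)["text"]

print(raw_ok, json_ok, restored == sample)
```

If this is indeed the cause, making Python write UTF-8 should also work, for example by calling sys.stdout.reconfigure(encoding="utf-8") (Python 3.7+) at the top of the script or by setting the PYTHONIOENCODING environment variable to utf-8 before starting the process. Logging the captured stderr, as the Go code above does, would show the Python traceback and confirm or rule out this explanation.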