Getting an error when a long string is passed back to Go from a Python script with cmd.Output()

Question

I am parsing a PDF file with Python and sending the text string back to a Go server. With smaller PDF files the code works properly, but with large PDF files it returns exit status 1.

Here is the code I am using:

func parsePdf(path string) string {
    cmd := exec.Command("python", "pdf_parser.py", path)
    output, err := cmd.Output() //this line throws error
    if err != nil {
        fmt.Println(err)
    }
    f, _ := os.Create("go-pdf-output.txt")
    _, err2 := f.WriteString(string(output))
    if err2 != nil {
        fmt.Println(err2)
    }
    return string(output)
}

This is the error I get from cmd.Err:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0xfc00e6]

This is my python script where I print the string after parsing:

import fitz
import sys

path = sys.argv[1]
doc = fitz.open(path)
list = []

for page in doc:
    text = page.get_text("text")
    list.append(text)

outputString= ' '.join(list)
print(outputString)

If I run the Python script separately, it works perfectly. The error is returned at the line output, err := cmd.Output(). If the PDF file is small it works fine, but if the PDF file is larger (for example, a book) it fails.

I think the problem is the number of bytes that cmd.Output() can return. Is there a better way to transfer the data from the Python script to Go?

Answer 1

Score: 0

I solved it on my own. Instead of printing outputString directly, print the result of json.dumps() and decode it on the Go side. The whole code is below:

main.go file

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/exec"
)

type ParseText struct {
	Text string `json:"text"`
}

func main() {
	fmt.Println("Running...")

	pdfPath := "./Y2V7 Full With SS-2.pdf"
	_, err := parsePdf(pdfPath)
	if err != nil {
		fmt.Println(err)
	}
}

func parsePdf(path string) (string, error) {
	cmd := exec.Command("python", "pdf_parser.py", path)
	var stdout, stderr bytes.Buffer

	// Capture both streams so that a Python traceback is not lost.
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		log.Printf("error when executing python: %s\n", stderr.Bytes())
		return "", fmt.Errorf("error executing python: %w", err)
	}

	// The script prints a single JSON object of the form {"text": "..."}.
	res := ParseText{}
	if err := json.Unmarshal(stdout.Bytes(), &res); err != nil {
		return "", fmt.Errorf("error decoding python output: %w", err)
	}
	writeToFile("go-pdf.txt", res.Text)
	return res.Text, nil
}

func writeToFile(fileName, text string) {
	f, err := os.Create(fileName)

	if err != nil {
		log.Fatal(err)
	}

	defer f.Close()

	_, err2 := f.WriteString(text)

	if err2 != nil {
		log.Fatal(err2)
	}
}
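
A side note on diagnosing the original "exit status 1": that message only reports the exit code, not what Python printed. The sketch below is not part of the original answer; it keeps cmd.Output() and reads the child's stderr from exec.ExitError, which Output fills in when Cmd.Stderr has not been set. The helper name runCapture and the file name example.pdf are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runCapture (hypothetical helper) runs a command and returns its stdout.
// On a non-zero exit it folds the child's stderr into the returned error,
// so a Python traceback shows up instead of a bare "exit status 1".
func runCapture(name string, args ...string) ([]byte, error) {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			// Output populates ExitError.Stderr because Cmd.Stderr is nil here.
			return nil, fmt.Errorf("%s failed: %w: %s", name, err, exitErr.Stderr)
		}
		// The process could not be started at all (for example, not on PATH).
		return nil, fmt.Errorf("could not run %s: %w", name, err)
	}
	return out, nil
}

func main() {
	// example.pdf is a placeholder path, not a file from the question.
	out, err := runCapture("python", "pdf_parser.py", "example.pdf")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("read %d bytes from the parser\n", len(out))
}
```

For very long error output, ExitError.Stderr may hold only part of it; setting cmd.Stderr to a bytes.Buffer, as main.go above does, keeps everything.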

pdf_parser.py file

import fitz
import sys
import json

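# Path to the PDF, passed as the first command-line argument by main.go.
path = sys.argv[1]
doc = fitz.open(path)
pages = []

# Collect the plain text of every page.
for page in doc:
    text = page.get_text("text")
    pages.append(text)

output_string = ' '.join(pages)
# Emit one JSON object so the Go side can decode it with encoding/json.
print(json.dumps({"text": output_string}))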
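
The thread does not say why switching to json.dumps() fixed the failure, so the following is an assumption rather than a confirmed diagnosis. By default json.dumps() escapes every non-ASCII character as \uXXXX, so the script writes plain ASCII to the pipe. A large PDF is more likely to contain characters that Python cannot encode for a piped stdout on some systems, and a failed print() would end the script with exit status 1. On the Go side, encoding/json turns the escapes back into the original characters, as this small sketch shows. The helper name decodeText and the sample payload are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParseText mirrors the JSON object printed by the Python script.
type ParseText struct {
	Text string `json:"text"`
}

// decodeText (hypothetical helper) extracts the "text" field from the
// parser's JSON output.
func decodeText(raw []byte) (string, error) {
	var res ParseText
	if err := json.Unmarshal(raw, &res); err != nil {
		return "", fmt.Errorf("decoding parser output: %w", err)
	}
	return res.Text, nil
}

func main() {
	// ASCII-only payload, shaped like what json.dumps() produces for
	// non-ASCII text; print() adds the trailing newline.
	raw := []byte(`{"text": "caf\u00e9 \u4e2d\u6587"}` + "\n")

	text, err := decodeText(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(text) // prints "café 中文"
}
```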

huangapple
  • Posted on November 10, 2022 at 12:24:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74384126.html