Split a PDF into byte array pages with iText 7

Question

I need to split a PDF file into per-page byte arrays without using the file system.
I found the following code from @AlexeySubach, which seems to work, but I am having trouble getting the contents out of the `DocumentReadyListender`:

    class ByteArrayPdfSplitter : PdfSplitter {

        private MemoryStream currentOutputStream;

        public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) {
        }

        protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
            currentOutputStream = new MemoryStream();
            return new PdfWriter(currentOutputStream);
        }

        public MemoryStream CurrentMemoryStream {
            get { return currentOutputStream; }
        }

        public class DocumentReadyListender : IDocumentReadyListener {

            private ByteArrayPdfSplitter splitter;

            public DocumentReadyListender(ByteArrayPdfSplitter splitter) {
                this.splitter = splitter;
            }

            public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange) {
                pdfDocument.Close();
                byte[] contents = splitter.CurrentMemoryStream.ToArray();
                String pageNumber = pageRange.ToString();
            }
        }
    }

Usage:

    public static List<Byte[]> SplitOnPages(Byte[] bytes)
    {
        using (MemoryStream memoryStream = new MemoryStream(bytes))
        {
            using (PdfReader reader = new PdfReader(memoryStream))
            {
                PdfDocument docToSplit = new PdfDocument(reader);
                ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
                splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter));
            }
        }

        // How do I get the list of per-page byte arrays here?
        return ...
    }

Answer 1

Score: 1

The code from Alexey Subach that you found expects you to do something useful with the finished document in the `DocumentReady` method of `DocumentReadyListender`. As written, it stores the bytes in a local variable and then discards them. Since you ultimately want a list of the resulting PDFs as byte arrays, add the bytes of each finished document to such a list, e.g. by extending `DocumentReadyListender` like this:

```lang-c#
public class DocumentReadyListender : IDocumentReadyListener
{
    // Shared list that receives one byte array per split document.
    public List<byte[]> splitPdfs;

    private ByteArrayPdfSplitter splitter;

    public DocumentReadyListender(ByteArrayPdfSplitter splitter, List<byte[]> results)
    {
        this.splitter = splitter;
        this.splitPdfs = results;
    }

    public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange)
    {
        // Closing the document writes it into the splitter's current MemoryStream.
        pdfDocument.Close();
        byte[] contents = splitter.CurrentMemoryStream.ToArray();
        splitPdfs.Add(contents);
    }
}
```

(ByteArrayPdfSplitter, improved helper class DocumentReadyListender)

With that change you can make your `SplitOnPages` method work:

```lang-c#
public static List<Byte[]> SplitOnPages(Byte[] bytes)
{
    // Collects the bytes of every single-page PDF produced by the splitter.
    List<byte[]> result = new List<byte[]>();
    using (MemoryStream memoryStream = new MemoryStream(bytes))
    {
        using (PdfReader reader = new PdfReader(memoryStream))
        {
            PdfDocument docToSplit = new PdfDocument(reader);
            ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
            // Qualified name, because the listener is nested in ByteArrayPdfSplitter as in the question.
            splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter, result));
        }
    }

    return result;
}
```

(SplitInMemory test, improved method SplitOnPages)
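
For convenience, here is a sketch of how the pieces from this answer might fit together in one compilation unit. It assumes iText 7 for .NET, with `PdfSplitter` and `PageRange` in `iText.Kernel.Utils` and the PDF classes in `iText.Kernel.Pdf`. The `using` directives, the `Demo` class and the `CreateBlankPdf` helper are not part of the original post; they are hypothetical additions for illustration only.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

class ByteArrayPdfSplitter : PdfSplitter
{
    private MemoryStream currentOutputStream;

    public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) { }

    // Called by PdfSplitter for each output document; each one gets a fresh in-memory stream.
    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        currentOutputStream = new MemoryStream();
        return new PdfWriter(currentOutputStream);
    }

    public MemoryStream CurrentMemoryStream
    {
        get { return currentOutputStream; }
    }

    public class DocumentReadyListender : IDocumentReadyListener
    {
        private readonly ByteArrayPdfSplitter splitter;
        private readonly List<byte[]> splitPdfs;

        public DocumentReadyListender(ByteArrayPdfSplitter splitter, List<byte[]> results)
        {
            this.splitter = splitter;
            this.splitPdfs = results;
        }

        public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange)
        {
            // Closing writes the finished PDF into the splitter's current stream.
            pdfDocument.Close();
            splitPdfs.Add(splitter.CurrentMemoryStream.ToArray());
        }
    }
}

// Hypothetical wrapper class, added only to host the two static methods.
static class Demo
{
    public static List<byte[]> SplitOnPages(byte[] bytes)
    {
        List<byte[]> result = new List<byte[]>();
        using (MemoryStream memoryStream = new MemoryStream(bytes))
        using (PdfReader reader = new PdfReader(memoryStream))
        {
            PdfDocument docToSplit = new PdfDocument(reader);
            ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
            splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter, result));
        }
        return result;
    }

    // Hypothetical helper: builds an in-memory PDF with the given number of blank pages.
    public static byte[] CreateBlankPdf(int pageCount)
    {
        MemoryStream stream = new MemoryStream();
        PdfDocument document = new PdfDocument(new PdfWriter(stream));
        for (int i = 0; i < pageCount; i++)
        {
            document.AddNewPage();
        }
        document.Close();
        return stream.ToArray();
    }
}
```

Under these assumptions, `Demo.SplitOnPages(Demo.CreateBlankPdf(3))` should return a list of three byte arrays, each of which can be opened as a one-page PDF. The listener relies on `DocumentReady` being called before the splitter requests the next writer, which is why reading `CurrentMemoryStream` at that point yields the document that was just closed.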

huangapple
  • Posted on June 29, 2023 at 01:48:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76575599.html