Reading a byte array from a text file and saving it as a PDF corrupts the PDF

Question

I need to convert a PDF file into a byte array and save it in a text file.
Then I need to read the text file and revive the PDF file.

For this I am using the code below (I am using UTF-8 for writing the text file).

using System;
using System.IO;
using System.Text;

string SourceFile = @"C:\Files\originalfile.pdf";
string SerializedFile = @"C:\Files\serialized.txt";
string RevivedFile = @"C:\Files\revived.pdf";

SerializeFile(SourceFile, SerializedFile);
DeSerializeFile(SerializedFile, RevivedFile);

//Serialize
static void SerializeFile(string SourceFilePath, string SerializedFilePath)
{
    byte[] SourceBytes = File.ReadAllBytes(SourceFilePath);
    File.WriteAllText(SerializedFilePath, Encoding.UTF8.GetString(SourceBytes));
    Console.WriteLine("Serialized File created");
}

// De-Serialize
static void DeSerializeFile(string SerializedFilePath, string RevivedFilePath)
{
    byte[] RevivedBytes;
    using (var sr = new StreamReader(SerializedFilePath))
    {
        RevivedBytes = Encoding.UTF8.GetBytes(sr.ReadToEnd());
    }
    File.WriteAllBytes(RevivedFilePath, RevivedBytes);
    Console.WriteLine("File Revived.");
}

The serialized file is generated successfully, but the second part fails: the revived file is corrupted.

How can I correctly restore the PDF file?

Answer 1

Score: 3

The answer is simple: don't treat opaque binary data as text.

Use File.WriteAllBytes instead of File.WriteAllText in your SerializeFile method, and just use File.ReadAllBytes in DeSerializeFile.
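As a minimal sketch of that change, the two methods could look like the code below. The temporary directory, the file names, and the ten sample bytes standing in for a real PDF are assumptions made only so the example runs on its own; they are not from the question.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical scratch directory and file names, used only for this demo.
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
string sourceFile = Path.Combine(dir, "originalfile.pdf");
string serializedFile = Path.Combine(dir, "serialized.bin");
string revivedFile = Path.Combine(dir, "revived.pdf");

// Stand-in for a real PDF: "%PDF-" followed by bytes that are not valid UTF-8.
byte[] sample = { 0x25, 0x50, 0x44, 0x46, 0x2D, 0xFF, 0xFE, 0x00, 0x80, 0xC3 };
File.WriteAllBytes(sourceFile, sample);

SerializeFile(sourceFile, serializedFile);
DeSerializeFile(serializedFile, revivedFile);

// Compare the revived file with the original bytes.
bool identical = File.ReadAllBytes(revivedFile).SequenceEqual(sample);
Console.WriteLine($"Round trip preserved all bytes: {identical}");

// Serialize: write the raw bytes, with no text encoding involved.
static void SerializeFile(string sourceFilePath, string serializedFilePath)
{
    byte[] sourceBytes = File.ReadAllBytes(sourceFilePath);
    File.WriteAllBytes(serializedFilePath, sourceBytes);
}

// De-serialize: read the raw bytes back and write them out unchanged.
static void DeSerializeFile(string serializedFilePath, string revivedFilePath)
{
    byte[] revivedBytes = File.ReadAllBytes(serializedFilePath);
    File.WriteAllBytes(revivedFilePath, revivedBytes);
}
```

Because no string is ever created, there is no encoding step that could alter the data.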

Your code currently assumes that you can treat all data as UTF-8-encoded text, converting between bytes and strings with no loss of information. That's simply not the case, because not every byte sequence is valid UTF-8 data.
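The loss can be observed directly with a few bytes. This is a small illustration, assuming an arbitrary sample array chosen here (not taken from the question) that contains bytes which can never appear in valid UTF-8:

```csharp
using System;
using System.Linq;
using System.Text;

// "%PDF" followed by three bytes that are invalid in UTF-8 (sample data chosen for this demo).
byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE, 0x80 };

// Decoding substitutes U+FFFD (the replacement character) for invalid input,
// so the original byte values are gone once the string exists.
string asText = Encoding.UTF8.GetString(original);

// Encoding the string again produces the UTF-8 form of U+FFFD, not the original bytes.
byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

Console.WriteLine($"Original length:      {original.Length}");
Console.WriteLine($"Round-tripped length: {roundTripped.Length}");
Console.WriteLine($"Identical:            {original.SequenceEqual(roundTripped)}");
```

The same substitution happens to the binary streams inside a PDF, which is why the revived file no longer opens.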

It's unclear why you even want to convert the data to text, given that you're really just copying files.

If for some reason you actually need to represent opaque data as text, you should use base64 or some similar mechanism, which is a reversible transform: bytes -> base64 -> bytes will always preserve all the data, regardless of what that data is. (Assuming a correct implementation etc., of course.)
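If the intermediate file really has to be text, a base64 version of the two methods might look like the sketch below. It uses Convert.ToBase64String and Convert.FromBase64String from the .NET base class library; the temporary directory, file names, and sample bytes are again assumptions made for the demo.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical scratch directory and file names, used only for this demo.
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
string sourceFile = Path.Combine(dir, "originalfile.pdf");
string serializedFile = Path.Combine(dir, "serialized.txt");
string revivedFile = Path.Combine(dir, "revived.pdf");

// Stand-in for a real PDF, including bytes that are not valid UTF-8.
byte[] sample = { 0x25, 0x50, 0x44, 0x46, 0x2D, 0xFF, 0xFE, 0x00, 0x80, 0xC3 };
File.WriteAllBytes(sourceFile, sample);

SerializeFile(sourceFile, serializedFile);
DeSerializeFile(serializedFile, revivedFile);

bool identical = File.ReadAllBytes(revivedFile).SequenceEqual(sample);
Console.WriteLine($"Base64 round trip preserved all bytes: {identical}");

// Serialize: encode the bytes as base64 text, which uses only ASCII characters.
static void SerializeFile(string sourceFilePath, string serializedFilePath)
{
    byte[] sourceBytes = File.ReadAllBytes(sourceFilePath);
    File.WriteAllText(serializedFilePath, Convert.ToBase64String(sourceBytes));
}

// De-serialize: decode the base64 text back into the original bytes.
static void DeSerializeFile(string serializedFilePath, string revivedFilePath)
{
    string base64Text = File.ReadAllText(serializedFilePath);
    File.WriteAllBytes(revivedFilePath, Convert.FromBase64String(base64Text));
}
```

The trade-off is size: base64 emits four characters for every three input bytes, so the text file is roughly a third larger than the PDF.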

huangapple
  • Posted on June 6, 2023, 15:54:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76412503.html