Reading byte Array from text file and saving as PDF is corrupting the PDF
Question
I need to convert a PDF file into a byte array and save it in a text file.
Then I need to read the text file back and revive the PDF file.
For this I am using the code below (I am using UTF-8 encoding for writing the text file).
using System;
using System.IO;
using System.Text;

string SourceFile = @"C:\Files\originalfile.pdf";
string SerializedFile = @"C:\Files\serialized.txt";
string RevivedFile = @"C:\Files\revived.pdf";

SerializeFile(SourceFile, SerializedFile);
DeSerializeFile(SerializedFile, RevivedFile);

// Serialize: decode the PDF bytes as UTF-8 text and write that text out
static void SerializeFile(string SourceFilePath, string SerializedFilePath)
{
    byte[] SourceBytes = File.ReadAllBytes(SourceFilePath);
    File.WriteAllText(SerializedFilePath, Encoding.UTF8.GetString(SourceBytes));
    Console.WriteLine("Serialized File created");
}

// De-Serialize: read the text back and encode it as UTF-8 bytes
static void DeSerializeFile(string SerializedFilePath, string RevivedFilePath)
{
    byte[] RevivedBytes;
    using (var sr = new StreamReader(SerializedFilePath))
    {
        RevivedBytes = Encoding.UTF8.GetBytes(sr.ReadToEnd());
    }
    File.WriteAllBytes(RevivedFilePath, RevivedBytes);
    Console.WriteLine("File Revived.");
}
The serialized file is generated successfully, but the file produced by the second part (the revived PDF) is corrupted.
How can I correctly restore the PDF file?
Answer 1
Score: 3
The answer is simple: don't treat opaque binary data as text.
Use File.WriteAllBytes instead of File.WriteAllText in your SerializeFile method, and just use File.ReadAllBytes in DeSerializeFile.
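A minimal sketch of that change, assuming .NET 6+ top-level statements. The method names follow the question; the temp-directory paths and the stand-in byte array are hypothetical, chosen only so the example runs anywhere without a real PDF:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical paths in the temp directory, for illustration only.
string dir = Path.Combine(Path.GetTempPath(), "pdf-roundtrip-demo");
Directory.CreateDirectory(dir);
string sourceFile = Path.Combine(dir, "originalfile.pdf");
string serializedFile = Path.Combine(dir, "serialized.bin");
string revivedFile = Path.Combine(dir, "revived.pdf");

// Stand-in "PDF": a PDF-style header followed by bytes that are not valid UTF-8.
byte[] sample = { 0x25, 0x50, 0x44, 0x46, 0x2D, 0xE2, 0xE3, 0xCF, 0xD3, 0xFF, 0x00 };
File.WriteAllBytes(sourceFile, sample);

SerializeFile(sourceFile, serializedFile);
DeSerializeFile(serializedFile, revivedFile);

bool identical = File.ReadAllBytes(sourceFile).SequenceEqual(File.ReadAllBytes(revivedFile));
Console.WriteLine($"Identical: {identical}");

// Serialize: copy the raw bytes; no text encoding is involved.
static void SerializeFile(string sourceFilePath, string serializedFilePath)
{
    byte[] sourceBytes = File.ReadAllBytes(sourceFilePath);
    File.WriteAllBytes(serializedFilePath, sourceBytes);
}

// De-serialize: read the raw bytes back and write them out unchanged.
static void DeSerializeFile(string serializedFilePath, string revivedFilePath)
{
    byte[] revivedBytes = File.ReadAllBytes(serializedFilePath);
    File.WriteAllBytes(revivedFilePath, revivedBytes);
}
```

Because bytes are never decoded into a string, every byte value survives, including ones that would be invalid as UTF-8.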
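Your code currently assumes that you can treat all data as UTF-8-encoded text, converting between bytes and strings with no loss of information. That's simply not the case, because not every byte sequence is valid UTF-8 data: when Encoding.UTF8.GetString meets an invalid sequence, it substitutes the replacement character U+FFFD, and the original bytes cannot be recovered from that character.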
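To see the loss concretely, the sketch below runs the question's bytes -> string -> bytes round trip over the start of a typical PDF header ("%PDF-1.7" followed by the usual binary marker comment). The byte values are illustrative stand-ins, not taken from the asker's file, and the example assumes .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Text;

// "%PDF-1.7\n%" followed by four high bytes. 0xE2 starts a three-byte UTF-8
// sequence, but 0xE3 is not a continuation byte, so this input is not valid UTF-8.
byte[] original = { 0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x37, 0x0A, 0x25, 0xE2, 0xE3, 0xCF, 0xD3 };

// Same round trip as the question: bytes -> string -> bytes.
string text = Encoding.UTF8.GetString(original);
byte[] roundTripped = Encoding.UTF8.GetBytes(text);

// Invalid sequences are decoded as U+FFFD, which re-encodes as EF-BF-BD,
// so the tail of the output no longer matches the input.
Console.WriteLine(BitConverter.ToString(original));
Console.WriteLine(BitConverter.ToString(roundTripped));
Console.WriteLine($"Preserved: {original.SequenceEqual(roundTripped)}");
```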
It's unclear why you even want to convert the data to text, given that you're really just copying files.
If for some reason you actually need to represent opaque data as text, you should use base64 or some similar mechanism, which is a reversible transform: bytes -> base64 -> bytes will always preserve all the data, regardless of what that data is. (Assuming a correct implementation etc., of course.)
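A minimal sketch of the base64 variant, assuming .NET 6+ top-level statements and the built-in Convert.ToBase64String / Convert.FromBase64String methods. The paths and sample bytes are hypothetical, for illustration only:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical paths in the temp directory, for illustration only.
string dir = Path.Combine(Path.GetTempPath(), "pdf-base64-demo");
Directory.CreateDirectory(dir);
string sourceFile = Path.Combine(dir, "originalfile.pdf");
string serializedFile = Path.Combine(dir, "serialized.txt");
string revivedFile = Path.Combine(dir, "revived.pdf");

// Stand-in content that is deliberately not valid UTF-8.
File.WriteAllBytes(sourceFile, new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D, 0xE2, 0xE3, 0xCF, 0xD3, 0x00, 0xFF });

SerializeFile(sourceFile, serializedFile);
DeSerializeFile(serializedFile, revivedFile);

bool identical = File.ReadAllBytes(sourceFile).SequenceEqual(File.ReadAllBytes(revivedFile));
Console.WriteLine($"Identical: {identical}");

// Serialize: base64 maps arbitrary bytes onto plain ASCII characters,
// so writing the result as text is lossless.
static void SerializeFile(string sourceFilePath, string serializedFilePath)
{
    byte[] sourceBytes = File.ReadAllBytes(sourceFilePath);
    File.WriteAllText(serializedFilePath, Convert.ToBase64String(sourceBytes));
}

// De-serialize: decode the base64 text back into the original bytes.
static void DeSerializeFile(string serializedFilePath, string revivedFilePath)
{
    string base64 = File.ReadAllText(serializedFilePath);
    File.WriteAllBytes(revivedFilePath, Convert.FromBase64String(base64));
}
```

The trade-off is size: base64 emits 4 characters for every 3 input bytes, so the text file is roughly a third larger than the original PDF.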