My program reads my Huffman-encoded file incorrectly! Most of the bytes end up being '11111101', even though they're not. Why is that?
Question
I'm currently working on a Huffman decoder in Java. I've been stuck on wrong decoding for a few days now, and only yesterday I realized it's because my program reads the input wrong! I downloaded external software to inspect the file bit by bit, and the bits my program reads are completely different from what is actually in the file. Could you please help me resolve this issue?
Here's the Java code I use to check which input bits I read:
    public static void main(String[] args) {
        ...
        try {
            FileReader plikin = new FileReader(property);
            BufferedReader pinh = new BufferedReader(plikin);
            FileWriter plikout = new FileWriter("out.txt");
            BufferedWriter pout = new BufferedWriter(plikout);
            printRemainingBits(pinh);
            pout.close();
        }
        ...
    }

    static void printRemainingBits(BufferedReader pinh) throws IOException {
        System.out.println("\nRemaining Bits:");
        int c;
        while ((c = pinh.read()) != -1) {
            printCharBits((char) c);
        }
    }

    public static void printCharBits(char c) {
        for (int i = 7; i >= 0; i--) {
            int bit = (c >> i) & 1;
            System.out.print(bit);
        }
        System.out.print(" ");
    }
In the file I'm testing with, the correct bit representation is:
00000100 10011011 01110011 00101001 11100110
01110010 00110111 11100100 01111101 01111111
But my Java code reads them like this:
00000100 11111101 01110011 00101001 11111101
01110010 00110111 11111101 01111101 01111111
I really don't know what to do. Thank you in advance!
Answer 1
Score: 3
You are reading your file as if it is text, but it's not text.
There is a special Unicode character:
> U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized, or unrepresentable character
Note that 11111101 is FD in hexadecimal, the lowest 8 bits of that value, and the lowest 8 bits are all that your printCharBits method prints.
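To see where the 11111101 comes from, here is a minimal sketch that decodes the first three bytes of the question's file as text. It assumes the charset is UTF-8; FileReader uses the platform default charset, which may be different on your machine. The class name ReplacementCharDemo and the helper lowBits are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Formats the low 8 bits of a value as binary digits,
    // mirroring what printCharBits in the question prints.
    static String lowBits(int value) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((value >> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First three bytes of the question's file: 00000100 10011011 01110011
        byte[] raw = {0x04, (byte) 0x9B, 0x73};

        // Assumption: UTF-8 decoding. 0x9B is not a valid UTF-8 sequence
        // on its own, so the decoder substitutes U+FFFD for it.
        String decoded = new String(raw, StandardCharsets.UTF_8);

        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X -> %s%n", (int) c, lowBits(c));
        }
        // prints:
        // U+0004 -> 00000100
        // U+FFFD -> 11111101
        // U+0073 -> 01110011
    }
}
```

The bytes below 0x80 survive because they are plain ASCII, which is why only some positions in the question's output are wrong.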
Rather than using FileReader and BufferedReader, you should be using FileInputStream and BufferedInputStream. That way, your binary data won't be corrupted by replacement characters and other Unicode decoding side effects.
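As a rough sketch of that change, the bit-printing test could read raw bytes from an InputStream instead of characters from a Reader. The class name BitDump, the helpers byteBits and dumpBits, and the use of args[0] as the file path are assumptions made for this example, not part of the original program.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BitDump {
    // Formats one byte value (0..255) as eight binary digits, most significant bit first.
    static String byteBits(int b) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }

    // Reads raw bytes until end of stream and returns them as
    // space-separated groups of 8 bits. No charset decoding is involved.
    static String dumpBits(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) { // read() returns 0..255, or -1 at end of stream
                sb.append(byteBits(b)).append(' ');
            }
        } catch (IOException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical path to the Huffman-encoded file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(dumpBits(in));
        }
    }
}
```

The same applies on the output side: if the decoder writes binary data, FileOutputStream and BufferedOutputStream avoid the equivalent problem that FileWriter would cause.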