My program reads my Huffman-encoded file wrong! Most of the bytes end up being '11111101', even though they're not. Why is that?

Question

I'm currently working on a Huffman decoder in Java. I've been stuck on wrong decoding for a few days now, and only yesterday I realized it's because my program reads the input wrong! I downloaded external software to inspect the file bit by bit, and the bits my program reads are completely different from what they should be. Could you please help me resolve this issue?

Here's my code in Java that I use to test what input bits I read:

public static void main(String[] args) {
    ...
    try {
        FileReader plikin = new FileReader(property);
        BufferedReader pinh = new BufferedReader(plikin);
        FileWriter plikout = new FileWriter("out.txt");
        BufferedWriter pout = new BufferedWriter(plikout);
        printRemainingBits(pinh);
        pout.close();
    }
    ...
}

static void printRemainingBits(BufferedReader pinh) throws IOException {
    System.out.println("\nRemaining Bits:");
    int c;
    while ((c = pinh.read()) != -1) {
        printCharBits((char) c);
    }
}

public static void printCharBits(char c) {
    for (int i = 7; i >= 0; i--) {
        int bit = (c >> i) & 1;
        System.out.print(bit);
    }
    System.out.print(" ");
}

In the file I'm testing with, the correct bit representation is:

00000100 10011011 01110011 00101001 11100110
01110010 00110111 11100100 01111101 01111111

But my Java code reads them like this:

00000100 11111101 01110011 00101001 11111101
01110010 00110111 11111101 01111101 01111111

I really don't know what to do. Thank you in advance!

Answer 1

Score: 3

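You are reading your file as if it were text, but it's not text. FileReader decodes the bytes into characters using a charset, and any byte sequence that isn't valid in that charset gets replaced.

There is a special Unicode character: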

> U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized, or unrepresentable character
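To see where that character comes from, here is a minimal sketch that decodes the ten bytes from the question as text. It assumes the file was decoded as UTF-8 (the question doesn't state the charset; FileReader uses the platform default unless told otherwise), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class DecodeAsTextDemo {
    // The ten bytes the question lists as the file's real contents.
    static final byte[] FILE_BYTES = {
        0x04, (byte) 0x9B, 0x73, 0x29, (byte) 0xE6,
        0x72, 0x37, (byte) 0xE4, 0x7D, 0x7F
    };

    // Decodes the bytes as UTF-8 text (an assumed charset).
    // Malformed input is replaced with U+FFFD rather than reported.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print the code point of every decoded char.
        for (char c : decodeAsUtf8(FILE_BYTES).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Under that assumption, the bytes 0x9B, 0xE6 and 0xE4 do not form valid UTF-8 sequences here, so each one decodes to U+FFFD, which lines up with the three positions that went wrong in the question's output.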

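Note that 11111101 is 0xFD, the lowest 8 bits of that value. Your printCharBits only prints bits 7 through 0 of each char, so every U+FFFD shows up as 11111101.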
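A quick way to check this is to run the question's bit-printing loop on U+FFFD directly. The sketch below reuses the same loop as printCharBits but returns a string; the class and method names are illustrative only.

```java
final class ReplacementCharBits {
    // Same loop as printCharBits in the question: only bits 7..0 are visited,
    // so the upper 8 bits of the 16-bit char are silently dropped.
    static String lowEightBits(char c) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((c >> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char replacement = '\uFFFD';
        // All 16 bits of the char
        System.out.println(Integer.toBinaryString(replacement)); // 1111111111111101
        // What the question's loop shows
        System.out.println(lowEightBits(replacement));           // 11111101
    }
}
```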

Rather than using FileReader and BufferedReader, you should be using FileInputStream and BufferedInputStream. That way, your binary data is read byte for byte and won't be corrupted by replacement characters or any other Unicode decoding.
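As a rough sketch of that change, the bit-dumping test from the question could look like the following when it reads raw bytes. The class name, the helper names, and taking the file path from args[0] are assumptions for illustration; adapt them to your own program.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class PrintFileBits {
    // Formats one byte value (0..255) as eight binary digits.
    static String toBits(int b) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }

    // Reads raw bytes until end of stream; InputStream.read() returns
    // 0..255 for a byte and -1 at the end, with no charset decoding involved.
    static String readBits(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append(toBits(b)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the encoded file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(readBits(in));
        }
    }
}
```

Because readBits takes any InputStream, it can also be exercised with a ByteArrayInputStream holding known bytes before pointing it at the real file.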

huangapple
  • Published on June 5, 2023, 04:00:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76402189.html
