June 5, 2023, 04:00:39

My program reads my Huffman-encoded file incorrectly! Most of the bytes end up being '11111101', even though they're not. Why is that?

Question

I'm currently working on a Huffman decoder in Java. I've been stuck on incorrect decoding for a few days, and only yesterday I realized the cause: my program reads the input wrong! I used an external tool to inspect the file bit by bit, and the bits my program reads are completely different from what is actually in the file. Could you please help me resolve this issue?

Here's the Java code I use to check which input bits I read:

public static void main(String[] args) {
    ...
    try {
        FileReader plikin = new FileReader(property);
        BufferedReader pinh = new BufferedReader(plikin);
        FileWriter plikout = new FileWriter("out.txt");
        BufferedWriter pout = new BufferedWriter(plikout);
        printRemainingBits(pinh);
        pout.close();
    }
    ...
}

static void printRemainingBits(BufferedReader pinh) throws IOException {
    System.out.println("\nRemaining Bits:");
    int c;
    while ((c = pinh.read()) != -1) {
        printCharBits((char) c);
    }
}

public static void printCharBits(char c) {
    for (int i = 7; i >= 0; i--) {
        int bit = (c >> i) & 1;
        System.out.print(bit);
    }
    System.out.print(" ");
}

In the file I'm testing with, the correct bit representation is:

00000100 10011011 01110011 00101001 11100110
01110010 00110111 11100100 01111101 01111111

But my Java code reads them like this:

00000100 11111101 01110011 00101001 11111101
01110010 00110111 11111101 01111101 01111111

I really don't know what to do. Thank you in advance!

Answer 1

Score: 3

You are reading your file as if it were text, but it's not text.

There is a special Unicode character:

> U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized, or unrepresentable character

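Note that 11111101 is 0xFD, the lowest 8 bits of that value (0xFFFD). A Reader decodes bytes into characters, and any byte sequence that is not valid in the reader's charset is replaced by U+FFFD. Your printCharBits only prints the low 8 bits of each char, so every replaced byte shows up as 11111101. The three bytes that came out wrong (10011011, 11100110, 11100100) all have the high bit set, which is consistent with them being rejected by the decoder.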
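The effect can be reproduced without any file. The sketch below is only an illustration and assumes the reader was decoding as UTF-8 (the question does not say which default charset was in use); the class name ReplacementCharDemo and the helper lowEightBits are made up for this example. It decodes the single byte 0x9B, which is not valid UTF-8 on its own, and prints the low 8 bits of the resulting char the same way printCharBits does.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Hypothetical helper: the low 8 bits of a char as a binary string,
    // mirroring what printCharBits in the question prints.
    static String lowEightBits(char c) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((c >> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x9B (10011011) is a lone UTF-8 continuation byte, so it is malformed input.
        // Assumption: UTF-8 decoding; the asker's actual default charset is unknown.
        byte[] raw = { (byte) 0x9B };
        String decoded = new String(raw, StandardCharsets.UTF_8);
        char c = decoded.charAt(0);

        System.out.println(Integer.toHexString(c)); // the code point the decoder produced
        System.out.println(lowEightBits(c));        // what printCharBits would show
    }
}
```

The original byte value is lost at the decoding step, so no amount of bit manipulation on the char afterwards can recover it.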

Rather than using FileReader and BufferedReader, you should be using FileInputStream and BufferedInputStream. Streams hand you the raw bytes without any charset decoding, so your binary data won't be corrupted by replacement characters or other Unicode conversions.
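A minimal sketch of the bit dump rewritten on top of byte streams might look like the following. The class name RawBitDump, the helpers byteBits and dumpBits, and the use of args[0] as the file path are assumptions for this example, not part of the original program.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RawBitDump {
    // Formats one byte value (0..255) as eight binary digits, most significant bit first.
    static String byteBits(int b) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }

    // Reads the stream one byte at a time. InputStream.read() returns the byte
    // as an int in the range 0..255, or -1 at end of stream; no decoding happens.
    static String dumpBits(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append(byteBits(b)).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical path to the Huffman-encoded file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(dumpBits(in));
        }
    }
}
```

The same applies on the writing side: if the decoder or encoder writes binary output, FileOutputStream and BufferedOutputStream avoid the equivalent problem that FileWriter would introduce.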
