Decompress large binary files

Question

I have a function that decompresses large gzip-compressed files using the method below. There are times when I run into an OutOfMemoryError because the file is just too large. Is there a way I can optimize my code? I have read something about breaking the file into smaller parts that fit into memory and decompressing those, but I don't know how to do that. Any help or suggestion is appreciated.

private static String decompress(String s) {
    String pathOfFile = null;

    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(s)), Charset.defaultCharset()))) {
        File file = new File(s);
        FileOutputStream fos = new FileOutputStream(file);

        String line;
        while ((line = reader.readLine()) != null) {
            fos.write(line.getBytes());
            fos.flush();
        }

        pathOfFile = file.getAbsolutePath();
    } catch (IOException e) {
        e.printStackTrace();
    }

    return pathOfFile;
}

The stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
        at java.base/java.util.ArrayList.grow(ArrayList.java:237)
        at java.base/java.util.ArrayList.ensureCapacity(ArrayList.java:217)

Answer 1

Score: 2

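Don't use Reader classes: you are copying raw bytes, so there is no need to decode the data into characters or to write the output character by character or line by line. On binary data, readLine() also drops the line terminators and can end up buffering a very long "line" in memory. Read and write the bytes directly with the InputStream.transferTo() method (available since Java 9):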

try (var in = new GZIPInputStream(new FileInputStream(inFile));
     var out = new FileOutputStream(outFile)) {
    in.transferTo(out);
}
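
For reference, here is a self-contained sketch of the snippet above wrapped in a method. The class name GzipDecompressor, the method signature, and the command-line handling are assumptions made for illustration, not part of the original question. It writes to a separate target path, because opening the input path for output (as the question's code does with new File(s)) would truncate the file being read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the name and signature are illustrative only.
final class GzipDecompressor {

    /**
     * Streams the gzip file at source into target and returns the number of
     * decompressed bytes written. The target must be a different path than
     * the source, otherwise the input would be truncated while it is read.
     */
    static long decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            // transferTo (Java 9+) copies chunk by chunk; in OpenJDK it uses a
            // fixed-size buffer, so heap use does not grow with the file size.
            return in.transferTo(out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: GzipDecompressor <input.gz> <output>");
            return;
        }
        long bytes = decompress(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

Declaring throws IOException instead of catching and printing lets the caller decide how to handle a failed decompression.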

Also, you probably don't need to call flush() explicitly; doing it after every line is wasteful.
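
If you are on Java 8, where transferTo() is not available, a plain copy loop with a fixed byte buffer does the same job. The sketch below is one way to write it; the class name GzipCopyLoop and the 8192-byte buffer size are arbitrary choices for illustration. There is no flush() inside the loop, because closing the stream at the end of the try-with-resources block flushes it.

```java
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the name and buffer size are illustrative only.
final class GzipCopyLoop {

    /** Decompresses inFile into outFile and returns the number of bytes written. */
    static long decompress(String inFile, String outFile) throws IOException {
        long total = 0;
        try (InputStream in = new GZIPInputStream(new FileInputStream(inFile));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile))) {
            byte[] buffer = new byte[8192]; // reused for every chunk
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // write only the bytes actually read
                total += n;
            }
        } // closing the BufferedOutputStream flushes any remaining bytes
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: GzipCopyLoop <input.gz> <output>");
            return;
        }
        System.out.println("Wrote " + decompress(args[0], args[1]) + " bytes");
    }
}
```

Only one buffer's worth of data is held in memory at a time, regardless of how large the decompressed file is.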

Posted by huangapple on May 19, 2020 at 19:51:37
Source: https://go.coder-hub.com/61890363.html