Java - modify and return a BufferedInputStream

Question

I have a BufferedInputStream that I got from a FileInputStream object, like this:

BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);

Now I want to remove the characters { and } from the BufferedInputStream (I know the file contains them).
I thought I could do this as easily as a string replace, but I found no simple way to do it with a BufferedInputStream.

Any ideas on how I can remove those specific characters from the BufferedInputStream and return the new, modified BufferedInputStream?

EDIT:
In the end, I want to determine the charset of a file. The characters { and } are causing me some issues, though, so I want to remove them before detecting the charset. This is how I am trying to determine the charset:

static String detectCharset(File file) {
    try (FileInputStream fileInputStream = new FileInputStream(file);
         BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
        CharsetDetector charsetDetector = new CharsetDetector();
        charsetDetector.setText(bufferedInputStream);
        charsetDetector.enableInputFilter(true);
        CharsetMatch cm = charsetDetector.detect();
        return cm.getName();
    } catch (Exception e) {
        return null;
    }
}

Answer 1

Score: 1

NB: A note in response to the edit to your question: you can't really filter } out of a bag of bytes unless you know the encoding, so if you want to filter } out in order to guess the encoding, you're in a chicken-and-egg situation. I don't understand how removing { and } would help a charset detector, though. That sounds like the detector is buggy, or you're misinterpreting what it is doing. If you must, think of the task as 'remove bytes 123 and 125 from an InputStream' instead of 'remove the chars { and } from an InputStream', and you're closer to a workable job definition. The same principle as in the original answer applies, except that you'd write a FilterInputStream instead of a FilterReader, with almost the same methods, comparing against 123 and 125 instead of '{' and '}'.
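The byte-level variant described in this note could look roughly like the sketch below. The class names RemoveBraceBytesInputStream and RemoveBracesDemo and the helper withoutBraces are made up for illustration, and the sketch assumes an encoding in which bytes 123 and 125 can only mean { and } (ASCII, UTF-8, or another ASCII-compatible single-byte charset). readAllBytes in the demo needs Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class: drops the raw byte values 123 ('{' in ASCII) and 125 ('}').
class RemoveBraceBytesInputStream extends FilterInputStream {
    RemoveBraceBytesInputStream(InputStream in) {
        super(in);
    }

    private static boolean isBrace(int b) {
        return b == 123 || b == 125;
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (isBrace(b));
        return b; // -1 (end of stream) passes through unchanged
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (true) {
            int n = in.read(buf, off, len);
            if (n <= 0) return n;
            int kept = 0;
            // Compact the chunk in place, keeping only non-brace bytes.
            for (int i = off; i < off + n; i++) {
                if (!isBrace(buf[i])) buf[off + kept++] = buf[i];
            }
            if (kept > 0) return kept; // chunk was all braces: read another one
        }
    }

    // mark/reset are not tracked by this filter; an outer BufferedInputStream provides them.
    @Override
    public boolean markSupported() {
        return false;
    }
}

class RemoveBracesDemo {
    // Returns a BufferedInputStream whose content has the brace bytes removed.
    static BufferedInputStream withoutBraces(InputStream source) {
        return new BufferedInputStream(new RemoveBraceBytesInputStream(source));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"a\": {1}}".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = withoutBraces(new ByteArrayInputStream(data))) {
            // prints "a": 1 (including the double quotes)
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```

Wrapping the filter in a new BufferedInputStream gives back the type the question asks for and restores mark/reset support on top of the filtered data. skip() and available() are inherited unchanged, so they still count unfiltered bytes.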

-- original answer --

[1] InputStream deals in bytes; Reader is the same concept, but for characters. It does not make sense to say "filter all { from an InputStream". It would make sense to say "filter all occurrences of byte 123 from an InputStream". If the data is UTF-8 or ASCII, the two are equivalent, but there's no guarantee of that (in UTF-16, for instance, byte 123 can be one half of an unrelated character), and it's not 'nice' code in any case. To read a file as text, do this:

import java.io.BufferedReader;
import java.nio.file.*;

Path p = Paths.get("/path/to/file");
try (BufferedReader br = Files.newBufferedReader(p)) {
    // operate on the reader here
}

Note that, unlike most Java methods, the methods in Files assume UTF-8. You can specify the encoding explicitly instead: Files.newBufferedReader(p, [ENCODING HERE]). You should never rely on the system default encoding being the right one; you cannot read a file as text unless you know which text encoding it was written in!
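As a small illustration of passing the encoding explicitly, here is a sketch that writes a temporary ISO-8859-1 file and reads it back with the matching charset. The class ExplicitCharsetRead and the method firstLine are hypothetical names, and the temp file exists only so the demo is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    // Reads the first line of a file, decoding it with the charset the caller names.
    static String firstLine(Path p, Charset cs) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(p, cs)) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("charset-demo", ".txt");
        try {
            // 0xE9 is 'é' in ISO-8859-1 but is not valid UTF-8 on its own.
            Files.write(p, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
            System.out.println(firstLine(p, StandardCharsets.ISO_8859_1));
        } finally {
            Files.delete(p);
        }
    }
}
```

Reading the same file with the one-argument Files.newBufferedReader(p) would try to decode it as UTF-8 and fail with an IOException on the malformed byte, which is at least a loud failure rather than silently garbled text.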

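If you must use the old API:

import java.io.*;
import java.nio.charset.StandardCharsets;

try (FileInputStream fis = new FileInputStream("/path/to/file");
     InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
     BufferedReader br = new BufferedReader(isr)) {
    // operate on the reader here
}

Note that you MUST specify the charset here; without it, InputStreamReader falls back to the platform default charset, and things break in subtle ways.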

[2] To filter out certain characters, you can either do it 'inline' (in the code that reads chars from the reader), which is trivial, or you can create a wrapper reader that does it. Something like:

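import java.io.*;

class RemoveBracesReader extends FilterReader {
    public RemoveBracesReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        while (true) {
            int c = in.read();
            if (c != '{' && c != '}') return c; // -1 (end of stream) passes through
        }
    }

    // Bulk reads must filter too: BufferedReader calls this overload, not read().
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = 0;
        for (int c; n < len && (c = read()) != -1; n++) cbuf[off + n] = (char) c;
        return (n == 0 && len > 0) ? -1 : n;
    }
}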
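The 'inline' alternative mentioned in [2] needs no wrapper class at all. A minimal sketch, with InlineBraceFilter and readWithoutBraces as hypothetical names, assuming the whole text fits comfortably in memory:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class InlineBraceFilter {
    // Reads everything from the reader, skipping '{' and '}' as it goes.
    static String readWithoutBraces(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c != '{' && c != '}') sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new StringReader("{x{y}}z")) {
            System.out.println(readWithoutBraces(r)); // prints xyz
        }
    }
}
```

For a file, the reader passed in would typically be a BufferedReader opened with an explicit charset, so that the one-character-at-a-time loop does not hit the disk on every call.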

huangapple
  • Posted on May 19, 2020 at 20:30:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61891114.html