How do I resolve this encoding problem with ZipInputStream?

Question

I'm reading a zip file that contains UTF-8 encoded text through a ZipInputStream.

The data comes through, but special German characters come out wrong.

Using this page ( http://kellykjones.tripod.com/webtools/ascii_utf8_table.html ) I can see that my code is printing the two individual bytes from the UTF-8 encoding column as separate characters.

For example, ä is 0xC3 0xA4 in UTF-8, and I am getting Ã¤ printed out (the characters for 0xC3 and 0xA4). Does anyone have any tips?

    private InputStream downloadCsv(final String countryCode) {
        final String url = baseUrl + countryCode.toUpperCase() + ".zip";
        final String fileName = countryCode.toUpperCase() + ".txt";

        BufferedInputStream in = null;
        ZipInputStream zIn = null;

        try {
            in = new BufferedInputStream(new URL(url).openStream());
            zIn = new ZipInputStream(in, Charset.forName("UTF-8"));

            ZipEntry zipEntry;

            while ((zipEntry = zIn.getNextEntry()) != null) {
                if (zipEntry.getName().equals(fileName)) {
                    StringBuilder sb = new StringBuilder();

                    int c;
                    while ((c = zIn.read()) != -1) {
                        sb.append((char) c);
                        System.out.println((char) c + " : " + c);
                    }

                    return new ByteArrayInputStream(sb.toString().getBytes());
                }
            }
...
more code
...
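
The symptom can be reproduced without a zip file or a network call, which may help narrow it down. The sketch below is not part of the original code: the class name `Utf8CastDemo` and both helper methods are made up for illustration. It encodes ä as UTF-8, then compares casting each byte to a `char` (what the loop above does with the result of `zIn.read()`) against decoding the two bytes together.

```java
import java.nio.charset.StandardCharsets;

class Utf8CastDemo {
    // Mimics the loop in the question: every byte becomes its own char.
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            // b & 0xFF is the same 0..255 value that InputStream.read() returns
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the byte sequence as a whole, so multi-byte characters survive.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e4 is ä; written as an escape so the source file encoding does not matter
        byte[] data = "\u00e4".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);                  // 2 (0xC3, 0xA4)
        System.out.println(castEachByte(data).length());  // 2 (U+00C3 and U+00A4)
        System.out.println(decodeUtf8(data).length());    // 1 (U+00E4)
    }
}
```

The lengths are printed instead of the characters themselves because what the console shows also depends on the console's own encoding.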

Answer 1

Score: 1

For the record, I fixed this by following @saka1029's advice to use an InputStreamReader, and would mark it as the accepted answer if I could!

I can't promise my code is the cleanest, but it works now:

    BufferedInputStream in = null;
    ZipInputStream zIn = null;
    InputStreamReader zInReader = null;

    try {
        in = new BufferedInputStream(new URL(url).openStream());
        zIn = new ZipInputStream(in);

        ZipEntry zipEntry;

        // Scan the archive for the entry named fileName.
        while ((zipEntry = zIn.getNextEntry()) != null) {
            if (zipEntry.getName().equals(fileName)) {
                StringBuilder sb = new StringBuilder();
                // Decode the entry's bytes into characters. No charset is given,
                // so the default charset is used.
                zInReader = new InputStreamReader(zIn);

                int c;
                while ((c = zInReader.read()) != -1) {
                    sb.append((char) c);
                }

                // getBytes() without an argument also uses the default charset.
                return new ByteArrayInputStream(sb.toString().getBytes());
            }
        }
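
For anyone who wants to try this in isolation, here is a minimal, self-contained sketch of the same idea. It is not the code from my project: the class `ZipTextReader`, the helpers `readEntry` and `zipOf`, and the entry name `DE.txt` are invented for the example, and the zip is built in memory instead of being downloaded. It names UTF-8 explicitly on the `InputStreamReader`, on the assumption that the entry really is UTF-8 text, so the result does not depend on the default charset. As far as I understand the Javadoc, the `Charset` argument of the `ZipInputStream` constructor is used to decode entry names, not entry contents, which would explain why passing it did not help in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipTextReader {
    // Hypothetical helper: returns the text of the named entry, or null if it is absent.
    static String readEntry(InputStream zipData, String entryName) throws IOException {
        // The charset here applies to entry names only.
        try (ZipInputStream zIn = new ZipInputStream(zipData, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zIn.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // The reader decodes the entry contents; the charset is an assumption
                    // about the file inside the archive.
                    Reader reader = new InputStreamReader(zIn, StandardCharsets.UTF_8);
                    StringBuilder sb = new StringBuilder();
                    char[] buf = new char[4096];
                    int n;
                    while ((n = reader.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                    return sb.toString();
                }
            }
        }
        return null;
    }

    // Builds a one-entry zip in memory so the sketch runs without a network connection.
    static byte[] zipOf(String entryName, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zOut = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zOut.putNextEntry(new ZipEntry(entryName));
            zOut.write(text.getBytes(StandardCharsets.UTF_8));
            zOut.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOf("DE.txt", "M\u00e4rz"); // "März", escaped to avoid source-encoding issues
        String text = readEntry(new ByteArrayInputStream(zip), "DE.txt");
        System.out.println("M\u00e4rz".equals(text)); // true
    }
}
```

If the result has to go back into a `ByteArrayInputStream` as in my code, passing the charset to `getBytes(StandardCharsets.UTF_8)` keeps that step independent of the default charset too.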

huangapple
  • Posted on August 8, 2020 at 17:22:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63313749.html