Question

I have a file encoded in ISO-8859-1. I'm trying to read it in as a single String, do some regex substitutions on it, and write it back out in the same encoding.

However, the resulting file I get always seems to be UTF-8 (according to Notepad++ at least), mangling some characters.

Can anyone see what I'm doing wrong here?

private static void editFile(File source, File target) {

    // Source and target encoding
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    // Read the file as a single string
    String fileContent = null;

    try (Scanner scanner = new Scanner(source, iso88591charset)) {

        fileContent = scanner.useDelimiter("\\Z").next();

    } catch (IOException exception) {
        LOGGER.error("Could not read input file as a single String.", exception);
        return;
    }

    // Do some regex substitutions on the fileContent string
    String newContent = regex(fileContent);

    // Write the file back out in target encoding
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), iso88591charset))) {

        writer.write(newContent);

    } catch (Exception exception) {
        LOGGER.error("Could not write out edited file!", exception);
    }
}

Answer 1

Score: 2

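There is nothing actually wrong with your code. A plain text file carries no record of its encoding, so Notepad++ has to guess from the bytes. For the ASCII range (0x00 to 0x7F), ISO-8859-1 and UTF-8 produce identical bytes, so a file with few or no non-ASCII characters can plausibly be labeled as either. The two encodings differ only for characters above 0x7F: ISO-8859-1 stores each of them as a single byte, UTF-8 uses two or more bytes, and most Unicode characters cannot be represented in ISO-8859-1 at all. You can read more here or by simply searching for "ISO-8859-1 vs UTF-8" in Google.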
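To make the byte-level point concrete, here is a minimal sketch that encodes the same strings in both charsets and compares the results. It is not taken from the project mentioned in this answer; the class name `EncodingBytesDemo` and the helper `sameBytes` are made up for illustration, and the sample strings use `\u00e9` (é) as an example of a non-ASCII character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingBytesDemo {

    // Encode the text in both charsets and report whether the byte sequences are identical.
    static boolean sameBytes(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII: both encodings yield the same bytes.
        System.out.println(sameBytes("plain ASCII text")); // true

        // "café": the final character is one byte (0xE9) in ISO-8859-1
        // and two bytes (0xC3 0xA9) in UTF-8.
        String cafe = "caf\u00e9";
        System.out.println(sameBytes(cafe)); // false
        System.out.println(cafe.getBytes(StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(cafe.getBytes(StandardCharsets.UTF_8).length);      // 5
    }
}
```

If the file you are testing with happens to contain only ASCII characters, no tool can tell the two encodings apart, which is one way an editor can end up displaying "UTF-8" for a file written as ISO-8859-1.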

I've created a simple project with your code and tested it with characters that are encoded differently in ISO-8859-1 than in UTF-8. The result is a file that IntelliJ recognizes as ISO-8859-1 (Notepad++ probably does as well, but I cannot easily check since I'm on Linux). Apart from that, I've added another class that uses the Files.readString and Files.writeString methods introduced in JDK 11. The Scanner(File, Charset) constructor that you've used was added in JDK 10, so I think you may be using 11 already. Here's the simplified code:

private static void editFile(File source, File target) {
    Charset charset = StandardCharsets.ISO_8859_1;
    String fileContent;
    try {
        fileContent = Files.readString(source.toPath(), charset);
    } catch (IOException exception) {
        System.err.println("Could not read input file as a single String.");
        exception.printStackTrace();
        return;
    }
    String newContent = regex(fileContent);
    try {
        Files.writeString(target.toPath(), newContent, charset);
    } catch (IOException exception) {
        System.err.println("Could not write out edited file!");
        exception.printStackTrace();
    }
}

Feel free to clone the repository or check it on GitHub and use whichever code version you prefer.
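If you would rather not rely on an editor's guess, one option is to inspect the bytes on disk yourself. The sketch below is an assumption about how such a check could look, not code from the repository: it writes a string with Files.writeString (JDK 11+) in ISO-8859-1, reads the raw bytes back, and tries to decode them as strict UTF-8. The class name `Latin1RoundTripCheck` and both helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Latin1RoundTripCheck {

    // Returns true if the bytes are well-formed UTF-8.
    // A freshly created decoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Writes the text as ISO-8859-1 to a temporary file and returns the raw bytes on disk.
    static byte[] writeLatin1AndReadBytes(String text) {
        try {
            Path tmp = Files.createTempFile("latin1-check", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.ISO_8859_1);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeLatin1AndReadBytes("caf\u00e9"); // "café"
        System.out.println(bytes.length);                     // 4: one byte per character
        System.out.println(Integer.toHexString(bytes[3] & 0xFF)); // e9
        System.out.println(isValidUtf8(bytes));               // false: a lone 0xE9 is not valid UTF-8
    }
}
```

A file that fails strict UTF-8 decoding cannot really be UTF-8, whatever an editor's status bar says. The reverse does not hold: a file that passes may still have been written as ISO-8859-1 if it contains only ASCII characters.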
