Question

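I've tried a few approaches to complete this task, but unfortunately none of them worked.
The first step is to collect the paths of all txt files, which I do with this code: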

```java
public static List<String> getFileList() throws IOException {
    int depth = 5;
    String dir = "pathToMainFolder";
    Stream<Path> stream = Files.walk(Paths.get(dir), depth);
    List<String> paths = new ArrayList<>();
    try {
        stream.filter(file -> !Files.isDirectory(file))
                .map(Path::toString)
                .filter(file -> file.endsWith("txt"))
                .collect(Collectors.toCollection(() -> paths));
    } catch (Exception e) {
        e.printStackTrace();
    }
    return paths;
}
```

After that, I tried changing the encoding of the files with this code, taken from https://stackoverflow.com/questions/18141162/how-to-convert-ansi-to-utf8-in-java:

```java
for (String path : paths) {
    Path p = Paths.get(path);
    ByteBuffer bb = ByteBuffer.wrap(Files.readAllBytes(p));
    CharBuffer cb = Charset.forName("Cp1252").decode(bb);
    bb = Charset.forName("UTF-8").encode(cb);
    Files.write(p, bb.array());
}
```

This converted the files to UTF-8, but the content is far from what I expected. For example, it should be `tań` but it is `tañ`; it should be `choć` but it is `choæ`.

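I also tried creating new files with `BufferedReader` and `BufferedWriter`, and tried replacing all the characters after changing the decoding. The only thing that works is `Normalizer`: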

```java
String everything = "";
BufferedReader br = new BufferedReader(new FileReader(path));
try {
    StringBuilder sb = new StringBuilder();
    String line = Normalizer.normalize(br.readLine(), Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "");

    while (line != null) {
        System.out.println(line);
        sb.append(line);
        sb.append(System.lineSeparator());
        line = Normalizer.normalize(br.readLine(), Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }
    everything = sb.toString();
    System.out.println(everything);
} finally {
    br.close();
}
```

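But that is the last thing I will resort to if there is no other solution. I should also mention that there are 14k+ files to change in the folder and its subfolders (the files are not long: 487 lines on average, with only a few characters per line).

Is there any approach or solution for this problem?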


<details>
<summary>English original:</summary>

I've tried a few approaches to complete this task; unfortunately none of them works.
The main thing is to get the paths of all txt files, so I do it with this piece of code:

    public static List<String> getFileList() throws IOException {
        int depth = 5;
        String dir = "pathToMainFolder";
        Stream<Path> stream = Files.walk(Paths.get(dir), depth);
        List<String> paths = new ArrayList<>();
        try {
            stream.filter(file -> !Files.isDirectory(file))
                    .map(Path::toString)
                    .filter(file -> file.endsWith("txt"))
                    .collect(Collectors.toCollection(() -> paths));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return paths;
    }

After that I tried changing the encoding of the files with this piece of code from
<https://stackoverflow.com/questions/18141162/how-to-convert-ansi-to-utf8-in-java>:

    for (String path : paths) {
        Path p = Paths.get(path);
        ByteBuffer bb = ByteBuffer.wrap(Files.readAllBytes(p));
        CharBuffer cb = Charset.forName("Cp1252").decode(bb);
        bb = Charset.forName("UTF-8").encode(cb);
        Files.write(p, bb.array());
    }

It changed the files to UTF-8 encoding, but the file content is far from my expectations. For example: it should be `tań` but it is `tañ`; it should be `choć` but it is `choæ`.
I also tried creating new files with `BufferedReader` and `BufferedWriter`, and tried replacing all the characters after the decoding change. The only thing that works is `Normalizer`:

    String everything = "";
    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
        StringBuilder sb = new StringBuilder();
        String line = Normalizer.normalize(br.readLine(), Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");

        while (line != null) {
            System.out.println(line);
            sb.append(line);
            sb.append(System.lineSeparator());
            line = Normalizer.normalize(br.readLine(), Normalizer.Form.NFD)
                    .replaceAll("[^\\p{ASCII}]", "");
        }
        everything = sb.toString();
        System.out.println(everything);
    } finally {
        br.close();
    }

But that is the last thing I will do if there is no other solution. I will also mention that there are 14k+ files to change in the folder and its subfolders (the files are not long: 487 lines on average, with few characters in each).
Is there any approach or solution for this problem?

</details>


# Answer 1
**Score**: 1

Converting a file between two character sets does not need Char/Byte buffers. Here is a simple `recode` method that uses `String` and `getBytes`:

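```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

private static void recode(Path input, Charset inCharset, Path output, Charset outCharset)
{
    try
    {
        // Make sure the target directory exists before writing
        Files.createDirectories(output.getParent());
        // Decode the bytes with inCharset, then encode the text with outCharset
        Files.write(output, new String(Files.readAllBytes(input), inCharset).getBytes(outCharset));
    }
    catch (IOException e)
    {
        throw new UncheckedIOException(e);
    }
}
```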
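
The `inCharset` passed to `recode` has to be the encoding the files were really written in. The symptoms in the question (`ń` showing up as `ñ`, `ć` as `æ`) are what one would see if the files are windows-1250 (Central European) rather than Cp1252, because each pair of characters shares a byte value across the two code pages. The following is a minimal sketch to check that; the class `CharsetCheck` and its `decode` helper are made-up names for illustration, and whether the real files are windows-1250 is an assumption to verify on a sample file.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the answer's code.
class CharsetCheck {
    // Decode the same raw bytes under a named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Single-byte data as it might sit in one of the files:
        // 0xF1 and 0xE6 are the bytes of the two accented letters.
        byte[] tan = {'t', 'a', (byte) 0xF1};
        byte[] choc = {'c', 'h', 'o', (byte) 0xE6};

        // windows-1252 turns the bytes into n-tilde (U+00F1) and ae (U+00E6).
        System.out.println(decode(tan, "windows-1252").equals("ta\u00F1"));
        System.out.println(decode(choc, "windows-1252").equals("cho\u00E6"));

        // windows-1250 yields n-acute (U+0144) and c-acute (U+0107).
        System.out.println(decode(tan, "windows-1250").equals("ta\u0144"));
        System.out.println(decode(choc, "windows-1250").equals("cho\u0107"));
    }
}
```

All four comparisons print `true`. If a sample file reads correctly when decoded as windows-1250, passing `Charset.forName("windows-1250")` as `inCharset` would be the change to try.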

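Take care to run this only once on your directories; for testing it is safer to use separate input and output directories. Your `main` can be simplified with `Files.find`, processing the conversion directly: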

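```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public static void main(String[] args) throws IOException
{
    Charset outCharset = StandardCharsets.UTF_8;

    // Change as required
    Charset inCharset = Charset.forName("Cp1252");
    // OR
    // Charset inCharset  = StandardCharsets.XYZ;
    // OR
    // Charset inCharset = Charset.forName(System.getProperty("file.encoding"));

    int depth = 5;
    Path dir = Path.of("pathToMainFolder");
    Path outdir = Path.of("pathToMainFolder.utf8");
    // Find regular .txt files and write each converted copy to the same relative path under outdir
    try (Stream<Path> stream = Files.find(dir, depth, (p,a) -> a.isRegularFile() && p.getFileName().toString().endsWith(".txt")))
    {
        stream.forEach(p -> recode(p, inCharset, outdir.resolve(dir.relativize(p)), outCharset));
    }
}
```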
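
Since running the conversion a second time would re-encode files that are already UTF-8, one possible guard is to skip any file whose bytes already decode as strict UTF-8. The following is a minimal sketch of that idea, not part of the answer's code: `Utf8Guard` and `isValidUtf8` are hypothetical names. It is a heuristic, because a legacy-encoded file could by coincidence contain only byte sequences that are also valid UTF-8. Pure-ASCII files pass the check too, which is harmless since ASCII bytes are the same in UTF-8 and in the Windows code pages.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; the names are assumptions.
class Utf8Guard {
    // Returns true if the bytes form well-formed UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] raw) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting replacement characters for bad input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

In `recode`, a call such as `Utf8Guard.isValidUtf8(Files.readAllBytes(input))` could decide whether to copy the file unchanged or to convert it.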


<details>
<summary>English original:</summary>

Converting a file between two character sets does not need Char/Byte buffers; here is a simple `recode` method using `String` and `getBytes`:

    private static void recode(Path input, Charset inCharset, Path output, Charset outCharset)
    {
        try
        {
            Files.createDirectories(output.getParent());
            Files.write(output, new String(Files.readAllBytes(input), inCharset).getBytes(outCharset));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

You'll need to take care to run this only once on your directories; for tests it would be safer to use separate IN/OUT dirs. Your `main` can be simplified with `Files.find`, processing the transformation directly:

    public static void main(String[] args) throws IOException
    {
        Charset outCharset = StandardCharsets.UTF_8;

        // Change as required
        Charset inCharset = Charset.forName("Cp1252");
        // OR
        // Charset inCharset  = StandardCharsets.XYZ;
        // OR
        // Charset inCharset = Charset.forName(System.getProperty("file.encoding"));

        int depth = 5;
        Path dir = Path.of("pathToMainFolder");
        Path outdir = Path.of("pathToMainFolder.utf8");
        try (Stream<Path> stream = Files.find(dir, depth, (p,a) -> a.isRegularFile() && p.getFileName().toString().endsWith(".txt")))
        {
            stream.forEach(p -> recode(p, inCharset, outdir.resolve(dir.relativize(p)), outCharset));
        }
    }


</details>
