September 2, 2020, 21:19:58

When I create a file with Java 8 using the Shift-JIS charset, some characters are replaced with '?'

Question

I have a problem when I create a file using the Shift-JIS charset.

This is an example of the text that I want to write to a .txt file:

>繰戻_日経選挙システム保守2019年1月10日～;[2019年度更新]横浜第１DCコロケ―ション（２ラック）

With the Shift-JIS charset, the file contains two '?' characters in place of ～ and ―:

>繰戻_日経選挙システム保守2019年1月10日?;[2019年度更新]横浜第１DCコロケ?ション（２ラック）

With the UTF-8 charset, the file content is correct:

>繰戻_日経選挙システム保守2019年1月10日～;[2019年度更新]横浜第１DCコロケ―ション（２ラック）

This is my code:

package it.grupposervizi.easy.ef.etl.elaboration;

import com.nimbusds.jose.util.StandardCharset;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import org.apache.commons.io.FileUtils;

public class TestShiftJIS {

  private static final String TEXT = "繰戻_日経選挙システム保守2019年1月10日～;[2019年度更新]横浜第１DCコロケ―ション（２ラック）";
  private static final String DIRECTORY = "C:\\temp\\japan\\";
  private static final String SHIFT_JIS = "Shift-JIS";
  private static final String UTF_8 = StandardCharset.UTF_8.name();
  private static final String EXTENSION = ".txt";

  public static void main(String[] args) {

    final List<String> charsets = Arrays.asList(SHIFT_JIS, UTF_8);
    charsets.forEach(c -> {
      final String fName = DIRECTORY + c + EXTENSION;
      File file = new File(fName);
      try {
        FileUtils.writeStringToFile(file, TEXT, Charset.forName(c));
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    });

    System.out.println("End Test");
  }
}

Do you have any idea why these two characters are not included in the Shift-JIS charset?

Answer 1

Score: 1



You are trying to save a file in an encoding that differs from the default one. Try changing the character encoding.
More about character encodings: https://en.wikipedia.org/wiki/Character_encoding

Try using: Charset.forName("CP943C")
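
Whether a particular charset name actually helps can be checked before writing any file. The following is a minimal sketch, not part of the original answer: the class name CharsetProbe and the helper canEncodeAll are made up for illustration, and the list of candidate names is only an assumption about what a given JVM may ship.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class CharsetProbe {

    /** Returns true if the named charset is available and can encode every char of text. */
    static boolean canEncodeAll(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false; // this JVM does not ship the charset
        }
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The two characters from the question: U+FF5E and U+2015
        String problemChars = "\uff5e\u2015";
        // Candidate names are assumptions; availability depends on the JVM
        List<String> candidates = Arrays.asList("Shift_JIS", "windows-31j", "CP943C");
        for (String name : candidates) {
            System.out.println(name + " -> " + canEncodeAll(name, problemChars));
        }
    }
}
```

A result of false means either that the charset is missing from the JVM or that at least one of the characters has no mapping in it.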

Answer 2

Score: 0

@JosefZ has basically already given the answer: Java's Shift-JIS charset does not support ～ (U+FF5E) and ― (U+2015).

This can be verified using Charset.newEncoder().canEncode(char):

import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class ShiftJisTest {
    public static void main(String[] args) {
        // 繰戻_日経選挙システム保守2019年1月10日～;[2019年度更新]横浜第１DCコロケ―ション（２ラック）
        String s = "\u7e70\u623b\u005f\u65e5\u7d4c\u9078\u6319\u30b7\u30b9\u30c6\u30e0\u4fdd\u5b88\u0032\u0030\u0031\u0039\u5e74\u0031\u6708\u0031\u0030\u65e5\uff5e\u003b\u005b\u0032\u0030\u0031\u0039\u5e74\u5ea6\u66f4\u65b0\u005d\u6a2a\u6d5c\u7b2c\uff11\u0044\u0043\u30b3\u30ed\u30b1\u2015\u30b7\u30e7\u30f3\uff08\uff12\u30e9\u30c3\u30af\uff09";
        Charset charset = Charset.forName("Shift-JIS");
        // Print every character that the charset cannot encode
        CharsetEncoder encoder = charset.newEncoder();
        for (char c : s.toCharArray()) {
            if (!encoder.canEncode(c)) {
                System.out.printf("%s (U+%04X)%n", c, (int) c);
            }
        }

        // Encoding the whole string with a fresh encoder fails on the first unmappable character
        try {
            charset.newEncoder().encode(CharBuffer.wrap(s));
        } catch (CharacterCodingException e) {
            // java.nio.charset.UnmappableCharacterException: Input length = 1
            e.printStackTrace();
        }
    }
}

You are seeing ? because Apache Commons IO's FileUtils.writeStringToFile(File, String, Charset) internally uses String.getBytes(Charset), whose documentation says:
> [...] This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.

And the CharsetEncoder documentation says:
> [...] The replacement is initially set to the encoder's default replacement, which often (but not always) has the initial value { (byte)'?' }
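
If a silent '?' substitution is worse than a failed write, the text can be encoded with a CharsetEncoder that reports unmappable characters, and the resulting bytes written afterwards. This is a minimal sketch under that assumption; the class name StrictEncode and the helper encodeStrict are hypothetical and do not come from Commons IO or the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {

    /** Encodes text, throwing instead of substituting '?' for unmappable characters. */
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buf = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        try {
            // U+FF5E has no mapping in Java's Shift_JIS charset, so this call throws
            encodeStrict("2019\u5e741\u670810\u65e5\uff5e", Charset.forName("Shift_JIS"));
        } catch (CharacterCodingException e) {
            System.err.println("Cannot encode: " + e);
        }
    }
}
```

The returned byte array can then be passed to a byte-oriented writer such as FileUtils.writeByteArrayToFile(File, byte[]), so the file is only created when every character could be encoded.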

Answer 3

Score: 0

As @Marcono1234 answered, the Shift-JIS mapping in Java does not support ～ (U+FF5E) and ― (U+2015). To encode these code points in a Shift-JIS-compatible encoding, use Charset.forName("windows-31j") or Charset.forName("MS932") rather than Charset.forName("Shift-JIS").
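
A quick way to confirm this suggestion is to write a short sample with windows-31j and read it back. The sketch below assumes the JVM ships the windows-31j charset (otherwise Charset.forName throws). The class name WriteWindows31J, the helper roundTrip, and the temporary file are made up for illustration, and java.nio.file.Files is used in place of Commons IO only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteWindows31J {

    // Assumption: this JVM provides the windows-31j (MS932) charset
    static final Charset WINDOWS_31J = Charset.forName("windows-31j");

    /** Writes text with windows-31j and reads it back using the same charset. */
    static String roundTrip(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(WINDOWS_31J));
        return new String(Files.readAllBytes(file), WINDOWS_31J);
    }

    public static void main(String[] args) throws IOException {
        // Fragment of the question's text containing U+FF5E and U+2015
        String text = "1\u670810\u65e5\uff5e \u30b3\u30ed\u30b1\u2015\u30b7\u30e7\u30f3";
        Path file = Files.createTempFile("japan", ".txt"); // hypothetical location
        System.out.println("round trip ok: " + text.equals(roundTrip(file, text)));
    }
}
```

Any program that later reads the file needs to decode it with the same charset to get the two characters back unchanged.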
