Thai script seems to lose UTF-8 encoding in Java for-each loop

# Question

I'm trying to develop an application within Android Studio on Windows 10.

PROBLEM: The following string array of Thai words:

```java
String[] myTHarr = { "มาก", "เชี่ยว", "แน่", "ม่อน", "บ้าน", "พูด", "เลื่อย", "เมื่อ", "ช่ำ", "แร่" };
```

...when processed by the following for-each loop:

```java
for (String s : myTHarr) {
  // s = à¸¡à¸²à¸� before executing any of the code below:
  byte[] utf8EncodedThaiArr = s.getBytes("UTF-8");
  String utf8EncodedThai = new String(utf8EncodedThaiArr); // setting breakpoint here
  // s is still à¸¡à¸²à¸� (I want it to be มาก)
  // do stuff
}
```

results in s = à¸¡à¸²à¸� when attempting to process the first word (none of the other words work either, but that's expected given that the first fails).

The Thai script appears correctly in the string array (the declaration was copied straight from Android Studio), the file encoding is set to UTF-8 for the Java file (per [here][1]), and the File Encoding Settings look like this (per [here][2]):

[![enter image description here][3]][3]

  [1]: https://stackoverflow.com/questions/30082741/change-the-encoding-of-a-file-in-visual-studio-code
  [2]: https://stackoverflow.com/questions/30184062/android-studio-project-encoding-default-value/30340777
  [3]: https://i.stack.imgur.com/229yy.png

# Answer 1
**Score**: 2

According to the documentation, the `String(byte[])` constructor "Constructs a new String by decoding the specified array of bytes using the platform's default charset."

I'm guessing that the default character set is not UTF-8. So the solution is to specify the encoding for the array of bytes:

```java
String utf8EncodedThai = new String(utf8EncodedThaiArr, "UTF-8"); // setting breakpoint here
```
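To illustrate the point about the default charset, here is a minimal sketch of the encode/decode round trip. The class and method names (`CharsetRoundTrip`, `roundTrip`) are made up for this example, and ISO-8859-1 is only an assumed stand-in for a non-UTF-8 platform default such as cp1252. The Thai word มาก is written with Unicode escapes so that the result does not depend on how the source file itself is decoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode a string to UTF-8 bytes, then decode those bytes with the given charset.
    // (Hypothetical helper, for illustration only.)
    static String roundTrip(String s, Charset decodeWith) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The first word from the question, written with escapes (ASCII-only source).
        String thai = "\u0E21\u0E32\u0E01";

        // What new String(byte[]) would silently use when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoding with the same charset used for encoding restores the text.
        System.out.println(roundTrip(thai, StandardCharsets.UTF_8).equals(thai));

        // Decoding with a single-byte charset turns each UTF-8 byte into its own char.
        System.out.println(roundTrip(thai, StandardCharsets.ISO_8859_1).equals(thai));
    }
}
```

Passing a `Charset` constant rather than the string `"UTF-8"` also avoids the checked `UnsupportedEncodingException` that the string-based overloads declare. Note that a round trip like this cannot repair a string that was already wrong before the loop ran, which is the situation described in the question.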

# Answer 2
**Score**: 0

As several commenters pointed out, the problem had to be in my environment. After a bit more searching I found that I should have rebuilt the project after changing the encodings (merely switching to UTF-8 and clicking 'Apply'/'OK' wasn't enough). For reference, my File Encoding settings look like this:

[Image: File Encoding settings]

Once I rebuilt, I started getting the compiler error "unmappable character for encoding cp1252" on the String array containing the Thai. (Side note: some of the Thai characters were fine, while others rendered as � and friends. I would have thought either all of the Thai would work or none of it, but was surprised to see even common Thai letters such as ก cause the compiler to choke.)

That error led me to another Stack Overflow post, from which I tried a few things to set the compiler options to UTF-8. Since my application happens to be a sort of 'pre-process' for an Android app, and is therefore separate from the app itself (if that makes any sense), I didn't have the luxury of using the compilerOptions attribute as the answers in the aforementioned SO post recommended (though I have since added it to the Gradle file on the Android app side). This led me to set the environment variable JAVA_TOOL_OPTIONS via PowerShell:

    setx JAVA_TOOL_OPTIONS "-Dfile.encoding=UTF8"

Which fixed the issue!
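The side note in this answer, that some Thai letters survive while others such as ก make the compiler choke, has a plausible explanation: a compiler reading a UTF-8 file as cp1252 sees each UTF-8 byte as a separate cp1252 character, and a handful of byte values (0x81 among them) have no character assigned in cp1252. The sketch below probes this with a strict decoder. The class and method names (`Cp1252Probe`, `utf8BytesDecodeAsCp1252`) are hypothetical, and it assumes the JDK ships the `windows-1252` charset, which common JDKs do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Cp1252Probe {
    // Returns true if every UTF-8 byte of s is a defined windows-1252 byte,
    // i.e. a cp1252 reader would produce mojibake but not an "unmappable" error.
    // (Hypothetical helper, for illustration only.)
    static boolean utf8BytesDecodeAsCp1252(String s) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed to be available
        try {
            cp1252.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (CharacterCodingException e) {
            // At least one byte has no cp1252 character assigned.
            return false;
        }
    }

    public static void main(String[] args) {
        // U+0E21 encodes as E0 B8 A1; all three bytes are defined in cp1252.
        System.out.println(utf8BytesDecodeAsCp1252("\u0E21"));
        // U+0E01 encodes as E0 B8 81; 0x81 is unassigned in cp1252.
        System.out.println(utf8BytesDecodeAsCp1252("\u0E01"));
    }
}
```

On a JDK whose windows-1252 table leaves 0x81 unassigned, this should print `true` and then `false`, which matches the mixed behavior described above: whether a given Thai letter breaks depends on its trailing UTF-8 byte, not on how common the letter is.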

# Answer 3
**Score**: -1

I tried your code with the attached settings, and the code worked fine.

[Image: attached settings]

huangapple
  • Posted on August 25, 2020, 22:09:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63580725.html