Thai script seems to lose UTF-8 encoding in java for-each loop
# Question
I'm trying to develop an application in Android Studio on Windows 10.
Problem: the following string array of Thai words:
```java
String[] myTHarr = { "มาก", "เชี่ยว", "แน่", "ม่อน", "บ้าน", "พูด", "เลื่อย", "เมื่อ", "ช่ำ", "แร่" };
```
...when processed by the following for-each loop:
```java
for (String s : myTHarr) {
    // s = มาà¸� before any of the code below executes
    byte[] utf8EncodedThaiArr = s.getBytes("UTF-8");
    String utf8EncodedThai = new String(utf8EncodedThaiArr); // breakpoint set here
    // s is still มาà¸� (I want it to be มาก)
    // do stuff
}
```
This results in s = มาà¸� when attempting to process the first word (none of the other words work either, but that's expected given that the first one fails).
The Thai script appears correctly in the string array (the declaration was copied straight from Android Studio), the file encoding of the Java file is set to UTF-8 (per [here][1]), and the File Encoding settings look like this (per [here][2]):
[![File Encoding settings screenshot][3]][3]

[1]: https://stackoverflow.com/questions/30082741/change-the-encoding-of-a-file-in-visual-studio-code
[2]: https://stackoverflow.com/questions/30184062/android-studio-project-encoding-default-value/30340777
[3]: https://i.stack.imgur.com/229yy.png
# Answer 1
**Score**: 2
According to the documentation, the `String(byte[])` constructor "Constructs a new String by decoding the specified array of bytes using the platform's default charset."
I'm guessing that the default character set is not UTF-8. So the solution is to specify the encoding when decoding the byte array:
```java
String utf8EncodedThai = new String(utf8EncodedThaiArr, "UTF-8"); // breakpoint set here
```
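To illustrate the difference, here is a minimal sketch that decodes the same UTF-8 bytes twice: once with an explicit charset and once as windows-1252, which stands in here for a non-UTF-8 platform default. The class and method names are hypothetical, and the sketch assumes the JVM provides a windows-1252 charset. The Thai word is written with `\u` escapes so that the example does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Decodes UTF-8 bytes with an explicit charset, independent of the platform default.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as windows-1252 to imitate a non-UTF-8 default charset.
    // Assumption: the JVM ships this charset (the Java spec does not require it).
    public static String decodeAsCp1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String thai = "\u0E21\u0E32\u0E01"; // the word from the question's first array element
        byte[] utf8 = thai.getBytes(StandardCharsets.UTF_8);

        System.out.println("explicit UTF-8 round trip intact: " + decodeUtf8(utf8).equals(thai));
        System.out.println("windows-1252 round trip intact:   " + decodeAsCp1252(utf8).equals(thai));
        System.out.println("platform default charset:         " + Charset.defaultCharset());
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"UTF-8"` also avoids the checked `UnsupportedEncodingException`.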
# Answer 2
**Score**: 0
As several people pointed out in the comments, the problem had to be in my environment. After a bit more searching, I found that I should have rebuilt the project after changing the encodings (merely switching to UTF8 and clicking 'Apply'/'OK' wasn't enough). I should note here that my File Encoding settings look like this, for reference:
Once I rebuilt, I started getting the compiler error "unmappable character for encoding cp1252" on the String array containing the Thai. (Side note: some of the Thai characters were fine, while others rendered as � and similar. I would have thought either all of the Thai would work or none of it, but was surprised to see even common Thai letters such as ก cause the compiler to choke.)
That error led to this post, in which I tried a few things to set the compiler options to UTF8. Since my application happens to be a sort of 'pre-process' for an Android app, and is therefore separate from the app itself, I didn't have the luxury of using the compilerOptions attribute as the answers in that SO post recommended (though I have since added it to the gradle on the Android app side). This led me to set the environment variable JAVA_TOOL_OPTIONS via PowerShell:
`setx JAVA_TOOL_OPTIONS "-Dfile.encoding=UTF8"`
This fixed the issue!
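On the side note about only some Thai letters failing: one plausible explanation is that windows-1252 leaves a handful of byte values undefined (0x81 among them), and the UTF-8 encoding of ก (U+0E01) is `E0 B8 81`, so it ends in exactly such a byte. The sketch below checks this for any code point. The class and method names are hypothetical, and it assumes the JVM provides a windows-1252 charset whose decoder substitutes U+FFFD for undefined bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252ThaiCheck {
    // Assumption: the JVM ships a windows-1252 charset (not guaranteed by the Java spec).
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns true if reading this code point's UTF-8 bytes as windows-1252
    // hits a byte the charset leaves undefined (decoded as U+FFFD).
    public static boolean hitsUndefinedByte(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252).indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Walk the Thai block and list the code points that would trip a cp1252 decoder.
        for (int cp = 0x0E01; cp <= 0x0E5B; cp++) {
            if (Character.isDefined(cp) && hitsUndefinedByte(cp)) {
                System.out.printf("U+%04X%n", cp);
            }
        }
    }
}
```

Letters whose UTF-8 bytes all map to defined windows-1252 characters come through as ordinary mojibake instead, which would match the mix of "fine" and failing characters described above.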