August 21, 2020, 05:33:38

String display problems after converting Java source files to UTF-8 and setting Eclipse to UTF-8

Question

To adapt to new testing tools, I had to convert all my Java source files to UTF-8 (mostly from windows-1252 or ISO-8859-1) and change the Eclipse configuration to use UTF-8 by default. After the conversion, some strings containing accented characters are displayed incorrectly.

These strings are read from a database (NLS_CHARACTERSET: WE8MSWIN1252) and then sent to a Delphi program over a socket. Neither the database nor the Delphi program has been modified.

The strings are retrieved from the database using:

ArrayList<String> menus = new ArrayList<String>();
String query = "SELECT ITEM FROM menus ...";
psmt = con.prepareStatement( query );
rs = psmt.executeQuery();
while( rs.next() ) {
    if( rs.getString( "ITEM" ) == null ) continue;
    String s = rs.getString( "ITEM" );
    menus.add( s );
}
return menus;

Then they are sent to the other program using a socket and a PrintWriter:

Socket socket = new Socket( getTcpIPAddress(), getTcpCommandPort() );
PrintWriter pred = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())), true);
String str = "ADD:";
str = str.concat( menus.get( 0 ) );
pred.println(str);

I've tried a number of different conversions to create the string to send, but I still get strange characters instead of accented ones:

String s = rs.getString( "ITEM" );
String m1 = new String( s.getBytes("UTF-8") );
String m2 = new String( s.getBytes("UTF-8"), "ISO-8859-1" );
String m3 = new String( s.getBytes("ISO-8859-1") );
String m4 = new String( s.getBytes("ISO-8859-1"), "UTF-8" );
String m5 = new String( s.getBytes(), "ISO-8859-1" );
String m6 = new String( s.getBytes(), "UTF-8" );
byte[] ba = rs.getBytes( "ITEM" );
String b1 = new String( ba );
String b2 = new String( ba, "ISO-8859-1" );
String b3 = new String( ba, "UTF-8" );
String b4 = new String( ba, "windows-1252" );
String b5 = new String( ba, "US-ASCII" );

Any idea how to get my accents back, apart from converting the source files back and resetting the default configuration for Eclipse?

Answer 1

Score: 0

Encoding always comes into play when converting between bytes and characters and back. The #getBytes() call with no argument converts the characters in the string into bytes according to your platform's default charset at runtime. There are overloads of #getBytes() that take a charset to avoid that. You should specify a charset there, and also when you instantiate the OutputStreamWriter, to avoid these unintended changes.
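
A minimal sketch of that advice, assuming the Delphi program expects windows-1252 bytes (the question does not state this; it is inferred from the database character set). The class name `CharsetDemo`, the constant `WIRE_CHARSET`, and the helper `sendLine` are hypothetical, and a `ByteArrayOutputStream` stands in for `socket.getOutputStream()` so the example runs without a network peer.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class CharsetDemo {
    // Assumption: the receiving program decodes incoming bytes as windows-1252.
    static final Charset WIRE_CHARSET = Charset.forName("windows-1252");

    // Writes one line using an explicit charset instead of the JVM default.
    static void sendLine(OutputStream out, String line) {
        PrintWriter writer = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(out, WIRE_CHARSET)), true);
        writer.println(line); // auto-flush is on, so the bytes reach 'out' here
    }

    public static void main(String[] args) {
        // Stand-in for socket.getOutputStream().
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // 'é' is written as a Unicode escape so the source file encoding is irrelevant.
        sendLine(sink, "ADD:caf\u00e9");
        for (byte b : sink.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

With the charset fixed at the writer, the accented character should be sent as the single byte E9 whatever default charset the JVM was started with. None of the `new String(s.getBytes(...), ...)` round trips from the question are needed, because the `String` returned by the JDBC driver is already decoded text.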

Answer 2

Score: 0

The culprit was the Eclipse configuration, although I don't understand why.

Setting the options back to default (Cp1252) in Window -> Preferences, General -> Workspace -> "Text file encoding" fixed this issue.
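
A likely explanation, offered as an assumption rather than something confirmed in this thread: Eclipse passes the workspace text file encoding to the programs it launches as the JVM's `file.encoding`. On JVMs before Java 18 that property sets the default charset, which is what `new OutputStreamWriter(out)` uses when no charset is given. The sketch below (hypothetical class name `DefaultCharsetCheck`) prints the default charset and the bytes produced for an accented character, so the two Eclipse settings can be compared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetCheck {
    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'é' as a Unicode escape, so the source file encoding does not matter.
        String accented = "\u00e9";
        System.out.println("default charset: " + Charset.defaultCharset());
        // Depends on how the JVM was launched.
        System.out.println("default bytes:   " + hex(accented.getBytes()));
        // Always the same, regardless of launch settings.
        System.out.println("windows-1252:    "
                + hex(accented.getBytes(Charset.forName("windows-1252"))));
        System.out.println("UTF-8:           "
                + hex(accented.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the default bytes match the UTF-8 line when the program is launched from a UTF-8 workspace, the Delphi side would receive two bytes (C3 A9) per accented character and, decoding them as windows-1252, display "Ã©". Passing an explicit charset to the `OutputStreamWriter` would make the output independent of the Eclipse setting.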
