String display problems after converting Java source files to UTF-8 and setting Eclipse to UTF-8

Question

To adapt to new testing tools, I had to convert all my Java source files to UTF-8 (mostly from windows-1252 or ISO-8859-1) and change the Eclipse configuration to use UTF-8 by default. After the conversion, some strings containing accented characters are no longer displayed correctly.

These strings are read from a database (NLS_CHARACTERSET: WE8MSWIN1252) and then sent to a Delphi program over a socket. Neither the database nor the Delphi program has been modified.

The strings are retrieved from the database using:

ArrayList<String> menus = new ArrayList<String>();
String query = "SELECT ITEM FROM menus ...";
psmt = con.prepareStatement(query);
rs = psmt.executeQuery();
while (rs.next()) {
    // Read the column once and skip NULL items
    String s = rs.getString("ITEM");
    if (s == null) continue;
    menus.add(s);
}
return menus;

They are then sent to the other program using a socket and a PrintWriter:

Socket socket = new Socket(getTcpIPAddress(), getTcpCommandPort());
PrintWriter pred = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())), true);

String str = "ADD:";
str = str.concat(menus.get(0));
pred.println(str);

I've tried a number of different conversions to build the string to send, but I still get strange characters instead of the accented ones:

String s = rs.getString("ITEM");
String m1 = new String(s.getBytes("UTF-8"));
String m2 = new String(s.getBytes("UTF-8"), "ISO-8859-1");
String m3 = new String(s.getBytes("ISO-8859-1"));
String m4 = new String(s.getBytes("ISO-8859-1"), "UTF-8");
String m5 = new String(s.getBytes(), "ISO-8859-1");
String m6 = new String(s.getBytes(), "UTF-8");

byte[] ba = rs.getBytes("ITEM");
String b1 = new String(ba);
String b2 = new String(ba, "ISO-8859-1");
String b3 = new String(ba, "UTF-8");
String b4 = new String(ba, "windows-1252");
String b5 = new String(ba, "US-ASCII");

Any idea how to get my accents back, apart from converting the source files back and resetting the default configuration for Eclipse?

Answer 1

Score: 0

Encoding comes into play whenever you convert between bytes and characters, in either direction. The no-argument String#getBytes() call converts the characters of the string into bytes according to your platform's default charset at runtime. There are overloads of getBytes() that take a charset, which avoids that dependency. You should specify a charset there, and also when you instantiate the OutputStreamWriter, to avoid these unintended changes.
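A minimal sketch of that advice, assuming the Delphi program expects windows-1252 bytes (the question does not state this explicitly; it is inferred from the database charset). The class name, the WIRE_CHARSET constant and both helper methods are hypothetical and only illustrate passing a charset to OutputStreamWriter instead of relying on the default:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class ExplicitCharsetWriter {

    // Assumption: the receiving program decodes its input as windows-1252.
    static final Charset WIRE_CHARSET = Charset.forName("windows-1252");

    // Hypothetical helper: wraps any OutputStream (for example
    // socket.getOutputStream()) in a PrintWriter that always encodes
    // with WIRE_CHARSET, whatever the JVM default charset is.
    static PrintWriter newWriter(OutputStream out) {
        return new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(out, WIRE_CHARSET)), true);
    }

    // Hypothetical helper used only to inspect the bytes that would be sent.
    static byte[] encode(String command) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = newWriter(buffer);
        writer.print(command);
        writer.flush(); // print() does not auto-flush, so flush explicitly
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // \u00e9 is "e with acute accent"; the escape keeps this source file
        // independent of its own encoding.
        byte[] bytes = encode("ADD:Caf\u00e9");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

With the question's code, this would amount to replacing new OutputStreamWriter(socket.getOutputStream()) with newWriter(socket.getOutputStream()). The strings returned by rs.getString() are already correctly decoded Java strings, so none of the new String(s.getBytes(...), ...) round trips should be needed.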

Answer 2

Score: 0

The culprit was the Eclipse configuration, although I don't understand why.

Setting the option back to its default (Cp1252) under Window -> Preferences, General -> Workspace -> "Text file encoding" fixed the issue.
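A possible explanation, not confirmed anywhere in this thread: Eclipse normally launches programs with a JVM default charset (file.encoding) that follows the configured encoding, and an OutputStreamWriter created without a charset uses that default. If that is what happened here, the program started sending UTF-8 bytes to a receiver that still decodes windows-1252. The sketch below only simulates this situation; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates a receiver that always decodes windows-1252, fed with bytes
    // that the sender produced using senderCharset.
    static String seenByCp1252Receiver(String text, Charset senderCharset) {
        byte[] wireBytes = text.getBytes(senderCharset);
        return new String(wireBytes, CP1252);
    }

    public static void main(String[] args) {
        // The charset used by charset-less APIs such as
        // new OutputStreamWriter(out) and String#getBytes().
        System.out.println("Default charset: " + Charset.defaultCharset());

        String item = "Caf\u00e9"; // \u00e9 is "e with acute accent"
        // Sender and receiver agree: the accent survives.
        System.out.println(seenByCp1252Receiver(item, CP1252));
        // Sender writes UTF-8: the receiver sees two characters for the accent.
        System.out.println(seenByCp1252Receiver(item, StandardCharsets.UTF_8));
    }
}
```

Printing Charset.defaultCharset() from the real program, once with each Eclipse setting, would show whether this assumption holds. If it does, passing an explicit charset to the OutputStreamWriter lets the workspace stay on UTF-8.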

Posted by huangapple on 2020-08-21 05:33:38
Link to this article: https://go.coder-hub.com/63513420.html