Problem with special characters in properties file
Question
I have properties files for translation. One file is in English, the other one in Swedish. For each page I want translated I have separate properties files, e.g. home.properties, home_en.properties, help.properties, help_en.properties. The files are also under source control (GitHub).
When I open a certain file I get the text in odd format e.g.:
lbl_draftsMan=Föredragande
lbl_draftsManEpost=Epost föredragande
The corresponding text in the English file is:
lbl_draftsMan=Presenter
lbl_draftsManEpost=Email presenter
I notice that on GitHub the text in the Swedish file looks normal:
lbl_draftsMan=Föredragande
lbl_draftsManEpost=Epost föredragande
I have the following properties for the file:
Field Name: $MimeCharSet
Data Type: Text
Data Length: 5 bytes
Seq Num: 5
Dup Item ID: 0
Field Flags: SIGN SUMMARY
"UTF-8"
Other properties files have the same setting, but there I do not have the encoded-character problem.
What is the reason for this? I assume Domino Designer is the root of the problem?
Answer 1
Score: 1
We had a similar problem with our German properties files. Although our setting was UTF-8, umlauts were displayed incorrectly. Only when we entered the umlauts as Unicode escapes were they displayed correctly (e.g. ö --> \u00f6).
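The effect described above can be reproduced outside Domino Designer with a minimal sketch. It assumes the file ends up being read through java.util.Properties.load(InputStream), which decodes bytes as ISO-8859-1; whether Domino Designer does exactly this is an assumption. The class name EscapeDemo and the helper loadValue are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class EscapeDemo {

    // Simulates a properties file saved as UTF-8 and then read with
    // Properties.load(InputStream), which decodes the bytes as ISO-8859-1.
    static String loadValue(String fileContent, String key) {
        byte[] bytes = fileContent.getBytes(StandardCharsets.UTF_8);
        Properties properties = new Properties();
        try {
            properties.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty(key);
    }

    public static void main(String[] args) {
        // Raw umlaut in the file: its two UTF-8 bytes are read as two Latin-1 characters.
        String raw = loadValue("lbl_draftsMan=Föredragande\n", "lbl_draftsMan");
        // Escaped umlaut in the file: pure ASCII on disk, decoded by the Properties parser.
        String escaped = loadValue("lbl_draftsMan=F\\u00f6redragande\n", "lbl_draftsMan");

        System.out.println("raw value is intact:     " + raw.equals("Föredragande"));
        System.out.println("escaped value is intact: " + escaped.equals("Föredragande"));
    }
}
```

The escaped form survives because it contains only ASCII bytes, so the charset used to read the file no longer matters.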
Answer 2
Score: 0
Properties files are not UTF-8: Java traditionally reads them as ISO-8859-1, so you need to encode your content as Unicode escapes. The easiest way is a small standalone Java app that reads your UTF-8 source and writes it out using the Properties class, which takes care of the encoding.
Updates:
- There's a command line utility (part of JDK 8 and earlier): https://docs.oracle.com/javase/8/docs/technotes/tools/windows/native2ascii.html
- Saving the file in Eclipse should do it too.
- If you want to write your own code, use this as a starting point (you want to remove the hardcoded file names). Should even work for emoji.
import java.io.BufferedReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/*
 * Demo of handling UTF-8 properties
 */
public class Umlaut {

    public static void main(String[] args) throws Exception {
        Umlaut u = new Umlaut();
        u.run("source.txt", "target.properties");
    }

    void run(String sourceFileName, String targetFileName) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(Path.of(sourceFileName), StandardCharsets.UTF_8);
             OutputStream out = Files.newOutputStream(Path.of(targetFileName))) {
            Properties properties = new Properties();
            // load(Reader) decodes with the reader's charset (UTF-8 here) and
            // also copes with values that contain '=' themselves.
            properties.load(reader);
            // store(OutputStream) writes ISO-8859-1 and turns every character
            // outside printable ASCII into a Unicode escape on its own.
            // Escaping by hand first and then calling store(...) would double
            // the backslashes, so no manual escape step is used.
            properties.store(out, "Transformed");
        }
    }
}
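One way to check the result of such a conversion is a round trip: write a value with Properties.store(OutputStream), confirm the bytes are plain ASCII, and load them back. The sketch below does this in memory; the class name RoundTripCheck and its helper methods are made up for illustration and are not part of the answer's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

final class RoundTripCheck {

    // Serializes with store(OutputStream), which escapes non-ASCII characters.
    static byte[] store(Properties properties) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            properties.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads the bytes back the way a properties file is normally loaded.
    static Properties load(byte[] bytes) {
        Properties properties = new Properties();
        try {
            properties.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // True when no byte has its high bit set, i.e. the content is 7-bit ASCII.
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties original = new Properties();
        original.setProperty("lbl_draftsMan", "Föredragande");
        original.setProperty("lbl_smile", "ok \uD83D\uDE00"); // emoji as a surrogate pair

        byte[] bytes = store(original);
        Properties reloaded = load(bytes);

        System.out.println("ASCII only: " + isAscii(bytes));
        System.out.println("round trip: " + original.equals(reloaded));
    }
}
```

Because the stored bytes are ASCII only, the file should look the same in any editor or source control view regardless of the charset it assumes.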