April 13, 2023, 19:01:33

Problem with special characters in properties file

Question

I have properties files for translation. One file is in English, the other one in Swedish. For each page I want translated, I have separate properties files, e.g. home.properties, home_en.properties, help.properties, help_en.properties. The files are also under source control (GitHub).

When I open a certain file, the text is displayed garbled, e.g.:

lbl_draftsMan=FÃ¶redragande
lbl_draftsManEpost=Epost fÃ¶redragande

The corresponding text in the English file is:

lbl_draftsMan=Presenter
lbl_draftsManEpost=Email presenter

On GitHub, the text in the Swedish file is displayed correctly:

lbl_draftsMan=Föredragande
lbl_draftsManEpost=Epost föredragande

I have the following properties for the file:

Field Name: $MimeCharSet
Data Type: Text
Data Length: 5 bytes
Seq Num: 5
Dup Item ID: 0
Field Flags: SIGN SUMMARY

"UTF-8"

Other properties files have the same setting, but they do not show the garbled characters.

What is the reason for this? I assume Domino Designer is the root of the problem?

Answer 1

Score: 1

We had a similar problem with our German property files. Although our setting was UTF-8, umlauts were displayed incorrectly. Only when we entered the umlauts as Unicode escapes were they displayed correctly (e.g. ö --> \u00f6).
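To illustrate why the escape helps, here is a minimal sketch. The class name EscapeDemo and the helper loadLabel are made up for this example; the key is the one from the question. It assumes the file is read through java.util.Properties.load(InputStream), which decodes bytes as ISO-8859-1. Whether Domino Designer reads the file exactly this way is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapeDemo {
    // Hypothetical helper: loads properties text given as raw bytes
    // and returns the value stored under lbl_draftsMan.
    static String loadLabel(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        // load(InputStream) always decodes the bytes as ISO-8859-1
        props.load(new ByteArrayInputStream(fileBytes));
        return props.getProperty("lbl_draftsMan");
    }

    public static void main(String[] args) throws IOException {
        // The escaped form is pure ASCII, so the byte encoding of the file no longer matters
        byte[] escaped = "lbl_draftsMan=F\\u00f6redragande".getBytes(StandardCharsets.US_ASCII);
        // Compare instead of printing the text, so the console encoding plays no role
        System.out.println(loadLabel(escaped).equals("F\u00f6redragande")); // prints true
    }
}
```

Because the escaped line contains only ASCII characters, it reads the same whether a tool treats the file as UTF-8 or as ISO-8859-1.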

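Answer 2

Score: 0

Property files are not read as UTF-8 (Properties.load(InputStream) assumes ISO-8859-1), so you need to encode non-ASCII content as Unicode escapes. The easiest way is a small standalone Java app that reads your UTF-8 source and writes it out using the Properties class, which takes care of the encoding.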
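The garbled text from the question can be reproduced with a few lines. This is only a sketch of one plausible cause, under the assumption that the UTF-8 bytes of the file get decoded as ISO-8859-1 somewhere along the way. The class name MojibakeDemo and the method misread are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is the assumed mismatch behind the garbled labels.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Swedish label, with the o-umlaut written as a Java escape
        String garbled = misread("F\u00f6redragande");
        // Print code points rather than the text, so the console encoding does not matter
        garbled.chars().limit(3).forEach(c -> System.out.printf("U+%04X%n", c));
        // prints U+0046, U+00C3, U+00B6: "F" followed by the two characters
        // that replace the single o-umlaut in the question
    }
}
```

The o-umlaut (U+00F6) takes two bytes in UTF-8, 0xC3 and 0xB6. Read one byte per character, they turn into the two characters seen in the garbled labels.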

Updates:

There's a command line utility, native2ascii (part of the JDK up to version 8): https://docs.oracle.com/javase/8/docs/technotes/tools/windows/native2ascii.html
Saving the file in Eclipse should do it too.
If you want to write your own code, use this as a starting point (you will want to remove the hardcoded file names). It should even work for emoji.

import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/*
 * Demo of handling UTF-8 properties
 */
public class Umlaut {
    public static void main(String[] args) throws Exception {
        Umlaut u = new Umlaut();
        u.run("source.txt", "target.properties");
    }

    void run(String sourceFileName, String targetFileName) throws Exception {
        Properties properties = new Properties();
        // load(Reader) parses key=value lines from the UTF-8 text; it also copes with
        // comments, blank lines and '=' inside values, which a plain split("=") does not
        try (Reader reader = Files.newBufferedReader(Path.of(sourceFileName), StandardCharsets.UTF_8)) {
            properties.load(reader);
        }
        // store(OutputStream) writes ISO-8859-1 and turns every character outside
        // printable ASCII into a Unicode escape. Escaping by hand before calling store
        // would double the backslashes, so no manual escaping is done here.
        try (OutputStream out = Files.newOutputStream(Path.of(targetFileName))) {
            properties.store(out, "Transformed");
        }
    }
}
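As a sanity check on the escaping done by the Properties class itself, a round trip in memory can be sketched like this. The class RoundTripCheck and its two helpers are hypothetical names for this example; the sample value combines the o-umlaut with an emoji.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class RoundTripCheck {
    // Stores one property through store(OutputStream) and returns the bytes written
    static byte[] store(String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.store(out, null);
        return out.toByteArray();
    }

    // Loads the stored bytes again and returns the value found under the key
    static String load(byte[] stored, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(stored));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // o-umlaut plus an emoji, which Java represents as a surrogate pair
        String value = "F\u00f6redragande \uD83D\uDE00";
        byte[] stored = store("lbl_draftsMan", value);

        // Every byte should be plain ASCII once the escapes are in place
        boolean asciiOnly = true;
        for (byte b : stored) {
            if (b < 0) {
                asciiOnly = false;
            }
        }
        System.out.println(asciiOnly);
        System.out.println(value.equals(load(stored, "lbl_draftsMan")));
    }
}
```

If both lines print true, the stored form is safe to pass through tools that disagree about the file encoding, and the original text survives loading.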
