Replacing special characters from a string

Question


I would just like to know whether there is a more elegant and maintainable approach to this:

private String replaceSpecialChars(String fileName) {
    if (fileName.length() < 1) return null;

    if (fileName.contains("Ü")) {
        fileName = fileName.replace("Ü", "Ue");
    }

    if (fileName.contains("Ä")) {
        fileName = fileName.replace("Ä", "Ae");
    }

    if (fileName.contains("Ö")) {
        fileName = fileName.replace("Ö", "Oe");
    }

    if (fileName.contains("ü")) {
        fileName = fileName.replace("ü", "ue");
    }

    ...

    return fileName;
}

I'm restricted to Java 6.

Answer 1

Score: 4

Before you go any further on this, note that what you're doing is effectively impossible. For example, the 'ascii-fication' of 'Ö' in Swedish is 'O', not 'Oe'. There is no way to know whether a word is Swedish or German; after all, Swedes sometimes move to Germany. If you open a German phone book, see a Mrs. Sjögren, and asciify that to Sjoegren, you have messed it up.

If you want to run 'case- and asciification-insensitive comparisons', you first have to answer a few questions. Is muller equal to mueller equal to müller? That rabbit hole goes quite deep.
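To make the ambiguity concrete, here is a minimal sketch of two common folding conventions applied to the same name. The class and method names are made up for illustration, and neither convention is "the" correct one; java.text.Normalizer is available from Java 6 onwards.

```java
import java.text.Normalizer;

// Hypothetical demo class; the names are illustrative and not from the question.
class AsciiFoldingDemo {

    // Convention 1: decompose each character (NFD) and drop the combining marks,
    // so an 'o' with diaeresis (U+00F6) becomes a plain 'o'.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Convention 2: German-style transliteration, where an umlaut becomes vowel + 'e'.
    static String germanTransliteration(String input) {
        return input
                .replace("\u00C4", "Ae")
                .replace("\u00D6", "Oe")
                .replace("\u00DC", "Ue")
                .replace("\u00E4", "ae")
                .replace("\u00F6", "oe")
                .replace("\u00FC", "ue");
    }

    public static void main(String[] args) {
        String name = "Sj\u00F6gren"; // Sjögren
        System.out.println(stripDiacritics(name));       // Sjogren
        System.out.println(germanTransliteration(name)); // Sjoegren
    }
}
```

The two methods disagree on the same input, and nothing in the string itself tells you which one its owner would expect.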

The general solution is trigrams or other generalized text search tools, such as those provided by PostgreSQL. Alternatively, opt out of this mechanism and store this stuff in Unicode, and be clear that to find Ms. Sjögren, you have to search for "Sjögren", for the same reason that you won't find Mr. Johnson by searching for Jahnson.
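For a rough idea of what trigram matching means, here is a small sketch that scores two strings by the share of three-character windows they have in common. It is a simplification written for this answer, not PostgreSQL's implementation: the class name, the padding, and the use of a plain Jaccard index are all assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of trigram similarity; not how any particular database does it.
class TrigramSketch {

    // Collect the distinct 3-character windows of the lower-cased input,
    // padded with two leading spaces and one trailing space (an assumed convention).
    static Set<String> trigrams(String text) {
        String padded = "  " + text.toLowerCase(Locale.ROOT) + " ";
        Set<String> result = new HashSet<String>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    // Shared trigrams divided by all distinct trigrams (Jaccard index), between 0 and 1.
    static double similarity(String a, String b) {
        Set<String> left = trigrams(a);
        Set<String> right = trigrams(b);
        Set<String> shared = new HashSet<String>(left);
        shared.retainAll(right);
        Set<String> all = new HashSet<String>(left);
        all.addAll(right);
        return all.isEmpty() ? 0.0 : (double) shared.size() / all.size();
    }
}
```

With this kind of scoring, "mueller" and "müller" both come out as closer to "muller" than "johnson" does, without anyone having to decide which spelling is canonical.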

Note that most filesystems allow Unicode filenames; there is no need to replace a Ü.

This also goes some way toward explaining why there are no ready-made libraries for this seemingly common job; the job is, in fact, impossible.

If you must, you can simplify this code by using a Map<String, String> of replacements. I advise against it for the above reasons. Or just keep it as is, but ditch the contains checks; as written, the code is needlessly slow and lengthy.

There is no difference between:

if (fileName.contains("x")) fileName = fileName.replace("x", "y");

and just fileName = fileName.replace("x", "y"); except that the former does redundant work: it searches the string twice whenever the target is present, while the latter searches only once. If the string does not contain the target, replace creates no new string and returns the string itself, so neither form creates a new string unless a replacement actually happens.
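A quick way to convince yourself that the guard changes nothing is to run both forms on a few inputs and compare the results. This is only a sketch; the class and method names are invented for the comparison.

```java
// Hypothetical demo class comparing the guarded and unguarded forms.
class ReplaceGuardDemo {

    // The guarded form from the question: contains scans first, then replace scans again.
    static String guarded(String s, String target, String replacement) {
        if (s.contains(target)) {
            s = s.replace(target, replacement);
        }
        return s;
    }

    // The unguarded form: a single call that yields the same result.
    static String direct(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    public static void main(String[] args) {
        String[] samples = {"\u00DCbung.txt", "report.txt", ""};
        for (String sample : samples) {
            // Prints whether both forms agree for this sample.
            System.out.println(guarded(sample, "\u00DC", "Ue")
                    .equals(direct(sample, "\u00DC", "Ue")));
        }
    }
}
```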

You can then chain it:

if (fileName.isEmpty()) return null;
return fileName
    .replace("Ü", "Ue")
    .replace("Ä", "Ae")
    ...
    ;

But, as I said, you probably don't want to do that, unless you want an aggravated person on the line at some point in the future, complaining that you bungled the asciification of their surname.
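If you go the Map route mentioned above anyway, it could look roughly like the sketch below. It sticks to Java 6 syntax and contains only the four mappings from the question. The class name is made up, and returning null for null or empty input is an assumption carried over from the original method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table-driven variant; only the four mappings from the question are included.
class FileNameAsciifier {

    // LinkedHashMap keeps insertion order, so replacements run in a predictable sequence.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<String, String>();

    static {
        REPLACEMENTS.put("\u00DC", "Ue"); // Ü
        REPLACEMENTS.put("\u00C4", "Ae"); // Ä
        REPLACEMENTS.put("\u00D6", "Oe"); // Ö
        REPLACEMENTS.put("\u00FC", "ue"); // ü
    }

    // Assumption: null or empty input yields null, as the original method does for empty input.
    static String replaceSpecialChars(String fileName) {
        if (fileName == null || fileName.isEmpty()) {
            return null;
        }
        String result = fileName;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Adding a mapping is then a one-line change to the table, which is about the only maintainability gain over the chained calls.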

Answer 2

Score: 0

You can remove the unnecessary if statements and use a chain of String.replace calls. Your code might look something like this:

private static String replaceSpecialChars(String fileName) {
    if (fileName == null)
        return null;
    else
        return fileName
                .replace("Ü", "Ue")
                .replace("Ä", "Ae")
                .replace("Ö", "Oe")
                .replace("ü", "ue");
}
public static void main(String[] args) {
    System.out.println(replaceSpecialChars("ABc"));       // ABc
    System.out.println(replaceSpecialChars("ÜÄÖü"));      // UeAeOeue
    System.out.println(replaceSpecialChars("").length()); // 0
    System.out.println(replaceSpecialChars(null));        // null
}

huangapple
  • Posted on August 26, 2020 at 20:32:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63597770.html