How to get all national characters for a selected Locale?

Question

In my app I need to generate passwords based on all available national characters, like:

private String generatePassword(String charSet, int passwordLength) {
    char[] symbols = charSet.toCharArray();
    StringBuilder sbPassword = new StringBuilder();
    Random wheel = new Random();

    for (int i = 0; i < passwordLength; i++) {
        int random = wheel.nextInt(symbols.length);
        sbPassword.append(symbols[random]);
    }
    return sbPassword.toString();
}

For Latin we have something like:

charSet = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

How can I get a similar String containing all national characters (the alphabet) for, say, Thai, Arabic, or Hebrew?

I mean, we all know that Unicode contains all national characters available for any Locale, so there has to be a way to get them. Otherwise I'd be forced to hardcode national alphabets, which is ugly (my app supports more than 10 locales).

Answer 1

Score: 2

Because you're using char[], you won't be able to represent every Unicode code point in every script: some of them lie outside the Basic Multilingual Plane (BMP) and do not fit in a single char. Unfortunately, there is no easy way to get all the code points for a script other than looping over them, like so:

char[] charsForScript(Character.UnicodeScript script) {
  StringBuilder sb = new StringBuilder();
  // Walk the BMP only: each of these code points fits in a single char.
  for (int cp = 0; cp <= Character.MAX_VALUE; ++cp) {
    // Unassigned code points report UNKNOWN, so they never match a real script.
    if (Character.isValidCodePoint(cp) && script == Character.UnicodeScript.of(cp)) {
      sb.appendCodePoint(cp);
    }
  }
  return sb.toString().toCharArray();
}

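This returns all the BMP chars for a given script, e.g. LATIN or GREEK. Note that a script covers more than letters: digits, combining marks, and punctuation assigned to that script are included as well.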
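If you only want letters, as the question's Latin example suggests, one option is to add a `Character.isLetter` check to the same loop. The following is a minimal sketch, not a definitive implementation; the class name `ScriptLetters` and the method name `bmpLettersForScript` are made up for illustration.

```java
import java.lang.Character.UnicodeScript;

public class ScriptLetters {

    // Collects the BMP code points that belong to the given script AND are letters.
    // (Hypothetical helper; the name is not from the original answer.)
    static String bmpLettersForScript(UnicodeScript script) {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= Character.MAX_VALUE; cp++) {
            // isLetter(int) is false for marks, digits, punctuation and surrogates.
            if (Character.isLetter(cp) && UnicodeScript.of(cp) == script) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hebrew = bmpLettersForScript(UnicodeScript.HEBREW);
        System.out.println(hebrew.length() + " Hebrew letters: " + hebrew);
    }
}
```

The result can be passed straight to the question's `generatePassword(String charSet, int passwordLength)`. It may still contain more than the everyday alphabet (for example, presentation forms), so you might want to filter further for your use case.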

To get all code points, even outside the BMP, you could use:

// Requires: import java.util.ArrayList; import java.util.List;
int[] charsForScript(Character.UnicodeScript script) {
  List<Integer> ints = new ArrayList<>();
  // Walk the whole Unicode range, including the supplementary planes.
  for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
    if (Character.isValidCodePoint(cp) && script == Character.UnicodeScript.of(cp)) {
      ints.add(cp);
    }
  }
  // Unbox the collected code points into an int[].
  return ints.stream().mapToInt(i -> i).toArray();
}
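
To tie this back to the question, here is a rough sketch of a password generator that works on code points instead of chars, so supplementary characters are appended correctly. It rests on a few assumptions of mine: the class and method names are hypothetical, it keeps only letters via `Character.isLetter`, and it uses `SecureRandom` in place of the question's `Random`, since passwords usually call for a cryptographically strong source.

```java
import java.lang.Character.UnicodeScript;
import java.security.SecureRandom;
import java.util.stream.IntStream;

public class ScriptPasswordGenerator {

    private static final SecureRandom RANDOM = new SecureRandom();

    // All letter code points of the script, including those outside the BMP.
    // (Hypothetical helper; the letter-only filter is an assumption.)
    static int[] letterCodePoints(UnicodeScript script) {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(Character::isLetter)
                .filter(cp -> UnicodeScript.of(cp) == script)
                .toArray();
    }

    // Builds a password of `length` code points drawn uniformly from `alphabet`.
    static String generatePassword(int[] alphabet, int length) {
        if (alphabet.length == 0) {
            throw new IllegalArgumentException("alphabet must not be empty");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            // appendCodePoint writes a surrogate pair when the code point needs one.
            sb.appendCodePoint(alphabet[RANDOM.nextInt(alphabet.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] thai = letterCodePoints(UnicodeScript.THAI);
        System.out.println(generatePassword(thai, 12));
    }
}
```

Keep in mind that the length here is counted in code points. `String.length()` on the result can be larger when the alphabet contains supplementary characters, so use `codePointCount` if you need to validate it.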

huangapple
  • Posted on May 5, 2020, 04:06:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61600701.html