Extremely compact UUID (using all alphanumeric characters)


Question

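import java.math.BigInteger;
import java.util.UUID;

// Encodes a random UUID as a base-62 string.
public String getBase62UIID() {
    // 32 hex digits with the dashes removed
    String strUUID = UUID.randomUUID().toString().replace("-", "");
    // The UUID as a non-negative 128-bit integer
    BigInteger base10UUID = new BigInteger(strUUID, 16);

    String digits = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    StringBuilder base62UUID = new StringBuilder();

    // Repeated division by 62: each remainder is the next digit, starting from the right
    while (base10UUID.compareTo(BigInteger.ZERO) > 0) {
        BigInteger remainder = base10UUID.mod(BigInteger.valueOf(62));
        base62UUID.insert(0, digits.charAt(remainder.intValue()));
        base10UUID = base10UUID.divide(BigInteger.valueOf(62));
    }

    return base62UUID.toString();
}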

I need an extremely compact UUID, the shorter the better.

To that end, I wrote:

    public String getBase36UIID() {
        // More compact version of UUID
        String strUUID = UUID.randomUUID().toString().replace("-", "");
        return new BigInteger(strUUID, 16).toString(36);
    }

By executing this code, I get, for example:

5luppaye6086d5wp4fqyz57xb

That's good, but not optimal: base 36 uses all ten digits and the lowercase letters, but not the uppercase letters.

If uppercase letters could serve as digits distinct from the lowercase ones, I could use a base 62 made of these digits:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
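To put numbers on the gain: a UUID is a 128-bit value, so its longest possible form in base b is the digit count of 2^128 - 1, i.e. the smallest k with b^k > 2^128 - 1. A small sketch that counts them exactly with BigInteger (the class and method names are illustrative only):

```java
import java.math.BigInteger;

// Illustrative helper: how many base-b digits the largest 128-bit value needs.
public class DigitCount {

    // Smallest k such that base^k > max, i.e. the number of digits of max in that base.
    static int maxDigits(int base, BigInteger max) {
        BigInteger b = BigInteger.valueOf(base);
        BigInteger power = BigInteger.ONE;
        int k = 0;
        while (power.compareTo(max) <= 0) {
            power = power.multiply(b);
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        // Largest unsigned 128-bit value: 2^128 - 1
        BigInteger max = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        for (int base : new int[] {16, 36, 62}) {
            System.out.println("base " + base + ": " + maxDigits(base, max) + " digits");
        }
    }
}
```

This gives 32, 25 and 22 digits for bases 16, 36 and 62, so base 62 saves at most three characters over base 36; the 25-character example above is already the base-36 maximum.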

I could also imagine bases that include accented characters such as "è" or "é", or special characters such as "$" or "!", to increase the number of available digits further.

However, accented or special characters may cause problems, so for the moment I prefer not to consider them.

With all that said, how can I convert the BigInteger representing my UUID into the base 62 described above, to make it even more compact? Thanks.

I have already verified that code like the following doesn't work, because any radix above 36 is treated as base 10:

return new BigInteger(strUUID, 16).toString(62);

After all, there is no standard set of base-62 digits like the one I imagined, but I suppose one can be implemented in Java.
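For reference, the fallback to base 10 is documented behaviour: BigInteger.toString(int radix) uses radix 10 whenever the radix is outside Character.MIN_RADIX to Character.MAX_RADIX, which is 2 to 36. A minimal check (the class name is illustrative):

```java
import java.math.BigInteger;

public class RadixFallback {
    public static void main(String[] args) {
        BigInteger n = new BigInteger("ff", 16); // 255

        // Character.MAX_RADIX is the largest radix BigInteger.toString accepts
        System.out.println(Character.MAX_RADIX); // 36
        System.out.println(n.toString(36));      // 73
        // 62 is out of range, so this silently prints decimal
        System.out.println(n.toString(62));      // 255
    }
}
```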

Answer 1

Score: 4

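The general algorithm for converting a number to any base is based on division with remainder.

Start by dividing the number by the base. The remainder is the last digit of the result; map it to a symbol. If the quotient is nonzero, divide it by the base: that remainder is the second-to-last digit. Keep repeating with each new quotient until it becomes zero.

In Java, with BigInteger: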

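import java.math.BigInteger;

// Converts a non-negative BigInteger to base 62 by repeated division with remainder.
String toBase62(BigInteger number) {
    String symbols = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    BigInteger base = BigInteger.valueOf(symbols.length());

    StringBuilder result = new StringBuilder();
    do {
        // [0] is the quotient, [1] the remainder (the next digit, least significant first)
        BigInteger[] quotientAndRemainder = number.divideAndRemainder(base);
        number = quotientAndRemainder[0];
        result.append(symbols.charAt(quotientAndRemainder[1].intValue()));
    } while (number.compareTo(BigInteger.ZERO) > 0);

    // The digits were produced right to left, so reverse them
    return result.reverse().toString();
}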
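The answer covers only encoding. If you also need the number (or the UUID's hex string) back, the inverse is Horner's method: read the digits left to right and compute value = value * 62 + digit. A sketch, assuming the same symbol order is used on both sides; the class and the encode/decode names are hypothetical:

```java
import java.math.BigInteger;
import java.util.UUID;

public class Base62RoundTrip {
    static final String SYMBOLS =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final BigInteger BASE = BigInteger.valueOf(SYMBOLS.length());

    // Same division-with-remainder loop as toBase62.
    static String encode(BigInteger number) {
        StringBuilder result = new StringBuilder();
        do {
            BigInteger[] qr = number.divideAndRemainder(BASE);
            number = qr[0];
            result.append(SYMBOLS.charAt(qr[1].intValue()));
        } while (number.signum() > 0);
        return result.reverse().toString();
    }

    // Inverse via Horner's method: value = value * 62 + digit, left to right.
    static BigInteger decode(String s) {
        BigInteger value = BigInteger.ZERO;
        for (char c : s.toCharArray()) {
            int digit = SYMBOLS.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 symbol: " + c);
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return value;
    }

    public static void main(String[] args) {
        String hex = UUID.randomUUID().toString().replace("-", "");
        BigInteger n = new BigInteger(hex, 16);
        String short62 = encode(n);
        // %032x restores the leading zeros that the numeric round trip drops
        String back = String.format("%032x", decode(short62));
        System.out.println(short62 + " " + back.equals(hex));
    }
}
```

The %032x padding matters: converting to a number drops leading zero digits, so the hex string must be padded back to 32 characters before the dashes are reinserted.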

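Do you need the identifier to be a UUID, though? Couldn't it be any random sequence of letters and digits? If that's acceptable, you don't have to deal with base conversion at all.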

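import java.security.SecureRandom;

// One shared instance; unlike java.util.Random, SecureRandom is unpredictable
private static final SecureRandom RND = new SecureRandom();

// Random identifier of `length` characters, each drawn uniformly from the 62 symbols
String randomString(int length) {
    String symbols = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    StringBuilder str = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
        str.append(symbols.charAt(RND.nextInt(symbols.length())));
    }
    return str.toString();
}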
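How long should the random string be? Assuming each character is drawn uniformly and independently, each carries log2(62) ≈ 5.95 bits, while a random (version 4) UUID has 122 random bits (6 of its 128 bits are fixed version and variant fields). A sketch that finds the smallest length k with 62^k ≥ 2^bits (names are illustrative):

```java
import java.math.BigInteger;

public class RandomLength {
    // Smallest k with 62^k >= 2^bits: enough base-62 characters to hold `bits` bits.
    static int charsForBits(int bits) {
        BigInteger target = BigInteger.ONE.shiftLeft(bits);
        BigInteger base = BigInteger.valueOf(62);
        BigInteger power = BigInteger.ONE;
        int k = 0;
        while (power.compareTo(target) < 0) {
            power = power.multiply(base);
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(charsForBits(122)); // 21: the random bits of a version-4 UUID
        System.out.println(charsForBits(128)); // 22: a full 128-bit value
    }
}
```

So about 21 characters match a random UUID's collision resistance. That assumes a strong generator: java.util.Random has only 48 bits of internal state, so each new instance can produce at most 2^48 distinct strings no matter how long they are, which is why java.security.SecureRandom is the safer choice for identifiers.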

Answer 2

Score: 1

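This should not be difficult. Converting a number to a string is a basic programming task; the fact that you're using base 62 makes no difference.

Decide how many characters you're willing to use, then convert your large number to that base and map each "digit" onto one of the characters.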

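Pseudocode:

 b = the base (say, 62)
 valid_chars = an array of b characters
 u = the UUID, as a non-negative integer
 result = empty string
 while u != 0:
    digit = u % b
    result = result + valid_chars[digit]
    u = u / b    (integer division)
 reverse result

The loop produces the digits right-to-left, hence the final reverse, but you should get the idea.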
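One detail the pseudocode leaves open is output length: a UUID whose value has leading zero digits comes out shorter. If fixed-length IDs are preferable (assumed here; the question doesn't require it), you can run the loop exactly 22 times, since 62^22 > 2^128, and let the unused positions take the zero symbol. This sketch also builds the number directly from UUID.getMostSignificantBits() and getLeastSignificantBits() rather than via the hex string; FixedWidthBase62 is a hypothetical name:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

public class FixedWidthBase62 {
    static final String SYMBOLS =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final BigInteger BASE = BigInteger.valueOf(62);
    static final int WIDTH = 22; // 62^22 > 2^128, so 22 digits always suffice

    static String encode(UUID uuid) {
        // The 128 bits as 16 big-endian bytes, taken straight from the UUID
        byte[] bytes = ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
        BigInteger u = new BigInteger(1, bytes); // signum 1: treat the bytes as unsigned
        char[] out = new char[WIDTH];
        // Fill right to left; once u reaches 0, every remaining digit is '0'
        for (int i = WIDTH - 1; i >= 0; i--) {
            BigInteger[] qr = u.divideAndRemainder(BASE);
            out[i] = SYMBOLS.charAt(qr[1].intValue());
            u = qr[0];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode(UUID.randomUUID()));
    }
}
```

Every output is exactly 22 characters, and because leading '0' symbols do not change the value, any base-62 decoder reads the padded form correctly.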


Answer 3

Score: 1

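The main idea is the same as in the previous posts, but the implementation differs somewhat.
Note also that you can give each character a different probability of occurrence: mainly, add that character to the data structure more times to raise its probability.

Here every character has the same probability (1/62):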

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RCode {
    String symbols = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static void main(String[] args)
    {
        RCode r = new RCode();
        System.out.println("symbols=" + r.symbols.length());
        System.out.println("code_10(+1)=" + r.generate(10));
        System.out.println("code_70(+2)=" + r.generate(70));
        //System.out.println("code_124(+3)=" + r.generate(124));
    }


    public String generate(int length)
    {
        int num = length / symbols.length() + 1;
        List<Character> list = new ArrayList<>();
        for (int i = 0; i < symbols.length(); i++)
        {
            // To change a character's probability of occurrence, adapt the number of copies added here
            for (int j = 0; j <= num; j++)
            {
                list.add(symbols.charAt(i));
            }
        }
        // Shuffle the pool, then take its first `length` characters
        Collections.shuffle(list);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++)
        {
            sb.append(list.get(i));
        }
        return sb.toString();
    }
}

Output:

symbols=62
//each char is added once(+1)
code_10(+1)=hFW9ZFEAeU
code_70(+2)=hrHQCEdQ3F28apcJPnfjAaOu55Xso12xabkJ7MrU97U0HYkYhWwGEqVAiLOp3X3QSuq6qp

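Note: the algorithm has a defect. Try to figure out why it can never generate a sequence of 10 identical characters (aaaaaaaaaa). It's easy to fix, but I was focused on the idea.
As written, each character can appear at most num + 1 times, because that is how many copies of it are in the shuffled pool. The output is still random, so it may be useful to someone.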
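The weighting idea from the start of this answer can also be applied with replacement, which removes the cap on repeats described above: draw every position independently from a pool in which each character appears as many times as its weight. A minimal sketch; the weights map, the method names and the class name are hypothetical, and SecureRandom is swapped in for unpredictability:

```java
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightedCode {
    private static final SecureRandom RND = new SecureRandom();

    // Each character appears `weight` times, so its probability is weight / pool size.
    static String buildPool(Map<Character, Integer> weights) {
        StringBuilder pool = new StringBuilder();
        for (Map.Entry<Character, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                pool.append(e.getKey());
            }
        }
        return pool.toString();
    }

    // Draws every position independently (with replacement), so any sequence,
    // including "aaaaaaaaaa", can occur.
    static String generate(String pool, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(pool.charAt(RND.nextInt(pool.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Integer> weights = new LinkedHashMap<>();
        weights.put('a', 3); // 'a' is three times as likely as 'b' or 'c'
        weights.put('b', 1);
        weights.put('c', 1);
        System.out.println(generate(buildPool(weights), 10));
    }
}
```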


huangapple
  • Published on 18 August 2020 at 03:44:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63457605.html