使ULID词典排序对时间更敏感

huangapple go评论63阅读模式
英文:

Make ULID lexicographic ordering more sensitive to time

问题

I used this ULID example in a project where I not only needed the uniqueness offered by ULID but also its lexicographic sortability.

我在一个项目中使用了这个ULID示例,不仅需要ULID提供的唯一性,还需要其字典排序功能。

I discovered, however, that no matter how much I tried, I could not just get the ids generated in a loop sorted.

然而,我发现无论我如何尝试,都无法只是获得生成的ID在循环中排序。

e.g.

例如:

class Test{
    public static void main(String[] args) {
            ArrayList<String> ulids = new ArrayList<>();
    
            for (int i = 0; i < 10; i++) {
               
                ulids.add(ULID.generate());
            }
    
            System.out.println("Original:\n..." + ulids);
    
            Collections.shuffle(ulids);
    
            System.out.println("Shuffled:\n..." + ulids);
    
            ulids.sort(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                    return o1.compareTo(o2);
                }
            });
    
            System.out.println("Sorted:\n..." + ulids);
    
        }
}

Sample output:

示例输出:

Original:
...[01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng80wacxxskgej5d8mm]
Shuffled:
...[01edrp4ng896t1qhsngrz3h251, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng80wacxxskgej5d8mm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8ekj0vt393tw12x8j]
Sorted:
...[01edrp4ng80wacxxskgej5d8mm, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v]


I checked the implementation and figured that since time was a core factor in the generation of ULIDs, and also since the sensitivity of time used was the millisecond i.e. `(System.currentTimeMillis())` , I could get them sorted by introducing some delay in my id generation loop.

我检查了实现并发现,由于时间是生成ULID的核心因素,而且使用的时间灵敏度是毫秒,即`(System.currentTimeMillis())`,我可以通过在ID生成循环中引入一些延迟来使它们排序。

I introduced a delay of about 5 milliseconds and the ids all came out sorted; e.g:

我引入了约5毫秒的延迟,所有的ID都是排序的,例如:

```java
class TestWithMsDelay{

   public static void main(String[] args) {
        ArrayList<String> ulids = new ArrayList<>();

        for (int i = 0; i < 10; i++) {
            try {
                Thread.sleep(5L);
                ulids.add(ULID.generate());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        System.out.println("Original:\n..." + ulids);

        Collections.shuffle(ulids);

        System.out.println("Shuffled:\n..." + ulids);

        ulids.sort(new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        });

        System.out.println("Sorted:\n..." + ulids);

    }


}

Sample output:

示例输出:

Original:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]
Shuffled:
...[2rjdme7psqdykt24qfymn2e4ba, 2rjdme6pnrenx79zd3jj18est4, 2rjdme80as7t1h1rr00m676718, 2rjdme63a23ddsy0km21n6n34a, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme70bv45b648p82dbj584n, 2rjdme9ea04x22rpx2f

英文:

I used this ULID example in a project where I not only needed the uniqueness offered by ULID but also its lexicographic sortability.

I discovered, however, that no matter how much I tried, I could not just get the ids generated in a loop sorted.

e.g.

class Test{
    public static void main(String[] args) {
            ArrayList&lt;String&gt; ulids = new ArrayList&lt;&gt;();
    
            for (int i = 0; i &lt; 10; i++) {
               
                ulids.add(ULID.generate());
            }
    
            System.out.println(&quot;Original:\n...&quot; + ulids);
    
            Collections.shuffle(ulids);
    
            System.out.println(&quot;Shuffled:\n...&quot; + ulids);
    
            ulids.sort(new Comparator&lt;String&gt;() {
                @Override
                public int compare(String o1, String o2) {
                    return o1.compareTo(o2);
                }
            });
    
            System.out.println(&quot;Sorted:\n...&quot; + ulids);
    
        }
}

Sample output:
Original:
...[01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng80wacxxskgej5d8mm]
Shuffled:
...[01edrp4ng896t1qhsngrz3h251, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng80wacxxskgej5d8mm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8ekj0vt393tw12x8j]
Sorted:
...[01edrp4ng80wacxxskgej5d8mm, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v]

I checked the implementation and figured that since time was a core factor in the generation of ULIDs, and also since the sensitivity of time used was the millisecond i.e. (System.currentTimeMillis()) , I could get them sorted by introducing some delay in my id generation loop.

I introduced a delay of about 5 milliseconds and the ids all came out sorted; e.g:

class TestWithMsDelay{

   public static void main(String[] args) {
        ArrayList&lt;String&gt; ulids = new ArrayList&lt;&gt;();

        for (int i = 0; i &lt; 10; i++) {
            try {
                Thread.sleep(5L);
                ulids.add(ULID.generate());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        System.out.println(&quot;Original:\n...&quot; + ulids);

        Collections.shuffle(ulids);

        System.out.println(&quot;Shuffled:\n...&quot; + ulids);

        ulids.sort(new Comparator&lt;String&gt;() {
            @Override
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        });

        System.out.println(&quot;Sorted:\n...&quot; + ulids);

    }


}
Sample output:
Original:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]
Shuffled:
...[2rjdme7psqdykt24qfymn2e4ba, 2rjdme6pnrenx79zd3jj18est4, 2rjdme80as7t1h1rr00m676718, 2rjdme63a23ddsy0km21n6n34a, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme70bv45b648p82dbj584n, 2rjdme9ea04x22rpx2f3rp5gez, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme5a5h2ntcd20xq4z487tx]
Sorted:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]

This is not good enough for my work... I don't want to wait for any length of time to generate ulids(even if it is 10us - 100us), the concept of artificial delay bothers me very much, lol.

So, I modified ULID.java and changed the time source from System.currentTimeMillis() to System.nanoTime()

To my surprise, I no longer needed any time delay in the loop to get the output ULIDs sortable.

I feel there must be a snag somewhere though; because the Java Spec warns that System.nanoTime() is not necessarily more accurate than System.currentTimeMillis()

e.g in the Javadoc for System.nanoTime() , it says:

This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis().

Also, the Javadoc for System.nanoTime() seems to indicate that it is not relative to the Epoch (as is System.currentTimeMillis())

I believe that this may cause bad behaviour(messed up sortability over time and maybe affect uniqueness) in using System.nanoTime() in ULID.java instead of
System.currentTimeMillis()

QUESTION

  1. Are my fears legitimate
  2. What can I do if (1.) is true, to improve the ULID's time-sensitivity beyond 1 millisecond without destroying its strong points?

答案1

得分: 2

ULID有两部分:时间组件和随机组件。

时间组件是自1970年以来的毫秒计数。

随机组件在两种情况下更新:

  1. 当毫秒数变化时,会生成一个新的随机值;
  2. 当毫秒数相同时,随机值会增加一。

你在这里展示的实现没有执行第二步。

也许你可以包含类似这样的代码(仅为示例):

if (timestamp == previousTimestamp) {
    randomComponent++;
} else {
    randomComponent = RANDOM.nextLong();
}

我发现的另一个问题是它使用了Math.random(),而不是java.security.SecureRandom。为了解决这个问题,这是一个建议:

import java.security.SecureRandom;
private static final RANDOM = new SecureRandom();

最后,不建议使用System.nanoTime(),因为它返回自某一时刻以来的纳秒数。它不是您主板上的实时时钟(RTC)返回的日期时间。这个函数用于测量代码中两个时间点之间的经过时间,可能用于性能基准测试。示例:


long startNanos = System.nanoTime();

// 在这里执行一些昂贵的任务

long endNanos = System.nanoTime();

long elapsedNanos = endNanos - startNanos;

如果您愿意,可以检查ulid-creator库。也许它可以帮助您。示例:

// 生成ULID作为UUID
UUID ulid = UlidCreator.getUlid();

// 或生成ULID作为字符串(Crockford的base32)
String ulid = UlidCreator.getUlidString();

项目页面:https://github.com/f4b6a3/ulid-creator

英文:

The ULID has two parts: a time component and a random component.

The time component is the count of milliseconds since 1970.

The random component is updated in two cases:

  1. when the millisecond changes, a new random value is generated;
  2. when the millisecond is the same, the random value is incremented by one.

The implementation you show here doesn't do the second step.

Maybe you could include some code like this (just an example):

if (timestamp == previousTimestamp) {
	randomComponent++;
} else {
	randomComponent = RANDOM.nextLong();
}

Another problem that I found is that it uses Math.random(), instead of java.security.SecureRandom. To fix this, this is a sugestion:

import java.security.SecureRandom;
private static final RANDOM = new SecureRandom();

Finally, it's not recommended to use System.nanoTime() as it returns the number of nanoseconds since an arbitrary point in time. It's not the day time returned from the real time clock (RTC) in your mother board. This function is used to measure the elapsed time between two points in your code, maybe for benchmarking. Example:


long startNanos = System.nanoTime();

// do some expensive tasks here

long endNanos = System.nanoTime();

long elapsedNanos = endNanos - startNanos;

If you prefer, you can check the library ulid-creator. Maybe it can help. Example:

// Generate a ULID as UUID
UUID ulid = UlidCreator.getUlid();

// Or generate a ULID as String (Crockford&#39;s base32)
String ulid = UlidCreator.getUlidString();

Project page: https://github.com/f4b6a3/ulid-creator

EDIT

Sorry. I didn't answare the questions.

Are my fears legitimate

Yep, your wright.

What can I do if (1.) is true, to improve the ULID's time-sensitivity beyond 1 millisecond without destroying its strong points?

You can increase the ULID resolution, but it won't conform the ULID Spec (which is not a formal standard like RFC-4122 by the way). The resulting UUID is like a COMB GUID, created by Jimmy Wilson. The main idea on both is the same.

You can reserve more bits for timestamp component, but it will have a cost of some bits. For example, if you increase the time component from 48 to 64 bits, it will roll over around the year 2262 AD, but the random component will be reduced from 1208925819614629174706176 (2^80) to 18446744073709551616 (2^64). If the cost affects the strong points of ULID, it depends on your project.

I just implemented a generator for ULIDs with nanosecond resolution. I coincidently was working on it a few days ago. It actually has millisecond accuracy using the method System.currentTimeMillis(). The nanosecond resulution is simulated using the method System.nanoTime() between two subsequent calls.

If you still intend to use nanoseconds ULID, feel free to test it:

package your.package.name;

import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

/**
 * Utility class that creates a COMB GUID with nanoseconds resolution.
 * 
 * It borrows the main idea from ULID and COMB generators: a concatenation of
 * time and random bytes. It is composed of 64 bits for time and 64 for random
 * bits.
 * 
 * A Nano COMB has two components:
 * 
 * 1. Time camponent (64 bits): nanoseconds since 1970
 * 
 * 2. Random component (64 bits): a value generated by a secure random
 * generator.
 * 
 * Maximum time component year is ~2262 A.D. (2^63/10^9/60/60/24/365.25 + 1970)
 * 
 * @author: Fabio Lima 2020
 */
public final class NanoCombCreator {

	private long prevTime = 0;
	private long prevNano = 0;

	private static final long ONE_MILLION_NANOSECONDS = 1_000_000L;

	private static final SecureRandom SECURE_RANDOM = new SecureRandom();

	/**
	 * Returns a time component in nanoseconds.
	 * 
	 * It uses `System.currentTimeMillis()` to get the system time in milliseconds
	 * accuracy. The nanoseconds resolution is simulated by calling
	 * `System.nanoTime()` between subsequent calls within the same millisecond.
	 * It&#39;s not precise, but it provides some monotonicity to the values generates.
	 * 
	 * @return the current time in nanoseconds
	 */
	private synchronized long getTimeComponent() {

		final long time = System.currentTimeMillis();
		final long nano = System.nanoTime();
		final long elapsed; // nanoseconds since last call

		if (time == prevTime) {
			elapsed = (nano - prevNano);
			if (elapsed &gt; ONE_MILLION_NANOSECONDS) {
				try {
					// make the clock to catch up
					Thread.sleep(1);
				} catch (InterruptedException e) {
					System.err.println(&quot;something went wrong...&quot;);
				}
			}
		} else {
			prevTime = time;
			prevNano = nano;
			elapsed = 0;
		}

		return (time * ONE_MILLION_NANOSECONDS) + elapsed;
	}

	/**
	 * Returns the random component using a secure random generator.
	 * 
	 * @return a random value.
	 */
	private synchronized long getRandomComponent() {
		return SECURE_RANDOM.nextLong();
	}

	/**
	 * Returns a Nano COMB.
	 * 
	 * A Nano COMB is inspired on ULID and COMB generators.
	 * 
	 * It is composed of 64 bits for time and 64 for random bits.
	 * 
	 * @return a UUID
	 */
	public synchronized UUID create() {

		final long timeBits = getTimeComponent();
		final long randomBits = getRandomComponent();

		return new UUID(timeBits, randomBits);
	}

	/**
	 * Test method that generates many Nano COMBs in a loop.
	 * 
	 * @param args
	 */
	public static void main(String[] args) {

		NanoCombCreator creator = new NanoCombCreator();

		for (int i = 0; i &lt; 100; i++) {
			// Generate a Nano COMB
			UUID uuid = creator.create();

			// Extract the milliseconds and nanoseconds
			long milliseconds = uuid.getMostSignificantBits() / ONE_MILLION_NANOSECONDS;
			long nanoseconds = uuid.getMostSignificantBits() &amp; ONE_MILLION_NANOSECONDS;

			// Instantiate an instant using the milliseconds and nanoseconds
			Instant time = Instant.ofEpochMilli(milliseconds).plusNanos(nanoseconds);

			// Print the UUID and the time it was generated (UTC)
			System.out.println(&quot;UUID: &#39;&quot; + uuid + &quot;&#39;, time: &quot; + time);
		}
	}
}

OUTPUT:

UUID: &#39;16240ee8-3865-1503-d1fb-b4e85f991c6b&#39;, time: 2020-07-22T11:15:58.537327680Z
UUID: &#39;16240ee8-3865-f90a-ca19-3ec529750ef7&#39;, time: 2020-07-22T11:15:58.537344064Z
UUID: &#39;16240ee8-3866-dd7c-f32f-7acaebcf7766&#39;, time: 2020-07-22T11:15:58.537409664Z
UUID: &#39;16240ee8-3868-0a99-3ead-b114e1d61520&#39;, time: 2020-07-22T11:15:58.537524800Z
UUID: &#39;16240ee8-3868-efc8-937d-599c72de71a6&#39;, time: 2020-07-22T11:15:58.537541248Z
UUID: &#39;16240ee8-386a-3643-6a5e-e3b5e3b03c71&#39;, time: 2020-07-22T11:15:58.537655936Z
UUID: &#39;16240ee8-386b-132f-7016-057ab30a2920&#39;, time: 2020-07-22T11:15:58.537721408Z
UUID: &#39;16240ee8-386b-f929-d5b0-f70b68aea3d9&#39;, time: 2020-07-22T11:15:58.537737280Z

huangapple
  • 本文由 发表于 2020年7月21日 21:05:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/63015186.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定