英文:
On Multiple JVM, Java's UUID.randomUUID, what is probability for collision?
问题
- 在基于微服务的架构中,多个微服务并行运行以实现水平扩展性。
- 所有服务都使用相同的算法生成UUID(UUID.randomUUID),一旦生成UUID,它就会被保存在数据库中并返回给调用服务。几秒钟后,调用者会发送请求以验证带有UUID的交易状态。
- 在关系型数据库中,UUID是主键。我们已经看到不同服务生成的UUID发生了冲突。问题:
- 跨JVM之间重复UUID的可能性是多少?
- 我们是否应该在代码中添加一些逻辑来在将UUID保存到数据库之前验证冲突?
英文:
I am building microservice based architecture where multiple microservices are running in parallel for horizontal scalability.
All services are using same algorithm to generate UUID (UUID.randomUUID), once UUID is generated it is saved in database and returned to calling service. After few seconds caller sends request to verify status of txn with UUID.
In relational DB UUID is primary key, We have seen collision of UUID generated by different services. Questions
- what is possibility of duplicate UUID across JVMs?
- Should we add some logic in code to verify collision before saving it to DB?
答案1
得分: 2
- 可跨多个JVM重复UUID的可能性有多大?
这是可能的,但概率非常小。维基百科上的“生日问题”页面有一个概率表,可以用来估算碰撞的可能性。
例如,对于128位的随机UUID(并且使用高质量的随机数生成器),表格显示您需要生成2.6 x 1010个UUID,碰撞的概率才会达到1018中的1个。
在文章的前面部分,您会找到计算和估算概率的数学内容。
- 我们是否应该在代码中添加一些逻辑来在保存到数据库之前验证碰撞?
这实际上取决于您可能生成和存储的UUID数量,以及您愿意接受的碰撞概率。
但是,如果您担心可能发生碰撞,您可以将UUID列设置为相关数据库表中的唯一键。事实上,由于硬件错误的可能性比碰撞导致的唯一性约束失败更高,因此事务可能会因硬件错误而失败!
跟进问题:
我不确定这个概率是针对一个生成器还是多个生成器的?
生成器的数量不重要,只要它们是独立的随机数生成器。
正如我所说,数学不会说谎。如果您在100万个事务中看到了几百次碰撞,那么其他方面可能存在问题。假设可能是错误的。
例如:
- 可能您正在使用弱PRNG。
- 可能您在为PRNG提供种子时使用了固定的种子或使用了熵来源不足的方法。
- 可能您以某种方式修改(例如缩短)UUID,从而大大降低了它们的有效位数。
- 可能您的UUID生成方法中的某些内容导致UUID连续两次被分配...有时候。
- 可能对象被错误地复制,导致具有相同UUID的两个副本。
- 可能有人/某物伪造UUID。
在开始怀疑数学之前,有很多事情需要检查。
我的疑虑是,所有4个服务都使用相同的算法,概率会增加。
正如我所说,生成器的数量不会改变数学。
英文:
> 1. What is possibility of duplicate UUID across JVMs.
It is possible, but the probability is vanishingly small. The Wikipedia page on the Birthday Problem has a probability table that can be used to estimate the likelihood of a collision.
For example, with 128 bit random UUIDs (and a high quality random number generator) the table says that you would need to generate 2.6 x 10<sup>10</sup> UUIDs for the probability of a collision to reach 1 in 10<sup>18</sup>.
Earlier in the article you will find the mathematics on calculating ... and estimating ... the probabilities.
> 2. Should we add some logic in code to verify collision before saving it to DB?
It really depends on the number of UUIDs you are likely to generate and store, and on the probability of collision that you are willing to accept.
However, if you are concerned by the possibility of a collision, you could just make the UUID columns a unique keys in the relevant database tables. It is more likely that a transaction will fail due to a hardware error than you will get a collision leading to a uniqueness constraint failure!
Followup questions:
> I am not sure if this probability is for one generator or multiple?
The number of generators is not relevant, provided that they are >independent< random number generators.
> As we have seen collision few hundred times with 1 million txns.
The mathematics don't lie. If you have seen a collision a few hundred times with 1 million transactions then something else is wrong. The assumptions are incorrect.
For example:
- Maybe you are using a weak PRNG.
- Maybe you are using a fixed seed or using a poor source of entropy when seeding the PRNGs.
- Maybe you are modifying (e.g. shortening) the UUIDs in a way that drastically reduces their effective bit count.
- Maybe something in your UUID generation methodology is causing UUIDs to be issued twice in a row ... sometimes.
- Maybe objects are being duplicated when they shouldn't be ... and you end up with two copies of an object with the same UUID.
- Maybe someone / something is faking UUIDs.
There are a lot of things that you need to check before you start doubting the mathematics.
> My doubt is all 4 services are using same algorithm the probability will increase.
As I said, the number of generators does not alter the mathematics.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论