Generating unique random numbers from a fixed-size population

Question

I need to generate random numbers from a population of fixed size, and every time I generate a number I need to check whether it has already been seen.

Example: my fixed population size is 2,000,000. In the first round of my algorithm the sample size is 400,000, so I need to draw 400,000 numbers out of 2,000,000. After generating those random numbers, I save them in a HashSet.

In the second round, let's say I want to generate 20,000 random numbers, but I need to check whether any of those 20,000 numbers was already seen by looking at the HashSet (which contains the 400,000 numbers from round 1).

This is what I have so far:

Round 1:
         population size: 2,000,000
         sample: 400,000
         List<Integer> result = new ArrayList<Integer>(sample);

         I keep the numbers in an array called perm[]:
         int perm[] = new int[population];
  public List<Integer> generateRandomNumbers(int population, Set<Integer> setListStringSeen, int sample)
  {
   for (int i = 0; i < sample; i++)
   {
      // random integer in the range [i, population)
      int k = i + (int) (Math.random() * (population - i));

      if (setListStringSeen.contains(k))
      {
         // The problem is here: if the newly generated number was already
         // seen, I need to generate a new one. But that new number has to be
         // checked as well, because it could also have been seen.
         // How can I get out of this loop of checking?

         k = i + (int) (Math.random() * (population - i));

         if (setListStringSeen.contains(k))
         {
            System.out.println("we've chosen this number before");
         }

         setListStringSeen.add(k);

      }

      int t = perm[k];
      perm[k] = perm[i];
      perm[i] = t;

   }

   for (int i = 0; i < sample; i++)
   {
      result.add(perm[i]);
   }

   // at the end of round 1, I add all the generated numbers to the HashSet:

   setListStringSeen.addAll(result);

   return result;

  }

Now for round 2:
let's say we want to generate 20,000 new numbers.
What I want is to check whether the numbers generated in the second round were already seen, by looking them up in the HashSet. Any idea how to do that?

Answer 1

Score: 1

You can use:

while (!set.add(random.nextInt(2000000)));

Set.add returns false when the value is already present, so the loop keeps drawing until it has added a number that was not yet in the set.
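
To show how that retry loop could carry over from one round to the next, here is a minimal sketch. The class name RejectionSampler and the method nextRound are hypothetical names chosen for illustration, and the sketch assumes the numbers are drawn from 0 to population - 1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: keeps one "seen" set alive across all rounds.
class RejectionSampler {
    private final int population;
    private final Random random;
    private final Set<Integer> seen = new HashSet<>();

    RejectionSampler(int population, Random random) {
        this.population = population;
        this.random = random;
    }

    /** Draws `sample` numbers in [0, population) that no earlier call returned. */
    List<Integer> nextRound(int sample) {
        // Without this guard the loop below could spin forever.
        if (sample < 0 || sample > population - seen.size()) {
            throw new IllegalArgumentException("not enough unseen numbers left");
        }
        List<Integer> result = new ArrayList<>(sample);
        while (result.size() < sample) {
            int k = random.nextInt(population);
            // Set.add returns false for a duplicate, so we simply draw again.
            if (seen.add(k)) {
                result.add(k);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RejectionSampler sampler = new RejectionSampler(2_000_000, new Random());
        List<Integer> first = sampler.nextRound(400_000);
        List<Integer> second = sampler.nextRound(20_000);
        System.out.println(first.size() + " + " + second.size() + " unique numbers drawn");
    }
}
```

Retrying is cheap while most of the population is still unseen, as with 420,000 out of 2,000,000. It slows down as the seen set approaches the population size, because more and more draws are rejected.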

Another option is to build the whole population of 2 million numbers once, in class scope, shuffle it, and then just pull from the list in order, so you never get the same number twice:

List<Integer> sample = IntStream.range(0, 2000000)
    .boxed().collect(Collectors.toList());
Collections.shuffle(sample);
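
The same idea can be applied lazily to the perm[] array from the question, shuffling only as many positions as each round needs. The sketch below is one possible reading of that, not the asker's or the answerer's actual code. IncrementalShuffler, nextRound and the used counter are hypothetical names, and it assumes perm is first filled with 0 to population - 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: a partial Fisher-Yates shuffle that resumes where
// the previous round stopped, so no HashSet lookup is needed.
class IncrementalShuffler {
    private final int[] perm;
    private final Random random;
    private int used = 0; // perm[0..used) has already been handed out

    IncrementalShuffler(int population, Random random) {
        this.perm = new int[population];
        for (int i = 0; i < population; i++) {
            perm[i] = i; // assumption: the population is 0 .. population - 1
        }
        this.random = random;
    }

    /** Returns `sample` numbers that no earlier round returned. */
    List<Integer> nextRound(int sample) {
        if (sample < 0 || sample > perm.length - used) {
            throw new IllegalArgumentException("not enough unseen numbers left");
        }
        List<Integer> result = new ArrayList<>(sample);
        int end = used + sample;
        for (int i = used; i < end; i++) {
            // Pick a position in [i, perm.length): everything there is unseen.
            int k = i + random.nextInt(perm.length - i);
            int t = perm[k];
            perm[k] = perm[i];
            perm[i] = t;
            result.add(perm[i]);
        }
        used = end;
        return result;
    }

    public static void main(String[] args) {
        IncrementalShuffler shuffler = new IncrementalShuffler(2_000_000, new Random());
        List<Integer> first = shuffler.nextRound(400_000);
        List<Integer> second = shuffler.nextRound(20_000);
        System.out.println(first.size() + " + " + second.size() + " unique numbers drawn");
    }
}
```

Because every swap only looks at positions from i onward, a number handed out in an earlier round can never be picked again, and each round costs time proportional to its own sample size.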

Answer 2

Score: 1

You should generate the random numbers beforehand, so you are certain they are not repeated.

An easy way of doing this is to obtain a list of integers and then shuffle it.

For example:

// Obtain the list of integers from 0 to population - 1
final List<Integer> integers = Stream.iterate(0, n -> n + 1)
    .limit(population)
    .collect(Collectors.toList());
// integers now holds [0, 1, 2, ..., population - 1]

// Then shuffle them
Collections.shuffle(integers);
// integers now holds the same values in random order, e.g. [3, 66, 44, ...]

Check https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#shuffle-java.util.List-java.util.Random-
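
Once the list is shuffled, each round can take the next consecutive slice, which is unique by construction. The following is a rough sketch of that, using the sizes from the question. ShuffledRounds and shuffledPopulation are hypothetical names, and IntStream.range is swapped in for Stream.iterate purely for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: shuffle once, then hand out consecutive slices.
class ShuffledRounds {
    /** Returns 0 .. population - 1 in random order. */
    static List<Integer> shuffledPopulation(int population) {
        List<Integer> integers = IntStream.range(0, population)
                .boxed()
                // toCollection guarantees a mutable ArrayList for shuffle.
                .collect(Collectors.toCollection(ArrayList::new));
        Collections.shuffle(integers);
        return integers;
    }

    public static void main(String[] args) {
        List<Integer> integers = shuffledPopulation(2_000_000);

        // Round 1 takes the first 400,000 entries, round 2 the next 20,000.
        List<Integer> firstRound = integers.subList(0, 400_000);
        List<Integer> secondRound = integers.subList(400_000, 420_000);

        System.out.println(firstRound.size() + " + " + secondRound.size() + " unique numbers drawn");
    }
}
```

The trade-off is memory: the whole population is boxed and held in a list up front, which is acceptable for 2 million integers but worth reconsidering for much larger populations.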

Answer 3

Score: 1

If you are using Java 8 or higher, you could do something like below:

import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

public static void main(String[] args) {
    Random rand = new Random();

    int populationSize = 20;
    int sampleSizeFirstRound = 10;

    // rand.ints(1, populationSize) draws from 1 (inclusive) to populationSize (exclusive)
    Set<Integer> sample = rand.ints(1, populationSize)
            .distinct()
            .limit(sampleSizeFirstRound)
            .boxed()
            .collect(Collectors.toSet());

    int sampleSizeSecondRound = 6;
    // Second round: skip everything drawn in the first round
    Set<Integer> sampleSecondRound = rand.ints(1, populationSize)
            .distinct()
            .boxed()
            .filter(i -> !sample.contains(i))
            .limit(sampleSizeSecondRound)
            .collect(Collectors.toSet());

    System.out.println(sample);
    System.out.println(sampleSecondRound);
}

To make it more manageable I have kept the sizes of the samples small. Adapt them as needed.
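
One thing to watch when adapting the sizes: rand.ints(...) is an infinite stream, so if a round asks for more numbers than are still unseen, limit(...) is never satisfied and the pipeline does not terminate. A possible guard is sketched below. GuardedStreamSampler and sampleUnseen are hypothetical names, and the sketch assumes the range starts at 0 and that every value in seen lies inside that range.

```java
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: same stream approach, with a capacity check first.
class GuardedStreamSampler {
    /**
     * Draws sampleSize distinct numbers in [0, populationSize) that are not
     * in `seen`. Assumes every element of `seen` lies in that range.
     */
    static Set<Integer> sampleUnseen(Random rand, int populationSize,
                                     Set<Integer> seen, int sampleSize) {
        // Fail fast instead of letting the infinite stream run forever.
        if (sampleSize < 0 || sampleSize > populationSize - seen.size()) {
            throw new IllegalArgumentException("not enough unseen numbers left");
        }
        return rand.ints(0, populationSize)
                .filter(i -> !seen.contains(i))
                .distinct()
                .limit(sampleSize)
                .boxed()
                .collect(Collectors.toSet());
    }
}
```

For more than two rounds, the caller would need to pass the union of all earlier results as seen, for example by adding each round's result to one shared set.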

huangapple
  • Posted on July 31, 2020 at 23:31:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63194819.html