Re-hashing a hash map inside put method

Question

I'm trying to implement a separate-chaining hash map in Java. Inside the put() method I want to re-hash the map if the load factor (number of elements / size of array) gets too large. For this I have written another method, rehash(), that rehashes the map by doubling the size of the array (the capacity) and then adding all the entries again (at least this is what I want it to do). The problem is that when I test it I get a "java.lang.OutOfMemoryError: Java heap space", and I'm guessing this is because I'm calling the put() method inside the rehash() method as well. I don't really know how to fix this. I wonder if someone can check my code and give me feedback or a hint on how to proceed.
The Entry<K,V> in the code below is a nested private class in the hash map class.

Thanks in advance!

The put()-method:

public V put(K key, V value) {
    int idx = key.hashCode() % capacity;  // Calculate index based on hash code.
    if (idx < 0) {
        idx += this.capacity;  // If the index is less than 0, add the length of the array table.
    }
    if (table[idx] == null) {   // If the list at idx is empty, just add the Entry node.
        table[idx] = new Entry<K, V>(key, value);
        nr_of_keys += 1;
        if (this.load() >= this.load_factor) {  // Check if the load has reached the maximum load. If so, rehash.
            rehash();
        }
        return null;
    } else {
        Entry<K, V> p = table[idx];  // dummy pointer
        while (p.next != null) { // While the next node isn't null, move the pointer forward.
            if (p.getKey().equals(key)) {    // If the key matches:
                if (!p.getValue().equals(value)) { // If the values don't match, replace the old value.
                    V oldVal = p.getValue();
                    p.setValue(value);
                    return oldVal;
                }
            } else {
                p = p.next;
            }
        }
        if (p.getKey().equals(key)) {   // If the key of the last node matches the given key:
            if (!p.getValue().equals(value)) {
                V oldVal = p.getValue();
                p.setValue(value);
                return oldVal;
            } else {
                return null;
            }
        }
        p.next = new Entry<K, V>(key, value); // The key doesn't exist, so add (key, value) at the end of the list.
        nr_of_keys += 1;
        if (this.load() >= this.load_factor) { // If the load is too large, rehash().
            rehash();
        }
        return null;
    }
}

The rehash()-method:

public void rehash() {
    Entry<K, V>[] tmp = table;  // create temporary table
    int old_capacity = this.capacity;  // store old capacity/length of array.
    this.capacity = 2 * capacity; // New capacity is twice as large
    this.nr_of_keys = 0; // reset nr. of keys to zero.
    table = (Entry<K, V>[]) new Entry[capacity];  // make this.table twice as large
    for (int i = 0; i < old_capacity; i++) { // go through the array
        Entry<K, V> p = tmp[i]; // points to first element of list at position i.
        while (p != null) {
            put(p.getKey(), p.getValue());
            p = p.next;
        }
    }
}

The load()-method:

public double load() {
    return ((double) this.size()) / ((double) this.capacity);
}

where size() returns the number of (key,value) pairs in the map and capacity is the size of the array table (where the linked lists are stored).

Answer 1

Score: 1

Once you rehash your map, nothing will be the same: the buckets, the entry sets, etc. all change.
So:

  • Create your temporary table.
  • Get the values normally using your current get methods.
  • Then create new buckets by rehashing to the new bucket size, with the new capacity, and add them to the table. (DO NOT USE PUT.)
  • Then replace the existing table with the one just created. Make certain that all values pertinent to the new table size are also changed, such as bucket selection methods based on thresholds, capacity, etc.

Finally, use print statements to track the new buckets and the movement of items between buckets.
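The steps above can be sketched roughly as follows. This is a minimal sketch and not the asker's class: the names ChainedMap and indexFor, the initial capacity of 4, and the 0.75 threshold are all assumptions made for illustration. The point it shows is that rehash() relinks the existing nodes straight into a new array instead of calling put(), so no load check can fire while the table is being rebuilt, and the key count never needs to be reset.

```java
/** Minimal separate-chaining map. Names and constants are illustrative assumptions. */
class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Entry<K, V>[] table;
    private int capacity = 4;                 // assumed initial capacity
    private int nrOfKeys = 0;
    private final double loadFactor = 0.75;   // assumed threshold

    @SuppressWarnings("unchecked")
    ChainedMap() {
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    int size() { return nrOfKeys; }

    // floorMod never returns a negative index, so no "idx < 0" fix-up is needed.
    private int indexFor(K key, int cap) {
        return Math.floorMod(key.hashCode(), cap);
    }

    V get(K key) {
        for (Entry<K, V> p = table[indexFor(key, capacity)]; p != null; p = p.next) {
            if (p.key.equals(key)) return p.value;
        }
        return null;
    }

    V put(K key, V value) {
        int idx = indexFor(key, capacity);
        for (Entry<K, V> p = table[idx]; p != null; p = p.next) {
            if (p.key.equals(key)) {          // existing key: replace and stop
                V old = p.value;
                p.value = value;
                return old;
            }
        }
        table[idx] = new Entry<>(key, value, table[idx]); // new key: prepend to the chain
        nrOfKeys++;
        if ((double) nrOfKeys / capacity >= loadFactor) rehash();
        return null;
    }

    @SuppressWarnings("unchecked")
    private void rehash() {
        int newCapacity = 2 * capacity;
        Entry<K, V>[] newTable = (Entry<K, V>[]) new Entry[newCapacity];
        for (Entry<K, V> head : table) {
            Entry<K, V> p = head;
            while (p != null) {
                Entry<K, V> next = p.next;    // remember the old successor before relinking
                int idx = indexFor(p.key, newCapacity);
                p.next = newTable[idx];       // move the node itself; put() is not involved
                newTable[idx] = p;
                p = next;
            }
        }
        table = newTable;                     // swap in the new table only when it is complete
        capacity = newCapacity;               // nrOfKeys is unchanged: no key was added or removed
    }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        for (int i = 0; i < 1000; i++) map.put("k" + i, i);
        System.out.println(map.size() + " keys, k42 -> " + map.get("k42"));
    }
}
```

Reusing the nodes avoids allocating a second copy of every entry during the resize. Building fresh Entry objects in the new table would work as well, as long as the insertion helper does not itself check the load.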

Answer 2

Score: 0

You have added rehash(), but the load() implementation is still missing (or, inside load(), the size() implementation).

The pattern looks clear though, and allows a guess while waiting for this additional info.

You tell us that when the load factor reaches a certain point inside a put, you rehash. That rehash doubles the internal array and calls put again. And in the end you have no memory.

My bet would be that there is some subtle or not-so-subtle recursion taking place: you put, it rehashes by doubling the memory usage, then re-puts, which somehow triggers another rehash...
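To make that recursion pattern concrete, here is a counter-only model of it. It is a sketch under assumptions, not a diagnosis: it assumes size() returns the counter that rehash() resets to zero, and it looks only at the first entry re-inserted after each doubling, where the load is 1 / capacity. RehashRecursionDemo and nestedRehashes are hypothetical names, and the zero threshold in the second call is purely illustrative, since the question does not show how load_factor is initialized.

```java
/** Counter-only model of put() -> rehash() -> put() -> ...; it allocates no tables. */
class RehashRecursionDemo {

    /**
     * Models the first re-inserted entry after each doubling: the key counter was
     * reset to 0 and then incremented to 1, so the load is 1 / capacity.
     * Returns how many nested rehash() calls this would trigger, capped at maxDepth.
     */
    static int nestedRehashes(long capacity, double loadFactor, int maxDepth) {
        int depth = 0;
        while (depth < maxDepth && 1.0 / capacity >= loadFactor) {
            capacity *= 2; // each nested rehash() doubles the table again
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // With a threshold of 0.75 the check fails right after doubling: no nesting.
        System.out.println(nestedRehashes(8, 0.75, 30)); // prints 0
        // Hypothetical degenerate threshold of 0 (for instance 3 / 4 evaluated in
        // integer arithmetic): the check never fails, only the cap stops the model.
        System.out.println(nestedRehashes(8, 0.0, 30));  // prints 30
    }
}
```

In a real map each of those nested levels would allocate an array twice as large while the older arrays are still referenced from the call stack, which is one way a heap-space error could appear before a stack overflow does.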

A first possibility is that some internal variables tracking the array's state are not properly reset (e.g. number of occupied entries, ...). Confusing the "old" array data with that of the new one being built would be a likely culprit.

Another possibility lies in your put implementation, but finding it would require step-by-step debugging, which I'd advise you to perform.
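As one example of what stepping through might surface (an observation about the posted loop, not a confirmed cause of the OutOfMemoryError): in the while loop of put(), when the key matches and the value is already equal, neither branch returns or advances p, so that loop cannot finish. Below is a minimal sketch of a chain walk that always either returns or moves forward. ChainWalkDemo, Node and putInChain are hypothetical names, and String/int are used only to keep it short. Unlike the posted code, it returns the previous value even when the value is unchanged, which matches what java.util.Map.put does.

```java
/** Walks one bucket's chain; every iteration either returns or advances. */
class ChainWalkDemo {
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Updates the key if it is present, otherwise appends a new node at the tail.
     * Returns the previous value, or null if the key was absent.
     * The head must be non-null (the empty-bucket case is handled by the caller).
     */
    static Integer putInChain(Node head, String key, int value) {
        Node p = head;
        while (true) {
            if (p.key.equals(key)) {   // match: overwrite and stop, whatever the old value was
                int old = p.value;
                p.value = value;
                return old;
            }
            if (p.next == null) break; // reached the tail without a match
            p = p.next;                // no match here: always advance
        }
        p.next = new Node(key, value);
        return null;
    }

    public static void main(String[] args) {
        Node head = new Node("a", 1);
        putInChain(head, "b", 2);
        // Same key and same value on a chain of length 2: this call terminates.
        System.out.println(putInChain(head, "a", 1));
    }
}
```

Checking the match before testing for the tail also removes the need for the separate "last node" block after the loop.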

huangapple
  • Published on 2020-10-27 00:16:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64540973.html