英文:
Efficient array lookup in C
问题
我正在尝试用C语言编写一个简单的自定义语言解释器。由于C语言的简洁性,我想使用C而不是C++。
在C语言中,我不确定如何存储变量和进行变量查找。
我原计划将变量存储在数组中,但我认为我需要一个可变大小的数组。
我还不知道除了通过循环遍历数组之外,还有没有其他高效的查找变量的方法。
所以我想知道,创建一个可变大小数组的高效方法是什么?Python、Ruby或Go语言是如何高效地存储和检索变量的?
英文:
I'm trying to write a simple language interpreter for a custom language in C. I want to use C over C++ due to C's simplicity.
The things I'm not sure how to do in C is, storing variables and variable lookups.
I was planning to store variables in an array, but I think I'd need a variable sized array.
I also don't know an efficient way to lookup variables from an array besides just looping through it.
So I'd like to know, what is an efficient way of creating a variable sized array? How does Python or Ruby or Go store and retrieve variables efficiently?
答案1
得分: 4
Python和Ruby使用哈希表(hash-tables)来存储和检索变量。变量的名称被转换为整数,并将该整数用作数组的索引。可能会出现多个名称冲突(转换为相同的整数),因此需要考虑允许在同一个槽位上有多个名称到值的绑定,但每个名称只需要检查几个绑定。
Go是编译型语言,所以变量在编译时被转换为地址(静态地址或相对于堆栈或帧指针的偏移量)。
创建可变大小数组的有效方法是使用malloc
和realloc
。
在调整哈希表的桶数组大小时,realloc
不适用,因为需要逐个重新计算旧桶数组中的所有键的哈希值,以确定它们在新数组中的位置。如果你知道解释器将解释的最大程序的大小,可以直接分配适用于最大程序的哈希表大小,避免编写哈希表调整大小的函数。
英文:
> How does Python or Ruby or Go store and retrieve variables efficiently?
Python and Ruby use hash-tables: the name of the variable is translated into an integer, and that integer is used as index into an array. It can always happen that several names collide (translate to the same integer), so that needs to be taken into account by allowing several bindings from name to value at the same slot, but there will only be a few to check for each name.
Go is compiled, so the variable is translated to an address (either static or an offset with respect to the stack—or frame—pointer) at compile-time.
> what is an efficient way of creating a variable sized array?
If you decided to do that, you would use malloc
and realloc
.
In the case of resizing the array of buckets of a hash-table, realloc
is unfortunately not useful because all the keys in the old array of buckets need to be re-hashed one by one to find where they go in the new array. If you know the maximum size of programs that will be interpreted by your interpreter, you can allocate the hash-table directly at the size that works for the largest programs, and avoid writing the hash-table resizing function.
答案2
得分: 3
我认为当你尝试自己实现变量存储时,很容易陷入其中。我建议你使用现有的哈希映射表,比如uthash,只是为了在概念上看看它的效果,并尽可能地封装它。如果后来发现它成为了瓶颈,你可以回来进行优化。
我有点自信地说,在那个时候,你不会选择动态扩展的数组。你必须考虑到你需要通过名称实现基于字符串的搜索来找到一个变量,所以很难找到比使用动态扩展数组的哈希映射表更好的方法。如果未排序,搜索复杂度为O(n),如果排序,复杂度为O(log n),而哈希映射表的搜索复杂度为O(1)。
英文:
I think you can get really carried away when trying to implement a variable-storage yourself. I would recommend you use an existing hashmap like uthash just to see how it works out for you conceptually and encapsulate it as good as possible. If it turns out to be a bottleneck, you can come back and optimize later.
I am somewhat confident to say, that at that time, you will not pick a dynamically expanding array. You have to consider that you need to implement a string-based search to find a variable by name, so you will have a hard time doing better than a hashmap with a dynamically expanding array. Search on it would be O(n) if unsorted and O(log n) if sorted, whereas the hashmap has O(1) search complexity.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论