What data structure could be used to store objects with multiple comparable attributes?
Question
I want to build a data structure to store information about multiple houses, so that a user can later retrieve the houses they want through a search query. To make searching fast, I plan to use a red-black tree. The problem I am facing is that each node's key holds only one attribute of a house, e.g. the price; the other attributes, such as the number of beds and the land size, cannot be stored in the same tree. What would be a good data structure for this problem? Initially I thought of nesting a tree inside a tree. Is this viable, or considered good practice?
Answer 1
Score: 2
The problem you are facing can be solved using secondary indexes on top of your data. Secondary indexes are a heavily studied concept in the database world, and you should have no trouble finding resources that explain how they are implemented in real databases.
So, you already have a primary key for your data: each object's memory reference, or perhaps an index into a collection of references. For each attribute you want to query, you need a fast way to look up the matching objects. The exact data structure depends on the kinds of queries you run, but some kind of search tree is a good general-purpose choice and is usually efficient to update, which matters a lot for many databases. Each such structure should accept a query on its specific attribute and return references (or primary keys) to all the objects that match that query.
In your example you might have one red-black tree for price and another for number-of-beds. To answer a query such as "price = 30 or number-of-beds = 4", you query your price structure, then your number-of-beds structure, and, because the query uses "or", take the union of the primary keys they return (for "and", take the intersection).
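The union/intersection step can be sketched as follows. To keep it short, this assumes equality-only queries, so a plain dict mapping each attribute value to a set of primary keys stands in for each per-attribute tree; the house data and the helper names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: primary key -> house attributes.
houses = {
    1: {"price": 30, "beds": 2},
    2: {"price": 45, "beds": 4},
    3: {"price": 30, "beds": 4},
    4: {"price": 60, "beds": 3},
}


def build_index(attr):
    """Map each value of `attr` to the set of primary keys having that value."""
    index = defaultdict(set)
    for key, house in houses.items():
        index[house[attr]].add(key)
    return index


def lookup(index, value):
    # dict.get does not trigger defaultdict's auto-insertion.
    return index.get(value, set())


price_index = build_index("price")
beds_index = build_index("beds")

# "price = 30 or beds = 4": union of the primary keys from each index.
either = lookup(price_index, 30) | lookup(beds_index, 4)
# "price = 30 and beds = 4": intersection of the primary keys.
both = lookup(price_index, 30) & lookup(beds_index, 4)

print(sorted(either))  # [1, 2, 3]
print(sorted(both))    # [3]
```

Only the final set of primary keys needs to be resolved back to house objects, so each index is consulted once per condition regardless of how the conditions are combined.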
Note that whenever you add or update objects, you also need to update every affected index. Real databases make the same trade-off: faster reads in exchange for slower writes.
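A minimal sketch of that write-side cost, assuming a hypothetical `HouseStore` that keeps the objects in a dict keyed by primary key plus one secondary index per indexed attribute (again a dict of sets standing in for a search tree). Every insert or update has to touch each index: the old value's entry is removed and the new value's entry is added.

```python
from collections import defaultdict


class HouseStore:
    """Primary storage plus one secondary index per attribute (hypothetical design).

    Each index maps an attribute value to the set of primary keys holding it;
    a dict of sets stands in for a search tree to keep the sketch short.
    Every row is assumed to contain all indexed attributes.
    """

    def __init__(self, indexed_attrs):
        self.rows = {}  # primary key -> dict of attributes
        self.indexes = {attr: defaultdict(set) for attr in indexed_attrs}

    def _unindex(self, key, row):
        # Remove the key from each index entry for the row's old values.
        for attr, index in self.indexes.items():
            bucket = index.get(row[attr])
            if bucket is not None:
                bucket.discard(key)
                if not bucket:
                    del index[row[attr]]

    def _index(self, key, row):
        # Add the key under the row's current value in every index.
        for attr, index in self.indexes.items():
            index[row[attr]].add(key)

    def upsert(self, key, row):
        """Insert or replace a row; every index is updated, so writes cost more."""
        old = self.rows.get(key)
        if old is not None:
            self._unindex(key, old)
        self.rows[key] = dict(row)
        self._index(key, self.rows[key])

    def find(self, attr, value):
        """Primary keys of rows whose `attr` equals `value`."""
        return set(self.indexes[attr].get(value, ()))


store = HouseStore(["price", "beds"])
store.upsert(1, {"price": 30, "beds": 2})
store.upsert(2, {"price": 45, "beds": 4})
store.upsert(1, {"price": 50, "beds": 4})  # update touches both indexes

print(store.find("price", 30))          # set()
print(sorted(store.find("beds", 4)))    # [1, 2]
```

Reads stay cheap because each query goes straight to one index, while each write pays for maintaining all of them.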
A nested-tree approach might work for some kinds of queries, but it quickly becomes unsuitable if the data is not static: updating the nested trees every time an object changes will be very slow.