Multi-parameter ranking using Firebase
Question
How can I use Firebase to build a mixer algorithm similar to Twitter's, one that retrieves and ranks discussions from Firestore based on the weight and created_at parameters?
I have a discussion collection with the following structure:
interface Discussion {
  weight: number;
  created_at: Timestamp; // Timestamp from "firebase/firestore", written with serverTimestamp()
}
Challenge:
In Firestore, ordering data by a single field is limiting. For example, if we order discussions solely by weight, new posts will never have the opportunity to rise in the ranking.
If I order discussions separately by weight and by created_at, how can I handle deduplication effectively?
The number of discussion documents can range from 0 to 1 million, so I prefer a solution that avoids loading all the documents on the client side. Additionally, any changes must be reactive and use the onSnapshot method for real-time updates.
Example Scenario:
import { collection, getFirestore, onSnapshot, orderBy, query, Timestamp } from "firebase/firestore";

interface Discussion {
  weight: number;
  created_at: Timestamp;
}

// setState is assumed to be the component's state setter, defined elsewhere.
function queryDiscussionFromFireStore() {
  const col_ref = collection(getFirestore(), "discussion");

  // query top discussions (highest weight first)
  const topPost_unSub = onSnapshot(query(col_ref, orderBy("weight", "desc")), (snapShot) => {
    setState(snapShot.docs.map((d) => d.data() as Discussion));
  });

  // query recent discussions (newest first)
  const recentPost_unSub = onSnapshot(query(col_ref, orderBy("created_at", "desc")), (snapShot) => {
    setState(snapShot.docs.map((d) => d.data() as Discussion));
  });

  // unsubscribe both listeners
  return () => {
    recentPost_unSub();
    topPost_unSub();
  };
}
queryDiscussionFromFireStore works fine, but I can't figure out how to handle duplicate data.
Suppose we have the following data:
[
{
weight: 5,
created_at: today_date
},
{
weight: 3,
created_at: today_date
},
]
In this case, both snapshots will return the same data.
Explanation
In the provided code example, the queryDiscussionFromFireStore function retrieves discussions from Firestore with two queries, one ordered by weight and one ordered by created_at. It uses the onSnapshot method to listen for real-time updates on both queries.
However, there is a concern regarding duplicate data. Both queries read the same collection, so a document can appear in the "top discussions" results (ordered by weight) and in the "recent discussions" results (ordered by creation time). When the collection is small, the two result sets contain exactly the same documents.
For instance, consider the following example data:
[
{
weight: 5,
created_at: today_date
},
{
weight: 3,
created_at: today_date
},
]
In this case, the onSnapshot callbacks for the "top discussions" and "recent discussions" queries both receive the same two documents, so each entry is processed twice.
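One way to avoid processing a document twice is to key the results by document ID before combining them. The sketch below assumes each snapshot has already been mapped to plain objects that carry the document ID (for example, taken from doc.id); the DiscussionDoc type and the dedupeById helper are hypothetical names for illustration, not part of the Firebase SDK.

```typescript
// Hypothetical shape: a discussion plus its Firestore document ID.
interface DiscussionDoc {
  id: string;          // assumed to come from doc.id in the snapshot
  weight: number;
  createdAtMs: number; // assumed conversion of created_at to milliseconds
}

// Merge any number of result lists, keeping the first occurrence of each ID.
function dedupeById(...lists: DiscussionDoc[][]): DiscussionDoc[] {
  const seen = new Map<string, DiscussionDoc>();
  for (const list of lists) {
    for (const item of list) {
      if (!seen.has(item.id)) {
        seen.set(item.id, item);
      }
    }
  }
  // A Map iterates in insertion order, so the first list's order is preserved.
  return Array.from(seen.values());
}

// The example data from the question, as returned by each query.
const topDocs: DiscussionDoc[] = [
  { id: "a", weight: 5, createdAtMs: 1000 },
  { id: "b", weight: 3, createdAtMs: 1000 },
];
const recentDocs: DiscussionDoc[] = [
  { id: "b", weight: 3, createdAtMs: 1000 },
  { id: "a", weight: 5, createdAtMs: 1000 },
];

const mergedDocs = dedupeById(topDocs, recentDocs);
console.log(mergedDocs.map((d) => d.id)); // each ID appears once
```

Keying on the document ID rather than on field values matters here, because two distinct discussions can share the same weight and created_at.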
Answer 1
Score: 3
From the Firestore documentation on its query limitations:
> In a compound query, range (<, <=, >, >=) and not equals (!=, not-in) comparisons must all filter on the same field.
So each query can only have range filters on a single field, and there is no way to order/filter top results on multiple fields in a single query. You will have to perform multiple queries and deduplicate the results in your application code.
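As a rough sketch of what "multiple queries plus deduplication" could look like with real-time listeners, the helper below remembers the latest results from each query and rebuilds one merged list whenever either listener fires. The names createFeedMerger, FeedItem and FeedSource are hypothetical, and the final ordering (weight first, then recency) is only an assumed policy. In a real app, the returned update function would be called from inside each onSnapshot callback, and each query would normally carry a limit() so that only a bounded number of documents is read.

```typescript
// Hypothetical shape of one discussion after mapping a snapshot document.
interface FeedItem {
  id: string;          // assumed to come from doc.id
  weight: number;
  createdAtMs: number; // assumed conversion of created_at to milliseconds
}

type FeedSource = "top" | "recent";

// Returns an update function to call from each listener's callback.
function createFeedMerger(onChange: (items: FeedItem[]) => void) {
  const latest: Record<FeedSource, FeedItem[]> = { top: [], recent: [] };

  return function update(source: FeedSource, items: FeedItem[]): void {
    latest[source] = items;

    // Deduplicate by document ID across both result sets.
    const byId = new Map<string, FeedItem>();
    for (const item of latest.top.concat(latest.recent)) {
      byId.set(item.id, item);
    }

    // Assumed ordering policy: higher weight first, newer first on ties.
    const merged = Array.from(byId.values()).sort(
      (a, b) => b.weight - a.weight || b.createdAtMs - a.createdAtMs
    );
    onChange(merged);
  };
}

// Usage with hand-written data standing in for the two snapshots.
let rendered: FeedItem[] = [];
const update = createFeedMerger((items) => {
  rendered = items;
});

update("top", [
  { id: "a", weight: 5, createdAtMs: 1000 },
  { id: "b", weight: 3, createdAtMs: 1000 },
]);
update("recent", [
  { id: "c", weight: 1, createdAtMs: 2000 },
  { id: "a", weight: 5, createdAtMs: 1000 },
]);

console.log(rendered.map((d) => d.id).join(",")); // a,b,c
```

Because each listener replaces only its own slot, a document that drops out of one query disappears from the merged list unless the other query still returns it.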
That also means there is no way to prevent the extra reads. In theory, you could merge created_at and weight into a single value or property that you can filter on to meet your requirements. The only real example of that approach I know of is geohashes (which combine the latitude/longitude values of a point into a single string that you can filter on to find documents in a region), and I personally don't see an equivalent here.
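For readers who want to experiment with the merged-value idea anyway, here is a minimal sketch of one possible combined score, loosely modeled on the log-of-votes-plus-age "hot" formulas used by some link aggregators. It is not something the answer proposes: the hotScore function, the SECONDS_PER_DECADE constant and the idea of storing the result in its own field are all assumptions for illustration. The score only changes when weight changes, so it would have to be recomputed and written back on every weight update.

```typescript
// Assumed tuning constant: this many seconds of recency count as much
// as a tenfold increase in weight.
const SECONDS_PER_DECADE = 45000;

// Combine weight and creation time into one sortable number.
// Newer posts get a larger time term; heavier posts get a larger log term.
function hotScore(weight: number, createdAtMs: number): number {
  const magnitude = Math.log10(Math.max(weight, 1)); // clamp so weight <= 1 scores 0
  const ageTerm = createdAtMs / 1000 / SECONDS_PER_DECADE;
  return magnitude + ageTerm;
}

// An old post with weight 10 versus a post with weight 1 created 90000 s later.
const olderHeavy = hotScore(10, 0);
const newerLight = hotScore(1, 90000 * 1000);

console.log(newerLight > olderHeavy); // the newer post ranks higher
```

If such a value were stored on each document, a single query ordered by that one field could stand in for the two separate listeners. Whether the resulting ranking is acceptable depends entirely on the chosen constant.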