July 7, 2023, 02:42:32

Multi-parameter ranking using Firebase

Question

How can I leverage Firebase to develop a mixer algorithm similar to Twitter, which retrieves and ranks discussions from Firestore based on weight and created_at parameters?

I have a discussion collection with the following structure:

interface Discussion {
    weight: number;
    created_at: Timestamp; // Firestore Timestamp, as returned when the document is read
}

Challenge:

In Firestore, ordering data by a single field poses a limitation. For example, if we order discussions solely by weight, new posts will never have the opportunity to rise up in the ranking.

If I attempt to order discussions separately by weight and created_at, how can I handle deduplication effectively?

Note that the number of discussion documents can range from 0 to 1 million, so I prefer a solution that avoids loading all the documents on the client side. Additionally, any changes must be reactive and use the onSnapshot method for real-time updates.

Example Scenario:


import { collection, getFirestore, onSnapshot, orderBy, query, Timestamp } from "firebase/firestore";

interface Discussion {
    weight: number;
    created_at: Timestamp;
}

// Assumes the Firebase app is already initialized and that setState
// comes from the surrounding component.
function queryDiscussionFromFireStore() {
   const col_ref = collection(getFirestore(), "discussion");
   // query top discussions (descending, so the highest weight comes first)
   const topPost_unSub = onSnapshot(query(col_ref, orderBy("weight", "desc")),
   (snapShot) => {
       setState(snapShot.docs.map((d) => d.data() as Discussion));
   });

   // query recent discussions (descending, so the newest comes first)
   const recentPost_unSub = onSnapshot(query(col_ref, orderBy("created_at", "desc")),
   (snapShot) => {
       setState(snapShot.docs.map((d) => d.data() as Discussion));
   });

   // unsubscribe from both listeners
   return () => {
     recentPost_unSub();
     topPost_unSub();
   };
}

queryDiscussionFromFireStore is working fine, but I'm not able to figure out how to handle duplicate data.

Suppose we have the following data:

[
    {
        weight: 5,
        created_at: today_date
    },
    {
        weight: 3,
        created_at: today_date
    },
]

In this case, both snapshots will respond with the same data.

Explanation

In the provided code example, the queryDiscussionFromFireStore function retrieves discussions from Firestore with two separate queries, one ordered by weight and one ordered by created_at. It uses the onSnapshot method to listen for real-time updates on each query.

However, duplicates are a concern. Both queries run against the same collection, so any document that falls within both result sets is delivered twice: once by the "top discussions" query (ordered by weight) and once by the "recent discussions" query (ordered by creation time).

For instance, consider the following example data:

[
    {
        weight: 5,
        created_at: today_date
    },
    {
        weight: 3,
        created_at: today_date
    },
]

In this case, both onSnapshot callbacks for the "top discussions" and "recent discussions" queries will receive the same data, which results in duplicate entries being processed.

Answer 1

Score: 3

From the Firestore documentation on its query limitations:

> In a compound query, range (<, <=, >, >=) and not equals (!=, not-in) comparisons must all filter on the same field.

So each query can only have range filters on a single field, and no single query can return both the top results by weight and the top results by created_at. (Ordering by several fields is possible, but the second field only breaks ties in the first.) You will have to perform multiple queries and deduplicate the results in your application code.
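
A minimal sketch of that client-side deduplication, under some assumptions: each document is mapped to a plain object that carries its Firestore document ID as `id`, `created_at` is reduced to epoch milliseconds to keep the sketch free of Firebase imports, and `createFeedMixer` is a hypothetical helper name, not part of any SDK. It keeps the latest result set of each listener separately, so one callback never overwrites the other's data, and rebuilds a single feed keyed by document ID.

```typescript
// Plain shape of a discussion once it has left Firestore.
// `id` is assumed to be the document ID; created_at is epoch milliseconds.
interface DiscussionDoc {
  id: string;
  weight: number;
  created_at: number;
}

// Hypothetical helper: stores the latest results of both listeners and
// emits one deduplicated feed whenever either of them changes.
function createFeedMixer(onChange: (feed: DiscussionDoc[]) => void) {
  let top: DiscussionDoc[] = [];
  let recent: DiscussionDoc[] = [];

  const emit = (): void => {
    const byId = new Map<string, DiscussionDoc>();
    // A document returned by both queries has the same ID in both lists,
    // so the second occurrence overwrites the first instead of duplicating it.
    for (const d of [...top, ...recent]) {
      byId.set(d.id, d);
    }
    // Illustrative ordering only: weight first, newest first on ties.
    const feed = Array.from(byId.values()).sort(
      (a, b) => b.weight - a.weight || b.created_at - a.created_at
    );
    onChange(feed);
  };

  return {
    setTop(docs: DiscussionDoc[]): void {
      top = docs;
      emit();
    },
    setRecent(docs: DiscussionDoc[]): void {
      recent = docs;
      emit();
    },
  };
}
```

In the question's code, the "top" onSnapshot callback would call `setTop` and the "recent" callback would call `setRecent`, each passing `snapShot.docs` mapped to `{ id: d.id, ... }`, while `onChange` would wrap `setState`. Adding a `limit(n)` to both queries would presumably keep the number of documents read bounded, although documents present in both result sets are still read twice.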

That also means that there is no way to prevent the extra reads. Theoretically, you could find a way to merge created_at and weight into a single value/property that you can filter on to meet your requirements, but the only real example of something like that that I know of is geohashes (which combine the lat/lon values of a point into a single string value that you can filter on to find documents in a region), and I personally don't see an equivalent here.
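
One possible direction for such a merged value, offered as an assumption rather than something the answer endorses, is a time-anchored score in the style of "hot" rankings: the logarithm of the weight plus the creation time divided by a constant. Because the time term depends only on when the post was created, the score does not drift as time passes and would only need to be rewritten when `weight` changes. The names `rankScore`, `SECONDS_PER_DECADE`, and a stored `rank_score` field are hypothetical, and the constant is illustrative.

```typescript
// Hypothetical tuning knob: how many seconds of recency one tenfold
// increase in weight is worth. 45000 s (12.5 h) is illustrative only.
const SECONDS_PER_DECADE = 45000;

// Combines weight and creation time into one sortable number.
// Newer posts get a larger time term; heavier posts get a larger log term.
function rankScore(weight: number, createdAtMs: number): number {
  // Clamp to 1 so that zero or negative weights do not produce -Infinity or NaN.
  const weightTerm = Math.log10(Math.max(weight, 1));
  const timeTerm = createdAtMs / 1000 / SECONDS_PER_DECADE;
  return weightTerm + timeTerm;
}
```

If this value were stored on each document, for example as `rank_score`, a single query ordered by that field in descending order could serve the feed with one listener and no deduplication. Whether the resulting ordering matches the intended ranking depends entirely on the chosen constant.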
