2023-05-11 17:02:29

In PouchDB/CouchDB, should replication be faster through a filter or a view than using "selector" directly in replication?

Question

I'm running some tests with PouchDB and CouchDB in order to optimize my Ionic app as much as possible. With databases of a few hundred documents I have no problem, but with larger databases (from about 20,000 documents) replication takes a while.

Currently, the way we replicate is using the selector parameter:

this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})

I started running some tests on the assumption that filters or views created on the server would make replication faster, since I expected them to generate indexes that speed up retrieving the documents. However, this is what I found when replicating only those documents whose "tipo" is "Parte" and whose "estado" is 0:

Database with 31,090 documents.

Using selector:

this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})

Returns 4 docs
Takes 4910 ms

Using a filter defined on the server:

Server filter:

function(doc,req){
    // query parameters arrive as strings, so parse estado with an explicit radix
    return doc && doc.tipo === req.query.tipo && doc.estado === parseInt(req.query.estado, 10);
}

Client code:

this.remoteDb.replicate.to(this.localDb, {
    filter: 'datos/tipo',
    query_params:{
        "tipo": "Parte",
        "estado": 0
    }
})

Using a view defined on the server:

Server view:

function (doc) {
    if(doc && doc.tipo && doc.tipo==='Parte' && doc.estado === 0)
        emit(doc.tipo, 1);
}

Client code:

this.remoteDb.replicate.to(this.localDb, {
    filter: '_view',
    view: 'pruebas/parte'
})

I'm raising this because I don't know whether this is normal or whether I'm doing something wrong. I hope you can help me.

Answer 1

Score: 2

First of all, on "selector" syntax versus JavaScript functions: selectors are much faster at filtering a CouchDB changes feed. Put simply, all of the work can be done inside Erlang, without spinning up any JavaScript processes to decide whether a change should make it through.

It should be clear that this use case is going to get slower the bigger the database gets. If you are "syncing" a very small subset of a large database, then CouchDB has to spool through the entire changes feed (the history of the database) to find the handful of documents you need. This is fine for very small databases, but slowish for 20k docs and progressively slower beyond that. If you intend to keep growing the database, then this solution isn't going to scale well. Imagine a database with 500M docs!

I solved this for a customer by choosing a different technique for seeding the data in a new, empty PouchDB database: populating the data with a query first, then replicating to catch up on any recent changes. This is much faster with larger databases. It's written up here: https://blog.cloudant.com/2019/06/21/Replicating-from-a-Query.html.

In short:

1. Query the remote database for the subset of data you need. This query should be backed by a secondary index to be quick and scalable.
2. Fetch the bodies of the documents you need using CouchDB's _bulk_get endpoint, which returns the replication history for each document.
3. Write these documents to PouchDB.

This gives you the same data you would have had with replication, but more quickly.
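
The three steps above can be sketched with PouchDB's own API. This is a minimal sketch rather than the code from the linked post: it assumes the pouchdb-find plugin is loaded (so that find() is available on the remote handle), that a matching index already exists on the server, and that the result set fits in a single request. The function name seedFromQuery and the default limit of 1000 are hypothetical.

```javascript
// Seed an empty local PouchDB from a Mango query instead of the changes feed.
// Assumption: the pouchdb-find plugin is loaded, so remoteDb.find() exists.
async function seedFromQuery(remoteDb, localDb, selector, limit = 1000) {
  // Step 1: ask the remote database only for the IDs of matching documents.
  // CouchDB's _find returns 25 results by default, so pass an explicit limit.
  const found = await remoteDb.find({ selector, fields: ['_id'], limit });
  const ids = found.docs.map((doc) => ({ id: doc._id }));
  if (ids.length === 0) return 0;

  // Step 2: fetch bodies plus revision history (_revisions) via _bulk_get.
  const fetched = await remoteDb.bulkGet({ docs: ids, revs: true });
  const docs = [];
  for (const result of fetched.results) {
    for (const entry of result.docs) {
      if (entry.ok) docs.push(entry.ok); // skip { error: ... } entries
    }
  }

  // Step 3: write with new_edits: false so the remote revisions are kept
  // as-is, the same way the replicator stores documents.
  await localDb.bulkDocs(docs, { new_edits: false });
  return docs.length;
}
```

With the handles from the question, a call might look like seedFromQuery(this.remoteDb, this.localDb, { tipo: "Parte", estado: 0 }). For result sets larger than the limit, the query would need to be paged.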

You can then replicate from your remote database, providing a "since=now" parameter to get only the latest changes, or "since=<a known sequence token>".
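
As a rough illustration of the "known sequence token" variant, the sketch below reads the remote's update_seq from info() and later passes it as the since option of a one-shot replication. The helper names currentSeq and catchUp are hypothetical, and capturing the token before seeding (so that changes made during seeding are not missed) is an assumption about ordering, not something stated above.

```javascript
// Read the remote database's current sequence token.
// Assumption: this is called BEFORE the local database is seeded.
async function currentSeq(remoteDb) {
  const info = await remoteDb.info();
  return info.update_seq;
}

// One-shot replication of changes made after `since`, still limited to the
// documents matching the selector.
async function catchUp(remoteDb, localDb, since, selector) {
  return localDb.replicate.from(remoteDb, { since, selector });
}
```

A typical sequence would be: call currentSeq, seed the local database from a query, then call catchUp with the saved token and the same selector used for seeding.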
