In PouchDB/CouchDB, should replication be faster through a filter or a view than using "selector" directly in replication?
Question
I'm running some tests with PouchDB and CouchDB to optimize my Ionic app as much as possible. With databases of a few hundred documents I have no problem, but with larger databases (from about 20,000 documents) replication takes a while.
Currently, the way we replicate is using the selector parameter:
this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})
I started testing on the assumption that filters or views created on the server would make replication faster, since they generate indexes that should speed up fetching the documents. However, this is what I found when replicating only the documents whose type is "Parte" and whose status is 0:
Database with 31,090 documents.
Using selector:
this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})
Returns 4 docs
Takes 4910 ms
Using a filter defined on the server:
Server filter:
function(doc,req){
    // query values arrive as strings, so parse estado explicitly as base 10
    return doc && doc.tipo === req.query.tipo && doc.estado === parseInt(req.query.estado, 10);
}
Client code:
this.remoteDb.replicate.to(this.localDb, {
    filter: 'datos/tipo',
    query_params:{
        "tipo": "Parte",
        "estado": 0
    }
})
Using a view defined on the server:
Server view:
function (doc) {
    if(doc && doc.tipo && doc.tipo==='Parte' && doc.estado === 0)
        emit(doc.tipo, 1);
}
Client code:
this.remoteDb.replicate.to(this.localDb, {
    filter: '_view',
    view: 'pruebas/parte'
})
I'm bringing this up because I don't know whether this is normal or whether I'm doing something wrong. I hope you can help me.
Answer 1
Score: 2
First of all, on "selector" syntax versus JavaScript functions: selectors are much faster at filtering a CouchDB changes feed. Put simply, all of the work can be done inside Erlang, without spinning up any JavaScript processes to decide whether a change should make it through.
It should be clear that this use case gets slower as the database grows. If you are "syncing" a very small subset of a large database, CouchDB has to spool through the entire changes feed (the history of the database) to find the handful of documents you need. This is fine for very small databases, but slowish for 20k docs and progressively slower after that. If you intend to keep growing the database, this solution isn't going to scale well. Imagine a database with 500M docs?!
I solved this for a customer by choosing a different technique for seeding the data in a new, empty PouchDB database: populating the data with a query first, then replicating to catch up on any recent changes. This is much faster with larger databases. It's written up here: https://blog.cloudant.com/2019/06/21/Replicating-from-a-Query.html.
In short:
- query the remote database for the subset of data you need. This query should be backed by a secondary index to be quick and scalable.
- fetch the document bodies of the documents you need using CouchDB's _bulk_get endpoint, which returns replication history for each document.
- write these documents to PouchDB
^ this gives you the same data you would have had with replication but quicker.
You can then replicate from your remote database, providing a "since=now" parameter to get only the latest changes, or "since=<a known sequence token>".
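The steps above might be sketched roughly as follows. This is only an illustration under assumptions, not the code from the linked write-up: `docsFromBulkGet` and `seedFromQuery` are hypothetical names, `remoteDb` and `localDb` are assumed to be PouchDB instances, `remoteDb.find` assumes the pouchdb-find plugin is loaded, and the `limit` default of 1000 is arbitrary (CouchDB's `_find` returns only 25 documents unless a limit is given). Reading `update_seq` before the query is one way to obtain the "known sequence token" mentioned above.

```javascript
// Sketch only: seed a local PouchDB from a query instead of filtered replication.
// remoteDb / localDb are assumed to be PouchDB instances (pouchdb-find loaded).

// Collect the successfully fetched document bodies from a bulkGet response.
function docsFromBulkGet(response) {
  const docs = [];
  for (const result of response.results) {
    for (const entry of result.docs) {
      if (entry.ok) docs.push(entry.ok); // entries with `error` are skipped
    }
  }
  return docs;
}

async function seedFromQuery(remoteDb, localDb, selector, limit = 1000) {
  // Remember where the remote changes feed is before querying, so a later
  // replication can start from here and not miss concurrent updates.
  const { update_seq: since } = await remoteDb.info();

  // 1. Query for the ids and revisions of the documents we want.
  const found = await remoteDb.find({ selector, fields: ['_id', '_rev'], limit });
  const wanted = found.docs.map((d) => ({ id: d._id, rev: d._rev }));

  let docs = [];
  if (wanted.length > 0) {
    // 2. Fetch the bodies together with their revision history.
    const fetched = await remoteDb.bulkGet({ docs: wanted, revs: true });
    docs = docsFromBulkGet(fetched);

    // 3. Write them locally, keeping the remote revision ids as they are.
    await localDb.bulkDocs(docs, { new_edits: false });
  }
  return { count: docs.length, since };
}
```

A caller could then pass the returned `since` value to the catch-up replication, for example `remoteDb.replicate.to(localDb, { selector, since })`. For the query to stay fast on a large database, an index on the queried fields (here `tipo` and `estado`) would need to exist on the remote.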