How to add a unique index to a MongoDB collection with duplicates?
Question
I'm fairly new to MongoDB and would like to know the best way to handle the following situation.
I have a students collection with a field named email. The collection already contains records with this field set, and we've found that some of them have duplicated emails (more than one student has the same email). From now on we would like email to be covered by a unique index, so I was wondering what the process for that is.
Should I remove the existing duplicates first, or is it enough to add the unique index now, and will it ensure that from now on the collection doesn't allow new records with an existing email?
In other words, we don't mind the existing records, but we would like to prevent duplicates in future records.
Answer 1
Score: 1
To add a unique index you can use:
db.students.createIndex({ email: 1 }, { unique: true })
> A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.
https://www.mongodb.com/docs/manual/core/index-unique/
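However, you have to remove the documents with duplicate email values first; otherwise creating the unique index fails with a duplicate key error. The following keeps one document per email and deletes the rest: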
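db.students.aggregate([
  // Group by email, collecting the _id of every document that shares it.
  { $group: { _id: "$email", dups: { $push: "$_id" }, count: { $sum: 1 } } },
  // Keep only the emails that occur more than once.
  { $match: { count: { $gt: 1 } } }
]).forEach(function (doc) {
  doc.dups.shift(); // keep one (arbitrary) document per email
  db.students.deleteMany({ _id: { $in: doc.dups } });
});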
See this answer for more info: https://stackoverflow.com/a/35711737/1278463
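Since the cleanup is destructive, it may help to preview which documents would be deleted before running it. Below is a minimal sketch in plain JavaScript that mirrors the grouping logic on an in-memory array. The helper name findDuplicateIds and the sample data are made up for illustration and are not part of MongoDB. As in the aggregation, documents with no email at all are grouped together under null, which matches how a plain unique index treats a missing field.

```javascript
// Hypothetical helper (not a MongoDB API): returns the _ids that a
// "keep the first document per email" cleanup would delete.
function findDuplicateIds(docs) {
  const seen = new Set();
  const toDelete = [];
  for (const doc of docs) {
    // A missing email is treated as null, like $group and unique indexes do.
    const key = doc.email ?? null;
    if (seen.has(key)) {
      toDelete.push(doc._id); // this email was already seen: mark as duplicate
    } else {
      seen.add(key); // first document with this email is kept
    }
  }
  return toDelete;
}

// Sample data, invented for this example.
const students = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
  { _id: 4 },
  { _id: 5 },
];

console.log(findDuplicateIds(students)); // [ 3, 5 ]
```

In mongosh, the same function could presumably be fed with db.students.find({}, { email: 1 }).toArray(), assuming the collection is small enough to load into memory. Note that the document kept here is the first one in array order, whereas the aggregation makes no promise about which one survives.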