问题

我是个对MongoDB不太熟悉的新手，我想知道处理以下情况的最佳方法。

我有一个名为students的集合，其中有一个名为email的字段，该集合已经包含了一些已设置此属性的记录，我们发现有一些记录的电子邮件重复了（多个学生具有相同的电子邮件）。因此，从现在开始，我们想将电子邮件属性转换为唯一索引，所以我想知道这个过程是什么。

我应该首先删除现有的重复项，还是现在添加唯一索引就足够了，并且它将确保从现在开始，集合（数据库）不允许创建具有现有电子邮件的记录？

我的意思是，我们对现有/早期记录没有问题，但我们希望防止将来的记录发生这种情况。

英文:

I'm a bit new with mongo and I would like to know the best way to handle the following situation.

I've a students collection that has a field named email, the collection already contains some records with this property set and we've found that there are some records with the email duplicated (more than one student have the same email). So from now we would like to convert the email property to an unique index, so I was wondering what is the process for it.

Should I remove first the existent duplicates or is it enough to add the unique index now and it will ensure that from now the collection (db) doesn't allow them to create records with existent emails?

I mean, we don't have problem with the existing/earlier records, but we would like to prevent it from happening for future records.

答案1

得分: 1

要添加唯一索引，您可以使用以下代码：

db.collection.createIndex({ email: 1 }, { unique: true })

唯一索引确保索引字段不存储重复值，即强制索引字段的唯一性。默认情况下，在创建集合时，MongoDB 会在 _id 字段上创建唯一索引。

更多信息，请查看：https://www.mongodb.com/docs/manual/core/index-unique/

但在执行上述操作之前，您需要删除具有重复电子邮件值的文档：

db.dups.aggregate([
  {
    $group: {
      _id: "$email",
      dups: { $push: "$_id" },
      count: { $sum: 1 }
    }
  },
  {
    $match: { count: { $gt: 1 } }
  }
]).forEach(function(doc) {
  doc.dups.shift();
  db.dups.remove({ _id: { $in: doc.dups } });
});

有关更多信息，请参阅此答案：https://stackoverflow.com/a/35711737/1278463

英文:

To add a unique index you can use:

db.collection.createIndex( email, { unique: true } )

> A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.

https://www.mongodb.com/docs/manual/core/index-unique/

However, you have to remove documents with duplicate email values before that

db.dups.aggregate([{$group:{_id:&quot;$email&quot;, dups:{$push:&quot;$_id&quot;}, count: {$sum: 1}}},
{$match:{count: {$gt: 1}}}
]).forEach(function(doc){
  doc.dups.shift();
  db.dups.remove({_id : {$in: doc.dups}});
});

Check this answer for more info https://stackoverflow.com/a/35711737/1278463

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何向Mongo集合添加唯一索引，如果有重复项？

问题

答案1

When i parse by req.query boolean and number to json it becomes string. Mongo .find() finds nothing

MongoDB在Go语言中的聚合查询（mgo.v2）

如何从Mongoose中的数组中获取对象信息？

使用共享会话变量从我的函数中插入 MongoDB 数据。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论