MongoDB $text search with only negated terms

Question

How can the $text query operator be used to find documents not containing a list of forbidden words? The documents don't need to contain anything specific; just none of those words.

This is a pretty common use case, e.g. for profanity filtering, but the MongoDB documentation states, without any explanation or workarounds, that

> When passed a search string that only contains negated words, text search will not match any documents.

Answer 1

Score: 3

Since MongoDB doesn't support this feature, I guess every solution will be a hack.

Here is mine:

I would add a dummy field to every document in the collection, always with the same static value, like "dummy":"x", and add this field to the text index. Then I would include the dummy value x in the search string as the one non-negated term, which works around this limitation:

> When passed a search string that only contains negated words, text
> search will not match any documents

// Sample data: every document carries the same static value in "dummy".
db.articles.insertMany(
    [
        { _id: 1, subject: "coffee", dummy: "x" },
        { _id: 2, subject: "Coffee Shopping", dummy: "x" },
        { _id: 3, subject: "Baking a cake", dummy: "x" },
        { _id: 4, subject: "baking", dummy: "x" },
        { _id: 5, subject: "Cafe Con Cake", dummy: "x" },
        { _id: 6, subject: "ice cream", dummy: "x" },
        { _id: 7, subject: "coffee and cream", dummy: "x" }
    ]
)

Add the dummy field to the text index:

db.articles.createIndex( { subject: "text", dummy:"text" } )

Then include x in the query, alongside the negated words:

db.articles.find( { $text: { $search: "x -cream -cake" } }, { dummy: 0 } )

The result contains only the documents without the forbidden words cream and cake:

{
	"_id" : 4,
	"subject" : "baking"
},

{
	"_id" : 2,
	"subject" : "Coffee Shopping"
},

{
	"_id" : 1,
	"subject" : "coffee"
}
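
If the forbidden words come from a list rather than a hand-written string, the search string can be assembled in code. This is only a rough sketch: buildNegatedSearch is a made-up helper name (not part of MongoDB), it only handles single words, and it assumes every document has the value x in an indexed dummy field as set up above. The sentinel should also not be a stop word for the index language, otherwise it would presumably be dropped from the search.

```javascript
// Hypothetical helper: builds a $text search string made of one
// always-present sentinel term followed by the negated forbidden words.
function buildNegatedSearch(forbiddenWords, sentinel = "x") {
  const negated = forbiddenWords
    // Normalize: trim whitespace and drop any hyphens the caller already added.
    .map((word) => word.trim().replace(/^-+/, ""))
    .filter((word) => word.length > 0)
    .map((word) => {
      // Assumption of this sketch: only single words, no phrases.
      if (/[\s"]/.test(word)) {
        throw new Error(`Unsupported forbidden term: ${word}`);
      }
      return "-" + word;
    });
  return [sentinel, ...negated].join(" ");
}

const search = buildNegatedSearch(["cream", "cake"]);
console.log(search); // x -cream -cake

// In the shell, the string would then be passed to $text, for example:
// db.articles.find({ $text: { $search: search } }, { dummy: 0 })
```

Building the string in one place keeps the sentinel and the negation prefix from being forgotten when the word list changes.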

Answer 2

Score: 1

The $text operator requires at least one inclusive (non-negated) word to match. After that, you can add as many forbidden words as you like, like so:

db.articles.find({
    $text: {
        $search: "coffee -cream -shop"
    }
})

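I guess it's a limitation of MongoDB's text search engine.

So the alternative would be to skip $text and exclude the words with case-insensitive regular expressions. Note that these match substrings rather than stemmed words, and the query cannot use the text index: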

db.articles.find(
    {
        subject: {
            $not: {
                $in: [/cream/i, /shop/i]
            }
        }
    }
)
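
For a longer word list, the same kind of filter can be generated instead of typed by hand. The sketch below is an illustration under some assumptions: escapeRegExp and buildExclusionFilter are hypothetical helper names, and the patterns use \b word boundaries so that only whole words are rejected, which is a deliberate change from the plain substring patterns above.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so forbidden words are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a filter that rejects documents whose
// `field` contains any forbidden word as a whole word (case-insensitive).
function buildExclusionFilter(field, forbiddenWords) {
  const patterns = forbiddenWords.map(
    (word) => new RegExp("\\b" + escapeRegExp(word) + "\\b", "i")
  );
  return { [field]: { $not: { $in: patterns } } };
}

const filter = buildExclusionFilter("subject", ["cream", "shop"]);

// In the shell, the filter would be used like this:
// db.articles.find(filter)
```

With word boundaries, "Coffee Shopping" is no longer excluded by shop; dropping the \b parts restores the substring behavior of the hand-written query.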

huangapple
  • Published on January 3, 2020 at 14:23:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574062.html