August 10, 2023 21:40:47


Mongo DB / Python - Search DB for string but limit results to 1 of each item based on specified field

Question

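I am trying to search my MongoDB collection of products. The dataset holds multiple documents for each product so that price can be recorded over time. I would like to search for a phrase, then limit the results to one document per UPC. My current code works well, but it returns multiple documents with the same UPC.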

Current code, which returns multiple documents with the same UPC:

response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

Example data set:

{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.99,
  "timestamp": "2023-08-05 22:29:04 EDT-0400"
}
{
  "_id": {
    "$oid": "64cf057c7844ef1a25ee57fd"
  },
  "upc": "048894970887",
  "name": "Basic Window Fan - Holmes",
  "salePrice": 54.99,
  "available": false,
  "timestamp": "2023-08-05 22:29:16 EDT-0400"
}
{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.97,
  "timestamp": "2023-08-04 13:25:09 EDT-0400"
}
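The timestamp values are stored as strings. Sorting them with .sort("timestamp", -1) works here only because every value shares the same "YYYY-MM-DD HH:MM:SS" layout and the same offset; it would misorder documents if offsets ever differed. A minimal sketch of parsing them into timezone-aware datetime objects, assuming every value ends in a zone abbreviation immediately followed by a ±HHMM offset (as in EDT-0400). The helper name parse_timestamp is hypothetical:

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse strings like '2023-08-05 22:29:04 EDT-0400' into aware datetimes.

    Assumption: the last space-separated token is a zone abbreviation
    followed by a +HHMM/-HHMM offset. The abbreviation itself is ignored.
    """
    date_part, zone_part = value.rsplit(" ", 1)
    offset = zone_part[-5:]  # e.g. '-0400'
    return datetime.strptime(f"{date_part} {offset}", "%Y-%m-%d %H:%M:%S %z")


ts = parse_timestamp("2023-08-05 22:29:04 EDT-0400")
print(ts.isoformat())  # 2023-08-05T22:29:04-04:00
```

Storing the parsed value instead of the string (PyMongo encodes a Python datetime as a BSON date) would make the sort independent of the string format.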

Should I be using distinct, or find?
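To see why distinct alone does not fit: in PyMongo, Collection.distinct("upc", filter) returns only the list of distinct UPC values, not the documents, whereas the goal is one whole document (the newest) per UPC. The plain-Python model below uses the example data above, trimmed to the relevant fields. It illustrates the semantics only and does not query the database:

```python
docs = [
    {"upc": "032622013625", "salePrice": 29.99, "timestamp": "2023-08-05 22:29:04 EDT-0400"},
    {"upc": "048894970887", "salePrice": 54.99, "timestamp": "2023-08-05 22:29:16 EDT-0400"},
    {"upc": "032622013625", "salePrice": 29.97, "timestamp": "2023-08-04 13:25:09 EDT-0400"},
]

# What distinct("upc") gives you: only the unique values of one field.
distinct_upcs = sorted({d["upc"] for d in docs})

# What the question asks for: the newest whole document for each UPC.
# Comparing the strings is safe here only because every timestamp shares
# one format and one UTC offset.
latest_by_upc = {}
for d in docs:
    current = latest_by_upc.get(d["upc"])
    if current is None or d["timestamp"] > current["timestamp"]:
        latest_by_upc[d["upc"]] = d

print(distinct_upcs)  # ['032622013625', '048894970887']
print([d["salePrice"] for d in latest_by_upc.values()])  # [29.99, 54.99]
```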

Answer 1

Score: 1

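You could use "$top" with "$group" in an aggregation pipeline to get your result. If you only want certain fields returned, you could add a "$project" stage. Here is sample code: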

response = self.DB.aggregate([
  {
    "$match": {'$text': {'$search': f'/{search}/'}}
  },
  {
    "$group": {
      "_id": "$upc",
      "mostRecent": {
        "$top": {
          "sortBy": {
            "timestamp": -1
          },
          "output": "$$ROOT"
        }
      }
    }
  },
  {
    "$replaceWith": "$mostRecent"
  }
])

Hope this helps.
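The answer mentions a "$project" stage without showing it. Below is a minimal sketch of a helper that builds the same pipeline and optionally appends "$project". The function name latest_per_upc_pipeline and its fields and use_top parameters are hypothetical. "$top" is a "$group" accumulator added in MongoDB 5.2, so for older servers the sketch falls back to "$sort" followed by "$first", which picks the first document per group in the incoming (sorted) order. "$replaceWith" itself needs MongoDB 4.2 or later. The sketch also passes the phrase to "$text" as-is, because "$text" does not treat /.../ as a regular expression. It only builds the pipeline list, so it can be inspected without a database; passing the result to self.DB.aggregate(...) is the assumed usage.

```python
def latest_per_upc_pipeline(search, fields=None, use_top=True):
    """Build an aggregation pipeline returning the newest document per UPC.

    search  -- phrase for the $text index (no regex slashes needed)
    fields  -- optional list of field names to keep via $project
    use_top -- True for $top (MongoDB 5.2+); False for the $sort/$first fallback
    """
    pipeline = [{"$match": {"$text": {"$search": search}}}]
    if use_top:
        pipeline.append({
            "$group": {
                "_id": "$upc",
                "mostRecent": {
                    "$top": {"sortBy": {"timestamp": -1}, "output": "$$ROOT"}
                },
            }
        })
    else:
        # $first takes the first document per group in the incoming order,
        # so sort newest-first before grouping.
        pipeline.append({"$sort": {"timestamp": -1}})
        pipeline.append({"$group": {"_id": "$upc", "mostRecent": {"$first": "$$ROOT"}}})
    pipeline.append({"$replaceWith": "$mostRecent"})
    if fields:
        # Inclusion projection; _id is excluded explicitly, as in the question.
        projection = {name: 1 for name in fields}
        projection["_id"] = 0
        pipeline.append({"$project": projection})
    return pipeline


pipeline = latest_per_upc_pipeline("Luigi", fields=["upc", "name", "salePrice", "timestamp"])
# response = self.DB.aggregate(pipeline)
```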

Answer 2

Score: 0

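I filtered the results in Python. Because the query sorts by timestamp in descending order, I keep track of the UPCs already seen and append a document to the result list only when its UPC has not been seen yet, so each UPC keeps its most recent document.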

import pymongo

# Connection string from the original answer; adjust for your own deployment.
myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["product"]
mycol.drop()  # start the demo from an empty collection

data = [
    {
        "_id": {
            "oid": "64cf05707844ef1a25ee57fa"
        },
        "upc": "032622013625",
        "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
        "salePrice": 29.99,
        "timestamp": "2023-08-05 22:29:04 EDT-0400"
    },
    {
        "_id": {
            "oid": "64cf057c7844ef1a25ee57fd"
        },
        "upc": "048894970887",
        "name": "Basic Window Fan - Holmes",
        "salePrice": 54.99,
        "available": False,
        "timestamp": "2023-08-05 22:29:16 EDT-0400"
    },
    {
        "_id": {
            "oid": "64cf05707844ef1a25ee57fb"
        },
        "upc": "032622013625",
        "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
        "salePrice": 29.97,
        "timestamp": "2023-08-04 13:25:09 EDT-0400"
    }
]
mycol.insert_many(data)

# $text queries need a text index; this one covers the upc field only.
resp = mycol.create_index([("upc", "text")])
print(resp)  # prints the index name, upc_text

search = "032622013625"
# Newest first, so the first document seen for each UPC is the most recent one.
response = mycol.find({"$text": {"$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

upcs_already_seen = set()  # set membership checks are O(1)
list_documents = []
for doc in response:
    upc = doc.get("upc")
    if upc not in upcs_already_seen:
        list_documents.append(doc)
        upcs_already_seen.add(upc)
print(list_documents)

[{'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400'}]
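The same filtering can be wrapped in a small reusable generator that works on any iterable of documents, including a PyMongo cursor. It relies on the input already being sorted newest-first, so the first document seen for each UPC is the most recent one; a set keeps each membership check constant-time. The name first_per_key is hypothetical, and the sample list below stands in for a real cursor:

```python
def first_per_key(docs, key="upc"):
    """Yield only the first document seen for each value of `key`.

    Assumes `docs` is already ordered (e.g. sorted by timestamp descending),
    so the first occurrence per key is the one to keep.
    """
    seen = set()
    for doc in docs:
        value = doc.get(key)
        if value not in seen:
            seen.add(value)
            yield doc


# Stand-in for mycol.find(...).sort("timestamp", -1)
cursor = [
    {"upc": "048894970887", "timestamp": "2023-08-05 22:29:16 EDT-0400"},
    {"upc": "032622013625", "timestamp": "2023-08-05 22:29:04 EDT-0400"},
    {"upc": "032622013625", "timestamp": "2023-08-04 13:25:09 EDT-0400"},
]
unique_docs = list(first_per_key(cursor))
print([d["timestamp"] for d in unique_docs])
```

Unlike the aggregation approach in Answer 1, this still transfers every matching document from the server, so its cost grows with the length of each product's price history.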
