英文:
Mongo DB / Python - Search DB for string but limit results to 1 of each item based on specified field
问题
我正在尝试搜索我的MongoDB产品。数据集中有多个相同的产品以记录随时间变化的价格。我想搜索一个短语,然后将结果限制为每个UPC的一个。我的当前代码效果不错,但会返回多个相同UPC。
当前代码,将返回多个相同UPC:
response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
示例数据集:
{
"_id": {
"$oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.99,
"timestamp": "2023-08-05 22:29:04 EDT-0400"
}
{
"_id": {
"$oid": "64cf057c7844ef1a25ee57fd"
},
"upc": "048894970887",
"name": "Basic Window Fan - Holmes",
"salePrice": 54.99,
"available": false,
"timestamp": "2023-08-05 22:29:16 EDT-0400"
}
{
"_id": {
"$oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.97,
"timestamp": "2023-08-04 13:25:09 EDT-0400"
}
不确定是否应该使用distinct
还是find
?
英文:
I am trying to search my MongoDB of products. The dataset has multiple of each product to record price over time. I would like to search for a phrase then limit the results to 1 of each UPC. My current code works well but will return multiple of the same UPC.
Current Code, will return multiple of the same UPC:
response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
Example Data Set:
{
"_id": {
"$oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.99,
"timestamp": "2023-08-05 22:29:04 EDT-0400",
}
}
{
"_id": {
"$oid": "64cf057c7844ef1a25ee57fd"
},
"upc": "048894970887",
"name": "Basic Window Fan - Holmes",
"salePrice": 54.99,
"available": false,
"timestamp": "2023-08-05 22:29:16 EDT-0400",
}
}
}
{
"_id": {
"$oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.97,
"timestamp": "2023-08-04 13:25:09 EDT-0400",
}
}
Not sure if I should be using distinct, or find?
答案1
得分: 1
你可以在聚合管道中使用"$top"
与"$group"
来获取你的结果。如果你只想返回特定的字段,你可以使用"$project"
阶段。以下是示例代码:
response = self.DB.aggregate([
{
"$match": {'$text': {'$search': f'/{search}/'}}
},
{
"$group": {
"_id": "$upc",
"mostRecent": {
"$top": {
"sortBy": {
"timestamp": -1
},
"output": "$$ROOT"
}
}
}
},
{
"$replaceWith": "$mostRecent"
}
])
希望这能帮助你。
英文:
You could use "$top"
with "$group"
in an aggregation pipeline to get your result. If you only want certain fields returned, you could use a "$project"
stage.
response = self.DB.aggregate([
{
"$match": {'$text': {'$search': f'/{search}/'}}
},
{
"$group": {
"_id": "$upc",
"mostRecent": {
"$top": {
"sortBy": {
"timestamp": -1
},
"output": "$$ROOT"
}
}
}
},
{
"$replaceWith": "$mostRecent"
}
])
答案2
得分: 0
我使用存储UPC ID的字典过滤了结果数组,并将其附加到文档列表中,如果UPC ID不存在。
import pymongo
myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["product"]
mycol.drop()
data = [
{
"_id": {
"oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.99,
"timestamp": "2023-08-05 22:29:04 EDT-0400"
},
{
"_id": {
"oid": "64cf057c7844ef1a25ee57fd"
},
"upc": "048894970887",
"name": "Basic Window Fan - Holmes",
"salePrice": 54.99,
"available": False,
"timestamp": "2023-08-05 22:29:16 EDT-0400"
},
{
"_id": {
"oid": "64cf05707844ef1a25ee57fb"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.97,
"timestamp": "2023-08-04 13:25:09 EDT-0400"
}
]
for d in data:
x = mycol.insert_one(d)
resp = mycol.create_index(
[
("upc", "text")
]
)
print(resp)
search = "032622013625"
response = mycol.find({"$text": {"$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
list_upc_already_seen = []
list_documents = []
for doc in response:
upc = doc.get("upc")
if upc not in list_upc_already_seen:
list_documents.append(doc)
list_upc_already_seen.append(upc)
print(list_documents)
[{ 'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400' }]
英文:
I filtered the array of results by using a dictionary storing the upc id, and append into a list of documents if upc id is not existing.
import pymongo
myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["product"]
mycol.drop()
data=[
{
"_id": {
"oid": "64cf05707844ef1a25ee57fa"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.99,
"timestamp": "2023-08-05 22:29:04 EDT-0400"
},
{
"_id": {
"oid": "64cf057c7844ef1a25ee57fd"
},
"upc": "048894970887",
"name": "Basic Window Fan - Holmes",
"salePrice": 54.99,
"available": False,
"timestamp": "2023-08-05 22:29:16 EDT-0400"
},
{
"_id": {
"oid": "64cf05707844ef1a25ee57fb"
},
"upc": "032622013625",
"name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
"salePrice": 29.97,
"timestamp": "2023-08-04 13:25:09 EDT-0400"
}
]
for d in data:
x = mycol.insert_one(d)
resp=mycol.create_index(
[
("upc", "text")
]
)
print(resp)
search="032622013625"
response = mycol.find( { "$text": { "$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
list_upc_already_seen=[]
list_documents=[]
for doc in response:
upc=doc.get("upc")
if upc not in list_upc_already_seen:
list_documents.append(doc)
list_upc_already_seen.append(upc)
print(list_documents)
[{'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400'}]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论