Mongo DB / Python – Search DB for string but limit results to 1 of each item based on specified field

huangapple go评论144阅读模式
英文:

Mongo DB / Python - Search DB for string but limit results to 1 of each item based on specified field

问题

我正在尝试搜索我的MongoDB产品。数据集中有多个相同的产品以记录随时间变化的价格。我想搜索一个短语,然后将结果限制为每个UPC的一个。我的当前代码效果不错,但会返回多个相同UPC。

当前代码,将返回多个相同UPC:

  1. response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

示例数据集:

  1. {
  2. "_id": {
  3. "$oid": "64cf05707844ef1a25ee57fa"
  4. },
  5. "upc": "032622013625",
  6. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  7. "salePrice": 29.99,
  8. "timestamp": "2023-08-05 22:29:04 EDT-0400"
  9. }
  10. {
  11. "_id": {
  12. "$oid": "64cf057c7844ef1a25ee57fd"
  13. },
  14. "upc": "048894970887",
  15. "name": "Basic Window Fan - Holmes",
  16. "salePrice": 54.99,
  17. "available": false,
  18. "timestamp": "2023-08-05 22:29:16 EDT-0400"
  19. }
  20. {
  21. "_id": {
  22. "$oid": "64cf05707844ef1a25ee57fa"
  23. },
  24. "upc": "032622013625",
  25. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  26. "salePrice": 29.97,
  27. "timestamp": "2023-08-04 13:25:09 EDT-0400"
  28. }

不确定是否应该使用distinct还是find

英文:

I am trying to search my MongoDB of products. The dataset has multiple of each product to record price over time. I would like to search for a phrase then limit the results to 1 of each UPC. My current code works well but will return multiple of the same UPC.

Current Code, will return multiple of the same UPC:

  1. response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

Example Data Set:

  1. {
  2. "_id": {
  3. "$oid": "64cf05707844ef1a25ee57fa"
  4. },
  5. "upc": "032622013625",
  6. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  7. "salePrice": 29.99,
  8. "timestamp": "2023-08-05 22:29:04 EDT-0400",
  9. }
  10. }
  11. {
  12. "_id": {
  13. "$oid": "64cf057c7844ef1a25ee57fd"
  14. },
  15. "upc": "048894970887",
  16. "name": "Basic Window Fan - Holmes",
  17. "salePrice": 54.99,
  18. "available": false,
  19. "timestamp": "2023-08-05 22:29:16 EDT-0400",
  20. }
  21. }
  22. }
  23. {
  24. "_id": {
  25. "$oid": "64cf05707844ef1a25ee57fa"
  26. },
  27. "upc": "032622013625",
  28. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  29. "salePrice": 29.97,
  30. "timestamp": "2023-08-04 13:25:09 EDT-0400",
  31. }
  32. }

Not sure if I should be using distinct, or find?

答案1

得分: 1

你可以在聚合管道中使用"$top""$group"来获取你的结果。如果你只想返回特定的字段,你可以使用"$project"阶段。以下是示例代码:

  1. response = self.DB.aggregate([
  2. {
  3. "$match": {'$text': {'$search': f'/{search}/'}}
  4. },
  5. {
  6. "$group": {
  7. "_id": "$upc",
  8. "mostRecent": {
  9. "$top": {
  10. "sortBy": {
  11. "timestamp": -1
  12. },
  13. "output": "$$ROOT"
  14. }
  15. }
  16. }
  17. },
  18. {
  19. "$replaceWith": "$mostRecent"
  20. }
  21. ])

希望这能帮助你。

英文:

You could use "$top" with "$group" in an aggregation pipeline to get your result. If you only want certain fields returned, you could use a "$project" stage.

  1. response = self.DB.aggregate([
  2. {
  3. "$match": {'$text': {'$search': f'/{search}/'}}
  4. },
  5. {
  6. "$group": {
  7. "_id": "$upc",
  8. "mostRecent": {
  9. "$top": {
  10. "sortBy": {
  11. "timestamp": -1
  12. },
  13. "output": "$$ROOT"
  14. }
  15. }
  16. }
  17. },
  18. {
  19. "$replaceWith": "$mostRecent"
  20. }
  21. ])

答案2

得分: 0

我使用存储UPC ID的字典过滤了结果数组,并将其附加到文档列表中,如果UPC ID不存在。

  1. import pymongo
  2. myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
  3. mydb = myclient["mydatabase"]
  4. mycol = mydb["product"]
  5. mycol.drop()
  6. data = [
  7. {
  8. "_id": {
  9. "oid": "64cf05707844ef1a25ee57fa"
  10. },
  11. "upc": "032622013625",
  12. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  13. "salePrice": 29.99,
  14. "timestamp": "2023-08-05 22:29:04 EDT-0400"
  15. },
  16. {
  17. "_id": {
  18. "oid": "64cf057c7844ef1a25ee57fd"
  19. },
  20. "upc": "048894970887",
  21. "name": "Basic Window Fan - Holmes",
  22. "salePrice": 54.99,
  23. "available": False,
  24. "timestamp": "2023-08-05 22:29:16 EDT-0400"
  25. },
  26. {
  27. "_id": {
  28. "oid": "64cf05707844ef1a25ee57fb"
  29. },
  30. "upc": "032622013625",
  31. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  32. "salePrice": 29.97,
  33. "timestamp": "2023-08-04 13:25:09 EDT-0400"
  34. }
  35. ]
  36. for d in data:
  37. x = mycol.insert_one(d)
  38. resp = mycol.create_index(
  39. [
  40. ("upc", "text")
  41. ]
  42. )
  43. print(resp)
  44. search = "032622013625"
  45. response = mycol.find({"$text": {"$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
  46. list_upc_already_seen = []
  47. list_documents = []
  48. for doc in response:
  49. upc = doc.get("upc")
  50. if upc not in list_upc_already_seen:
  51. list_documents.append(doc)
  52. list_upc_already_seen.append(upc)
  53. print(list_documents)

[{ 'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400' }]

英文:

I filtered the array of results by using a dictionary storing the upc id, and append into a list of documents if upc id is not existing.

  1. import pymongo
  2. myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
  3. mydb = myclient["mydatabase"]
  4. mycol = mydb["product"]
  5. mycol.drop()
  6. data=[
  7. {
  8. "_id": {
  9. "oid": "64cf05707844ef1a25ee57fa"
  10. },
  11. "upc": "032622013625",
  12. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  13. "salePrice": 29.99,
  14. "timestamp": "2023-08-05 22:29:04 EDT-0400"
  15. },
  16. {
  17. "_id": {
  18. "oid": "64cf057c7844ef1a25ee57fd"
  19. },
  20. "upc": "048894970887",
  21. "name": "Basic Window Fan - Holmes",
  22. "salePrice": 54.99,
  23. "available": False,
  24. "timestamp": "2023-08-05 22:29:16 EDT-0400"
  25. },
  26. {
  27. "_id": {
  28. "oid": "64cf05707844ef1a25ee57fb"
  29. },
  30. "upc": "032622013625",
  31. "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  32. "salePrice": 29.97,
  33. "timestamp": "2023-08-04 13:25:09 EDT-0400"
  34. }
  35. ]
  36. for d in data:
  37. x = mycol.insert_one(d)
  38. resp=mycol.create_index(
  39. [
  40. ("upc", "text")
  41. ]
  42. )
  43. print(resp)
  44. search="032622013625"
  45. response = mycol.find( { "$text": { "$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
  46. list_upc_already_seen=[]
  47. list_documents=[]
  48. for doc in response:
  49. upc=doc.get("upc")
  50. if upc not in list_upc_already_seen:
  51. list_documents.append(doc)
  52. list_upc_already_seen.append(upc)
  53. print(list_documents)

[{'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400'}]

huangapple
  • 本文由 发表于 2023年8月10日 21:40:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76876292.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定