Chroma database embeddings = none when using get()

Question

I am a brand new user of the Chroma database (and the associated Python libraries).

When I call get on a collection, embeddings is always None, even when the embeddings are explicitly supplied while adding documents to the collection (so I don't think it can be an issue with generating the embeddings).

For the following code (Python 3.10, chromadb 0.3.26), I expected to see a list of embeddings in the returned dictionary, but the value is None.

import chromadb

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collection")
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

print(collection.get())

Output:

{'ids': ['id1', 'id2'], 'embeddings': None, 'documents': ['This is a document', 'This is another document'], 'metadatas': [{'source': 'my_source'}, {'source': 'my_source'}]}

The same issue does not occur when using query instead of get:

print(collection.query(query_embeddings=[[1.2, 2.3, 4.4]], include=["embeddings"]))

Output:

{'ids': [['id1', 'id2']], 'embeddings': [[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]]], 'documents': None, 'metadatas': None, 'distances': None}

The same issue occurs when using the LangChain wrappers.

Any ideas, friends?

Answer 1

Score: 4

According to the documentation (https://docs.trychroma.com/usage-guide), embeddings are excluded by default for performance:

> When using get or query you can use the include parameter to specify which data you want returned - any of embeddings, documents, metadatas, and for query, distances. By default, Chroma will return the documents, metadatas and in the case of query, the distances of the results. embeddings are excluded by default for performance and the ids are always returned.

You can include the embeddings when using get as follows:

print(collection.get(include=['embeddings', 'documents', 'metadatas']))
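
If only the vectors are needed, the same include parameter can be wrapped in a small helper. The sketch below is one possible way to do it, not part of chromadb: the names embeddings_by_id and demo are made up for illustration, and it assumes that get() accepts ids and include and returns a dictionary with 'ids' and 'embeddings' keys, as in the output shown in the question.

```python
from typing import Any, Dict, List, Optional, Sequence


def embeddings_by_id(
    collection: Any, ids: Optional[Sequence[str]] = None
) -> Dict[str, List[float]]:
    """Return {id: embedding} for a collection.

    Hypothetical helper for illustration; it is not part of chromadb.
    """
    # Ask for embeddings explicitly: get() leaves them out unless
    # "embeddings" is listed in include.
    kwargs: Dict[str, Any] = {"include": ["embeddings"]}
    if ids is not None:
        kwargs["ids"] = list(ids)
    result = collection.get(**kwargs)

    embeddings = result["embeddings"]
    if embeddings is None:
        # Compare with None instead of relying on truthiness, in case the
        # client returns an array type rather than a plain list.
        return {}
    return {
        item_id: [float(x) for x in vector]
        for item_id, vector in zip(result["ids"], embeddings)
    }


def demo() -> None:
    """Rebuild the collection from the question and print its embeddings.

    Requires chromadb to be installed; it is imported lazily so the helper
    above has no hard dependency on it.
    """
    import chromadb

    client = chromadb.Client()
    collection = client.create_collection(name="my_collection")
    collection.add(
        embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
        documents=["This is a document", "This is another document"],
        metadatas=[{"source": "my_source"}, {"source": "my_source"}],
        ids=["id1", "id2"],
    )
    print(embeddings_by_id(collection))
    print(embeddings_by_id(collection, ids=["id2"]))
```

Calling demo() prints one dictionary for the whole collection and one restricted to id2. The stored values may come back as 32-bit floats, so they can differ slightly from the numbers passed to add.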

huangapple
  • Published on June 15, 2023, 21:27:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482987.html