Chroma database embeddings = none when using get()
Question
I am a brand new user of the Chroma database (and the associated Python libraries).
When I call get() on a collection, embeddings is always None, even if embeddings are explicitly set when adding documents to the collection (so I don't think it can be an issue with generating the embeddings).
For the following code (Python 3.10, chromadb 0.3.26), I expected to see a list of embeddings in the returned dictionary, but it is None.
import chromadb

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collection")
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
print(collection.get())
Output:
{'ids': ['id1', 'id2'], 'embeddings': None, 'documents': ['This is a document', 'This is another document'], 'metadatas': [{'source': 'my_source'}, {'source': 'my_source'}]}
The same issue does not occur when using query() instead of get():
print(collection.query(query_embeddings=[[1.2, 2.3, 4.4]], include=["embeddings"]))
Output:
{'ids': [['id1', 'id2']], 'embeddings': [[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]]], 'documents': None, 'metadatas': None, 'distances': None}
The same issue occurs when using the langchain wrappers.
Any ideas, friends?
Answer 1
Score: 4
According to the documentation (https://docs.trychroma.com/usage-guide), embeddings are excluded by default for performance:
> When using get or query you can use the include parameter to specify which data you want returned - any of embeddings, documents, metadatas, and for query, distances. By default, Chroma will return the documents, metadatas and in the case of query, the distances of the results. embeddings are excluded by default for performance and the ids are always returned.
You can include the embeddings when using get() as follows:
print(collection.get(include=['embeddings', 'documents', 'metadatas']))
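Once the embeddings are included, it can be handy to pair each id with its vector. Below is a minimal sketch of that, assuming the result dictionary has the same flat shape as the get() output shown in the question. The helper name embeddings_by_id and the two sample dictionaries are hypothetical and only illustrate the shape; they are not part of chromadb.

```python
from typing import Any, Dict, List


def embeddings_by_id(result: Dict[str, Any]) -> Dict[str, List[float]]:
    """Map each id to its embedding in a dict shaped like collection.get() output.

    Hypothetical helper, not part of chromadb.
    """
    embeddings = result.get("embeddings")
    if embeddings is None:
        # This is the situation from the question: get() was called without
        # include=["embeddings"], so the key is present but set to None.
        raise ValueError(
            "No embeddings in result; call get(include=['embeddings', ...])"
        )
    return {id_: list(vec) for id_, vec in zip(result["ids"], embeddings)}


# Sample data copied from the question: the default get() result.
default_result = {
    "ids": ["id1", "id2"],
    "embeddings": None,
    "documents": ["This is a document", "This is another document"],
    "metadatas": [{"source": "my_source"}, {"source": "my_source"}],
}

# Assumed shape of the result when embeddings are requested via include.
included_result = {
    "ids": ["id1", "id2"],
    "embeddings": [[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    "documents": ["This is a document", "This is another document"],
    "metadatas": [{"source": "my_source"}, {"source": "my_source"}],
}

print(embeddings_by_id(included_result))
```

With a real collection you would pass in the result of collection.get(include=['embeddings', 'documents', 'metadatas']) instead of the sample dictionary. Values read back from the database may not compare exactly equal to the floats you stored, so avoid strict equality checks on them.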