June 1, 2023, 15:07:40

How to see the embeddings of documents saved with Chroma (or any other DB) in LangChain?

Question

I can see everything except the embeddings of the documents when I use Chroma with LangChain and OpenAI embeddings. It always shows None for them.

Here is the code:

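# Import paths as used by LangChain releases from mid-2023
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from tqdm import tqdm

# Assumed from the later snippet, which uses OpenAIEmbeddings()
embeddings = OpenAIEmbeddings()

for db_collection_name in tqdm(["class1-sub2-chap3", "class2-sub3-chap4"]):
    documents = []
    doc_ids = []

    for doc_index in range(3):
        # Collection names follow the pattern <class>-<subject>-<chapter>
        cl, sub, chap = db_collection_name.split("-")
        content = f"This is {db_collection_name}-doc{doc_index}"
        doc = Document(page_content=content, metadata={"chunk_num": doc_index, "chapter": chap, "class": cl, "subject": sub})
        documents.append(doc)
        doc_ids.append(str(doc_index))

    # Create a Chroma collection from the documents and persist it to disk
    db = Chroma.from_documents(
        collection_name=db_collection_name,
        documents=documents,
        ids=doc_ids,
        embedding=embeddings,
        persist_directory="./data",
    )

    db.persist()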

When I call db.get(), I see everything as expected, except that embeddings is None.

{'ids': ['0', '1', '2'],
 'embeddings': None,
 'documents': ['This is class1-sub2-chap3-doc0',
  'This is class1-sub2-chap3-doc1',
  'This is class1-sub2-chap3-doc2'],
 'metadatas': [{'chunk_num': 0,
   'chapter': 'chap3',
   'class': 'class1',
   'subject': 'sub2'},
  {'chunk_num': 1, 'chapter': 'chap3', 'class': 'class1', 'subject': 'sub2'},
  {'chunk_num': 2, 'chapter': 'chap3', 'class': 'class1', 'subject': 'sub2'}]}

The embedding model itself works fine, as this call returns a vector of the expected length:

len(embeddings.embed_documents(["EMBED THIS"])[0])
>> 1536

Also, my ./data directory contains an embeddings file, chroma-embeddings.parquet.

I also tried the example given in the documentation, but it shows None too.

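# Import the classes used below
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma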

# Initial document content and id
initial_content = "This is an initial document content"
document_id = "doc1"

# Create an instance of Document with initial content and metadata
original_doc = Document(page_content=initial_content, metadata={"page": "0"})

# Initialize a Chroma instance with the original document
new_db = Chroma.from_documents(
    collection_name="test_collection",
    documents=[original_doc],
    embedding=OpenAIEmbeddings(),  # using the same embeddings as before
    ids=[document_id],
)

Here, too, new_db.get() gives me None for the embeddings.

Answer 1

Score: 7

You just need to specify that you want the embeddings as well when calling .get:

# Get all embeddings
db._collection.get(include=['embeddings'])

# Get embeddings by document ID
db._collection.get(ids=['doc0', ..., 'docN'], include=['embeddings'])
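The calls above return a dict of parallel lists, so pairing each ID with its vector takes one more step. Below is a minimal sketch of that step. The helper name embeddings_by_id is hypothetical and not part of LangChain or Chroma. It assumes the object passed in is a Chroma collection such as db._collection, which is a private attribute of the LangChain wrapper and may change between versions.

```python
from typing import Any, Dict, Optional, Sequence


def embeddings_by_id(collection: Any, ids: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Map each document ID to its stored embedding vector.

    `collection` is assumed to be a Chroma collection (for example
    `db._collection`); this helper is illustrative, not a library API.
    """
    # get() omits embeddings unless they are requested through `include`,
    # which is why a plain db.get() shows 'embeddings': None.
    result = collection.get(
        ids=list(ids) if ids is not None else None,
        include=["embeddings"],
    )
    # 'ids' and 'embeddings' are parallel sequences in the result.
    return dict(zip(result["ids"], result["embeddings"]))
```

With the db from the question, embeddings_by_id(db._collection, ["0"]) should give a one-entry dict, and the vector's length should match the 1536 reported by embed_documents.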
