How can we add a list of documents to an existing index in llama-index?


Question

I have an existing index created with GPTVectorStoreIndex. However, when I try to add new documents to the existing index using the insert method, I get the following error:

AttributeError: 'list' object has no attribute 'get_text'

My code for updating the index is as follows:

max_input_size = 4096
num_outputs = 5000
max_chunk_overlap = 256
chunk_size_limit = 3900
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
    
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

directory_path = "./trial_docs"
file_metadata = lambda x: {"filename": x}
reader = SimpleDirectoryReader(directory_path, file_metadata=file_metadata)
    
documents = reader.load_data()
print(type(documents))
index.insert(document=documents, service_context=service_context)

Answer 1

Score: 2

I figured it out: my mistake was passing the documents as a whole, which is a list object. The correct way to update the index is as follows:

max_input_size = 4096
num_outputs = 5000
max_chunk_overlap = 256
chunk_size_limit = 3900
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

directory_path = "./trial_docs"
file_metadata = lambda x: {"filename": x}
reader = SimpleDirectoryReader(directory_path, file_metadata=file_metadata)

documents = reader.load_data()
print(type(documents))  # <class 'list'>

# insert expects a single document, so add them one at a time
for d in documents:
    index.insert(document=d, service_context=service_context)
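To see why the original call failed, here is a minimal, library-free sketch of the failure mode. The Doc class below is a toy stand-in for a llama-index Document, not the real class:

```python
class Doc:
    """Toy stand-in for a llama-index Document (not the real class)."""
    def __init__(self, text):
        self.text = text

    def get_text(self):
        return self.text


documents = [Doc("alpha"), Doc("beta")]  # load_data() returns a plain list

# Passing the list itself reproduces the error from the question:
try:
    documents.get_text()
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'get_text'

# Handling one document at a time works:
texts = [d.get_text() for d in documents]
print(texts)  # ['alpha', 'beta']
```

The index only knows how to process a single document object, so any code path that calls a Document method (such as get_text) on the bare list raises the AttributeError above.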
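A small convenience wrapper (a hypothetical helper, not part of the llama-index API) makes the per-document loop reusable and accepts either a single document or a list:

```python
def insert_all(index, docs, **kwargs):
    """Insert one document or a list of documents into `index`, one at a time.

    `index` is any object exposing insert(document=..., ...), e.g. a
    GPTVectorStoreIndex; extra keyword arguments such as service_context
    are forwarded to every insert call. Returns the number of inserts.
    """
    if not isinstance(docs, (list, tuple)):
        docs = [docs]
    for d in docs:
        index.insert(document=d, **kwargs)
    return len(docs)
```

With the objects from the answer, the loop then becomes a single call: insert_all(index, documents, service_context=service_context).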

huangapple
  • Posted on 2023-05-22 16:36:27
  • Please keep the original link when reposting: https://go.coder-hub.com/76304374.html