How can we add a list of documents to an existing index in llama-index?
Question
I have an existing index that was created using GPTVectorStoreIndex. However, when I try to add a new document to the existing index using the insert method, I get the following error:
AttributeError: 'list' object has no attribute 'get_text'
My code for updating the index is as follows:
# Imports assume a pre-0.10 llama-index release and the LangChain OpenAI wrapper
# (neither import was shown in the original snippet).
from langchain.llms import OpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext, SimpleDirectoryReader

max_input_size = 4096
num_outputs = 5000
max_chunk_overlap = 256
chunk_size_limit = 3900
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# Load the new documents, attaching each file name as metadata.
directory_path = "./trial_docs"
file_metadata = lambda x: {"filename": x}
reader = SimpleDirectoryReader(directory_path, file_metadata=file_metadata)
documents = reader.load_data()
print(type(documents))  # <class 'list'>

# `index` is the existing GPTVectorStoreIndex; passing the whole list here raises the AttributeError.
index.insert(document=documents, service_context=service_context)
Answer 1
Score: 2
I got it right: the mistake I was making was passing documents as a whole, which is a list object. The right way to update the index is to insert the documents one at a time:
# Same assumed imports as in the question: pre-0.10 llama-index plus the LangChain OpenAI wrapper.
from langchain.llms import OpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext, SimpleDirectoryReader

max_input_size = 4096
num_outputs = 5000
max_chunk_overlap = 256
chunk_size_limit = 3900
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

directory_path = "./trial_docs"
file_metadata = lambda x: {"filename": x}
reader = SimpleDirectoryReader(directory_path, file_metadata=file_metadata)
documents = reader.load_data()
print(type(documents))  # <class 'list'>

# Insert each Document individually; insert() expects a single Document, not a list.
for d in documents:
    index.insert(document=d, service_context=service_context)
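The error came from the fact that insert() eventually calls get_text() on whatever it is given, so it needs a single Document rather than the list returned by load_data(). After the loop finishes, you may also want to persist the updated index and check that the new content is retrievable. The following is a minimal sketch, assuming the same pre-0.10 llama-index API and a hypothetical ./storage directory (not part of the original answer):

# Minimal sketch (assumptions: llama-index 0.6-0.9 API; "./storage" is a hypothetical directory).
# Re-persist the index so the newly inserted documents survive a restart.
index.storage_context.persist(persist_dir="./storage")

# Sanity check: a query over the updated index can now draw on the new documents.
query_engine = index.as_query_engine()
response = query_engine.query("What do the newly added documents describe?")
print(response)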