ES dense_vector field: 'dims' must be specified

Question

I have an Elasticsearch (v7.5.1) index with a dense_vector field called lda, with 150 dimensions. The mapping, as shown at http://localhost:9200/documents/_mapping, looks like this:

"documents": {
  "mappings": {
    [...]
    "lda": {
      "type":"dense_vector",
      "dims":150
    }
  }
}

When I try to index a document through the Elasticsearch Client for Python (v7.1.0), ES throws this error message:

{
  "type": "server",
  "timestamp": "2020-01-03T08:40:04,962Z",
  "level": "DEBUG",
  "component": "o.e.a.b.TransportShardBulkAction",
  "cluster.name": "docker-cluster",
  "node.name": "8d468383f2cf",
  "message": "[documents][0] failed to execute bulk item (create) index {[documents][document][S_uPam8BUsDzizMKxpRR], source[{\"id\":42129,[...],\"lda\":[0.031139032915234566,0.02878846414387226,0.026767859235405922,0.025012295693159103,0.02347283624112606,0.022111890837550163,0.02090011164546013,0.019814245402812958,0.0188356414437294,0.01794915273785591,0.01714235544204712,0.01640496961772442,0.015728404745459557,0.015105433762073517,0.014529934152960777,0.013996675610542297,0.013501172885298729,0.013039554469287395,0.012608458288013935,0.012204954400658607,0.011826476082205772,0.011470765806734562,0.011135827749967575,0.010819895192980766,0.01052139326930046,0.010238921269774437,0.0,[...],0.0]}]}",
  "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ",
  "node.id": "M_fMZ3KxQnWP3AiguV1_jA",
  "stacktrace": [
    "org.elasticsearch.index.mapper.MapperParsingException: The [dims] property must be specified for field [lda].",
    "at org.elasticsearch.xpack.vectors.mapper.DenseVectorFieldMapper$TypeParser.parse(DenseVectorFieldMapper.java:104) ~[?:?]",
    "at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromFieldType(DocumentParser.java:680) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:826) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:619) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseNonDynamicArray(DocumentParser.java:601) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseArray(DocumentParser.java:560) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:420) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:112) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:791) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768) ~[elasticsearch-7.5.1.jar:7.5.1]",
    "at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-7.5.1.jar:7.5.1]",
    [...]
  ]
}

This is how documents are added to the index programmatically:

es = Elasticsearch(hosts="localhost:9200")
es.index(index=self.index, doc_type=doc_type, body=document_data)

Where document_data is a dictionary, holding the data as shown in the error log above, including this:

{
  [...]
  "lda": [0.031139032915234566, ...]
}

The index was created immediately beforehand, so it contains no documents yet.
I noticed that when I created the index, there was this output:

{
  "type": "server",
  "timestamp": "2020-01-03T08:40:03,280Z",
  "level": "INFO",
  "component": "o.e.c.m.MetaDataCreateIndexService",
  "cluster.name": "docker-cluster",
  "node.name": "8d468383f2cf",
  "message": "[documents] creating index, cause [api], templates [], shards [1]/[1], mappings [_doc]",
  "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ",
  "node.id": "M_fMZ3KxQnWP3AiguV1_jA"
}
{
  "type": "deprecation",
  "timestamp": "2020-01-03T08:40:04,940Z",
  "level": "WARN",
  "component": "o.e.d.r.a.d.RestDeleteAction",
  "cluster.name": "docker-cluster",
  "node.name": "8d468383f2cf",
  "message": "[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).",
  "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ",
  "node.id": "M_fMZ3KxQnWP3AiguV1_jA"
}

This is how the index has been created:

    es = Elasticsearch(hosts="localhost:9200", serializer=BSONEncoder())
    es.indices.create(index="documents", body=mapping)

Where mapping is a dictionary defining the mappings as shown in the output above:

mapping = {
  "mappings": {
    "properties": {
      [...],
      "lda": {
          "type": "dense_vector",
          "dims": 150
      },
    }
  }
}

Update:
I suspect that the mappings are indeed the problem. Indexing a document without the lda field also fails:

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have mo

So, I edited the mappings to include the index name:

  "mappings": {
    "document": {    
      [...]
      "lda": {
        "type":"dense_vector",
        "dims":150
      }
    }
  }
} 

This results in an empty mapping though, with the types being inferred when documents are indexed.

--- End update ---

I am not sure how to proceed with debugging. The deprecation warning logged when creating the index seems potentially relevant, but I'm not sure how to resolve it. Furthermore, the error message does not really indicate that this is the problem.

The documentation for the dense_vector type does not reveal many details. The examples shown there do work, however (using cURL requests).

Is there a functional difference between creating an index through the Python client and creating it with cURL?

How can I find out what the real error is? The dimensionality is clearly defined through the dims property.

Answer 1

Score: 0

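You are using ES 7.x, where mapping types are deprecated and a custom doc_type is no longer supported (see the Elasticsearch documentation on the removal of mapping types). This is also stated in the deprecation warning logged after index creation: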

[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints

But you set a doc_type in your index request:

es.index(index=self.index, doc_type=doc_type, body=document_data)

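From version 7 onward, the only doc_type you can use is _doc, but you passed your own: document. The index was created with the typeless mapping (note "mappings [_doc]" in the creation log), so indexing with the type document would add a second type, and Elasticsearch rejects the mapping update: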

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have more [...]

(The message is cut off in the question; it presumably continues along the lines of "than one doc_type _doc, document".)
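To see what the index actually holds before indexing anything, one option is to fetch the mapping and check the field definition directly. The sketch below is only an illustration: the helper names get_field_mapping and check_vector are hypothetical, and it assumes the typeless response shape that ES 7.x returns, with fields under mappings.properties.

```python
def get_field_mapping(mapping_response, index, field):
    """Return the stored definition of `field`, or None if it is not mapped.

    Assumes the typeless ES 7.x shape:
    {index: {"mappings": {"properties": {field: {...}}}}}
    """
    mappings = mapping_response.get(index, {}).get("mappings", {})
    return mappings.get("properties", {}).get(field)


def check_vector(field_def, vector):
    """Return a list of problems; an empty list means nothing was found."""
    if field_def is None:
        return ["field is missing from the mapping"]
    problems = []
    if field_def.get("type") != "dense_vector":
        problems.append("field type is %r, not dense_vector" % field_def.get("type"))
    if field_def.get("dims") != len(vector):
        problems.append(
            "dims is %r but the vector has %d values"
            % (field_def.get("dims"), len(vector))
        )
    return problems


# Usage against a real cluster (not run here; `es` is an Elasticsearch client):
# response = es.indices.get_mapping(index="documents")
# print(check_vector(get_field_mapping(response, "documents", "lda"), document_data["lda"]))
```

If get_field_mapping returns None, the mapping passed at index creation was not stored where expected, which matches the "empty mapping" symptom described in the question's update.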

To resolve the problem, remove the custom type: drop the doc_type argument from the es.index call, and do not nest the fields under a type name in the mapping you pass when creating the documents index.
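A minimal sketch of the typeless variant, assuming the 7.x Python client, where es.index falls back to _doc when no doc_type is given. The helper names build_mapping and create_and_index are hypothetical, and the mapping contains only the lda field from the question; the other fields elided there are left out.

```python
def build_mapping(dims=150):
    """Typeless ES 7.x mapping: fields sit directly under "properties"."""
    return {
        "mappings": {
            "properties": {
                "lda": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def create_and_index(es, index, document, dims=150):
    """Create the index, then index one document without a doc_type."""
    es.indices.create(index=index, body=build_mapping(dims))
    # No doc_type argument here, so no custom type is sent with the request.
    return es.index(index=index, body=document)


# Usage against a real cluster (not run here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(hosts="localhost:9200")
# create_and_index(es, "documents", {"id": 42129, "lda": [0.0] * 150})
```

Passing the client in as an argument keeps the sketch independent of how the connection is configured, for example the custom serializer used in the question.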

huangapple
  • Posted on January 3, 2020 at 17:02:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59575672.html