Elasticsearch reindexing does not progress

Question

I use the Elasticsearch reindex API to copy documents from a source index to a destination index, running a few ingest pipelines along the way. The reindexing is extremely slow: not a single document was indexed in 1 hour. I am running Elasticsearch on a 2022 MacBook Pro.

Here are the reindexing task details:

{
  "completed": false,
  "task": {
    "node": "zn7bFe98Qly-eEd_VrXAMQ",
    "id": 688,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 2461,
      "updated": 0,
      "created": 0,
      "deleted": 0,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": 1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [source_index] to [destination_index]",
    "start_time_in_millis": 1681734787348,
    "running_time_in_nanos": 1800085540069,
    "cancellable": true,
    "cancelled": false,
    "headers": {}
  }
}
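As a quick sanity check on those numbers, here is a minimal Python sketch that reads the status above. The helper name summarize_task is hypothetical, and treating created + updated + deleted + noops + version_conflicts as "documents handled so far" is an assumption about how to read the counters, not something stated in the post.

```python
def summarize_task(task: dict) -> tuple[int, int, float]:
    """Return (documents handled, total documents, running time in minutes)."""
    status = task["status"]
    # Assumption: these five counters together cover every document the task has finished with.
    handled = sum(
        status[key]
        for key in ("created", "updated", "deleted", "noops", "version_conflicts")
    )
    minutes = task["running_time_in_nanos"] / 1e9 / 60
    return handled, status["total"], minutes


# Values copied from the task details above (only the fields used here).
task = {
    "status": {
        "total": 2461,
        "updated": 0,
        "created": 0,
        "deleted": 0,
        "version_conflicts": 0,
        "noops": 0,
    },
    "running_time_in_nanos": 1800085540069,
}

handled, total, minutes = summarize_task(task)
print(f"{handled}/{total} documents handled after {minutes:.1f} minutes")
# prints: 0/2461 documents handled after 30.0 minutes
```

So this particular snapshot was taken about 30 minutes into the task, with one batch fetched and nothing written yet.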

As you can see, none of the 2461 documents has been created.

I am using requests_per_second=1.

The ingest pipeline is:

{
  "document-pipeline": {
    "description": "abc",
    "on_failure": [
      {
        "set": {
          "description": "Index document to 'failed-<index>'",
          "field": "_index",
          "value": "failed-{{{_index}}}"
        }
      },
      {
        "set": {
          "description": "Sazzad pycharm ingestion failed error message",
          "field": "ingest.failure",
          "value": "{{_ingest.on_failure_message}}"
        }
      }
    ],
    "processors": [
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "name_embedding",
          "field_map": {
            "name": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "description_embedding",
          "field_map": {
            "description": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "custom_field1_embedding",
          "field_map": {
            "custom_field1": "text_field"
          }
        }
      }
    ]
  }
}

Any clue? What is the problem under the hood?

Edit 2 - Output of GET _cat/tasks/?v&actions=*reindex&detailed

action                     task_id                     parent_task_id type      start_time    timestamp running_time ip         node         description
indices:data/write/reindex zn7bFe98Qly-eEd_VrXAMQ:1250 -              transport 1681738821297 13:40:21  3.2m         172.19.0.2 4cdda1de1ed0 reindex from [source] to [destination]

Edit 3 - Output of GET _nodes/hot_threads

::: {4cdda1de1ed0}{zn7bFe98Qly-eEd_VrXAMQ}{nlHIUMF4SnGX8h_vrEYVYA}{4cdda1de1ed0}{172.19.0.2}{172.19.0.2:9300}{cdfhilmrstw}{8.7.0}{ml.max_jvm_size=4118806528, ml.allocated_processors=4, ml.machine_memory=8233017344, xpack.installed=true, ml.allocated_processors_double=4.0}
   Hot threads at 2023-04-17T14:08:51.922Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.6% [cpu=0.6%, idle=99.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[4cdda1de1ed0][transport_worker][T#4]'
     3/10 snapshots sharing following 3 elements
       io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
       io.netty.common@4.1.86.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@19.0.2/java.lang.Thread.run(Thread.java:1589)

Edit 4 - Output of GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines

{
  "nodes": {
    "zn7bFe98Qly-eEd_VrXAMQ": {
      "ingest": {
        "pipelines": {
          "recipe-name_description_custom_steps_custom_ingredients_custom_tags-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-custom_steps-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-description-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe--field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 1000, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 1000, "time_in_millis": 288844826, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 471, "time_in_millis": 645430266, "current": 529, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 471, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-name-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-custom_ingredients-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser.screenshot-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser.screenshot@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          ".fleet_final_pipeline-1": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "date:truncate-subseconds-event-ingested": { "type": "date", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "remove": { "type": "remove", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "set_security_user": { "type": "set_security_user", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "script:agent-id-status": { "type": "script", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "remove": { "type": "remove", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-icmp-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-icmp@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-http-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-http@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-name_description_custom_steps_custom_ingredients-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser.network-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser.network@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-tcp-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-tcp@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          }
        }
      }
    }
  }
}

Answer 1

Score: 1


If you look at recipe--field-embeddings-pipeline you can see that there are 1000 documents in the pipeline:

  • all 1000 have gone through the first inference processor
  • 529 are currently going through the second inference processor
  • 471 are currently going through the third inference processor

This means that it is working, but it takes time, probably because requests_per_second=1 makes it really slow.
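To get a feel for what requests_per_second does, here is a rough sketch of the padding arithmetic described in the Elasticsearch reindex documentation: the task waits between batches for the batch size divided by requests_per_second, minus the time spent writing. The function name and the 0.5 second write time are illustrative assumptions, and 1000 is the default batch size rather than a value confirmed in the question.

```python
def throttle_wait_seconds(batch_size: int, requests_per_second: float, write_seconds: float) -> float:
    """Approximate wait between reindex batches, following the documented padding formula."""
    target_seconds = batch_size / requests_per_second
    # No wait is needed if writing already took longer than the target time.
    return max(0.0, target_seconds - write_seconds)


# The documentation's own example: batch of 1000 at 500 requests per second.
print(throttle_wait_seconds(1000, 500, 0.5))  # 1.5

# Same batch size with requests_per_second=1, as used in the question.
print(throttle_wait_seconds(1000, 1, 0.5))  # 999.5
```

Note that this wait only applies between batches. In the task status from the question, throttled_millis is still 0 with batches at 1, which suggests the first batch had not yet reached a throttle wait when the snapshot was taken.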

So nothing is wrong; it is just slow. You can tune the reindex request and follow the progress via GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines to see whether you can make it go faster.
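If you poll that endpoint repeatedly, a small helper makes the per-processor progress easier to read. This is only a sketch: processor_progress is a hypothetical name, and the sample dict is a hand-copied excerpt of the first three processors of recipe--field-embeddings-pipeline from Edit 4, not a live response.

```python
from typing import Any


def processor_progress(pipeline_stats: dict[str, Any]) -> list[tuple[str, int, int, int]]:
    """Return (processor name, finished count, currently in flight, failed) per processor."""
    rows = []
    for processor in pipeline_stats["processors"]:
        # Each list entry is a single-key dict: {processor_name: {"type": ..., "stats": {...}}}
        for name, info in processor.items():
            stats = info["stats"]
            rows.append((name, stats["count"], stats["current"], stats["failed"]))
    return rows


# Excerpt of the stats for "recipe--field-embeddings-pipeline" shown in Edit 4.
sample = {
    "count": 0,
    "current": 1000,
    "processors": [
        {"inference": {"type": "inference", "stats": {"count": 1000, "time_in_millis": 288844826, "current": 0, "failed": 0}}},
        {"inference": {"type": "inference", "stats": {"count": 471, "time_in_millis": 645430266, "current": 529, "failed": 0}}},
        {"inference": {"type": "inference", "stats": {"count": 0, "time_in_millis": 0, "current": 471, "failed": 0}}},
    ],
}

rows = processor_progress(sample)
for position, (name, done, in_flight, failed) in enumerate(rows, start=1):
    print(f"processor {position} ({name}): done={done} in_flight={in_flight} failed={failed}")
```

Comparing two snapshots taken a few minutes apart shows whether the done counts are still moving.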

huangapple
  • Posted on April 17, 2023, 21:06:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035487.html