Elastic Search Reindexing does not progress
Question
I use the ES reindex API to copy documents from the source index to the destination index. While doing that, I am running a few ingest pipelines. The reindexing is very slow: not even a single document was indexed in one hour. I am running ES on my MacBook Pro 2022.
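For reference, this is roughly the shape of the reindex request involved. The actual call was not included in the question, so the index and pipeline names below are taken from the task description and pipeline definition further down, and the rest is a minimal sketch:

POST _reindex?wait_for_completion=false&requests_per_second=1
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "destination_index",
    "pipeline": "document-pipeline"
  }
}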
Here are the reindexing task details.
{
  "completed": false,
  "task": {
    "node": "zn7bFe98Qly-eEd_VrXAMQ",
    "id": 688,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 2461,
      "updated": 0,
      "created": 0,
      "deleted": 0,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": 1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [source_index] to [destination_index]",
    "start_time_in_millis": 1681734787348,
    "running_time_in_nanos": 1800085540069,
    "cancellable": true,
    "cancelled": false,
    "headers": {}
  }
}
You can see that out of 2461 documents, nothing was created.
I am using requests_per_second=1.
And the ingest pipeline is:
{
  "document-pipeline": {
    "description": "abc",
    "on_failure": [
      {
        "set": {
          "description": "Index document to 'failed-<index>'",
          "field": "_index",
          "value": "failed-{{{_index}}}"
        }
      },
      {
        "set": {
          "description": "Sazzad pycharm ingestion failed error message",
          "field": "ingest.failure",
          "value": "{{_ingest.on_failure_message}}"
        }
      }
    ],
    "processors": [
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "name_embedding",
          "field_map": {
            "name": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "description_embedding",
          "field_map": {
            "description": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "custom_field1_embedding",
          "field_map": {
            "custom_field1": "text_field"
          }
        }
      }
    ]
  }
}
Any clue? What is the problem under the hood?
Edit 2 - Output of GET _cat/tasks/?v&actions=*reindex&detailed
action                     task_id                     parent_task_id type      start_time    timestamp running_time ip         node         description
indices:data/write/reindex zn7bFe98Qly-eEd_VrXAMQ:1250 -              transport 1681738821297 13:40:21  3.2m         172.19.0.2 4cdda1de1ed0 reindex from [source] to [destination]
Edit 3 - Output of GET _nodes/hot_threads
::: {4cdda1de1ed0}{zn7bFe98Qly-eEd_VrXAMQ}{nlHIUMF4SnGX8h_vrEYVYA}{4cdda1de1ed0}{172.19.0.2}{172.19.0.2:9300}{cdfhilmrstw}{8.7.0}{ml.max_jvm_size=4118806528, ml.allocated_processors=4, ml.machine_memory=8233017344, xpack.installed=true, ml.allocated_processors_double=4.0}
   Hot threads at 2023-04-17T14:08:51.922Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.6% [cpu=0.6%, idle=99.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[4cdda1de1ed0][transport_worker][T#4]'
     3/10 snapshots sharing following 3 elements
       io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
       io.netty.common@4.1.86.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@19.0.2/java.lang.Thread.run(Thread.java:1589)
Edit 4 - Output of GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines
{
  "nodes": {
    "zn7bFe98Qly-eEd_VrXAMQ": {
      "ingest": {
        "pipelines": {
          "recipe-name_description_custom_steps_custom_ingredients_custom_tags-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-custom_steps-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-description-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe--field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 1000,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 1000,
                    "time_in_millis": 288844826,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 471,
                    "time_in_millis": 645430266,
                    "current": 529,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 471,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-name-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-custom_ingredients-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser.screenshot-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser.screenshot@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          ".fleet_final_pipeline-1": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "date:truncate-subseconds-event-ingested": {
                  "type": "date",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "remove": {
                  "type": "remove",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "set_security_user": {
                  "type": "set_security_user",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "script:agent-id-status": {
                  "type": "script",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "remove": {
                  "type": "remove",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-icmp-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-icmp@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-http-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-http@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-name_description_custom_steps_custom_ingredients-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser.network-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser.network@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-tcp-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-tcp@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
Answer 1
Score: 1
If you look at recipe--field-embeddings-pipeline, you can see that there are 1000 documents in the pipeline:
- all 1000 have gone through the first inference processor
- 529 are currently going through the second inference processor
- 471 are currently going through the third inference processor
This means that it is working, but it takes time, probably because requests_per_second=1 makes it really slow.
So nothing is wrong; it is just slow. You can tune requests_per_second and follow the progress via GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines to see if you can get it to go faster.
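For example, one possible way to act on that, sketched here rather than guaranteed to be the fix: lift the throttle on the already running task (task id taken from Edit 2; requests_per_second=-1 disables throttling) and then keep polling its status:

POST _reindex/zn7bFe98Qly-eEd_VrXAMQ:1250/_rethrottle?requests_per_second=-1

GET _tasks/zn7bFe98Qly-eEd_VrXAMQ:1250

The created/updated counters in the task status and the per-processor ingest stats above should then move as each batch finishes the inference processors.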