Elastic Search Reindexing does not progress
Question
I use the ES reindex API to copy documents from the source index to the destination index. While doing that, I am running a few ingest pipelines. The reindexing is very slow: not even a single document was indexed in one hour. I am running ES on my MacBook Pro 2022.
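For reference, this is roughly the shape of the reindex request involved. The actual call was not included in the question, so the index and pipeline names below are taken from the task description and pipeline definition further down, and the rest is a minimal sketch:

POST _reindex?wait_for_completion=false&requests_per_second=1
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "destination_index",
    "pipeline": "document-pipeline"
  }
}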
Here are the reindexing task details.
{
  "completed": false,
  "task": {
    "node": "zn7bFe98Qly-eEd_VrXAMQ",
    "id": 688,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 2461,
      "updated": 0,
      "created": 0,
      "deleted": 0,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": 1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [source_index] to [destination_index]",
    "start_time_in_millis": 1681734787348,
    "running_time_in_nanos": 1800085540069,
    "cancellable": true,
    "cancelled": false,
    "headers": {}
  }
}
You can see that out of 2461 documents, nothing was created.
I am using requests_per_second=1.
And the ingest pipeline is:
{
  "document-pipeline": {
    "description": "abc",
    "on_failure": [
      {
        "set": {
          "description": "Index document to 'failed-<index>'",
          "field": "_index",
          "value": "failed-{{{_index}}}"
        }
      },
      {
        "set": {
          "description": "Sazzad pycharm ingestion failed error message",
          "field": "ingest.failure",
          "value": "{{_ingest.on_failure_message}}"
        }
      }
    ],
    "processors": [
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "name_embedding",
          "field_map": {
            "name": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "description_embedding",
          "field_map": {
            "description": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "custom_field1_embedding",
          "field_map": {
            "custom_field1": "text_field"
          }
        }
      }
    ]
  }
}
Any clue? What is the problem under the hood?
Edit 2 - Output of GET _cat/tasks/?v&actions=*reindex&detailed
action                     task_id                     parent_task_id type      start_time    timestamp running_time ip         node         description
indices:data/write/reindex zn7bFe98Qly-eEd_VrXAMQ:1250 -              transport 1681738821297 13:40:21  3.2m         172.19.0.2 4cdda1de1ed0 reindex from [source] to [destination]
Edit 3 - Output of GET _nodes/hot_threads
::: {4cdda1de1ed0}{zn7bFe98Qly-eEd_VrXAMQ}{nlHIUMF4SnGX8h_vrEYVYA}{4cdda1de1ed0}{172.19.0.2}{172.19.0.2:9300}{cdfhilmrstw}{8.7.0}{ml.max_jvm_size=4118806528, ml.allocated_processors=4, ml.machine_memory=8233017344, xpack.installed=true, ml.allocated_processors_double=4.0}
   Hot threads at 2023-04-17T14:08:51.922Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.6% [cpu=0.6%, idle=99.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[4cdda1de1ed0][transport_worker][T#4]'
     3/10 snapshots sharing following 3 elements
       io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
       io.netty.common@4.1.86.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@19.0.2/java.lang.Thread.run(Thread.java:1589)
Edit 4 - Output of GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines
{
  "nodes": {
    "zn7bFe98Qly-eEd_VrXAMQ": {
      "ingest": {
        "pipelines": {
          "recipe-name_description_custom_steps_custom_ingredients_custom_tags-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-custom_steps-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-description-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe--field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 1000,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 1000,
                    "time_in_millis": 288844826,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 471,
                    "time_in_millis": 645430266,
                    "current": 529,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 471,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-name-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-custom_ingredients-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser.screenshot-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser.screenshot@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          ".fleet_final_pipeline-1": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "date:truncate-subseconds-event-ingested": {
                  "type": "date",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "remove": {
                  "type": "remove",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "set_security_user": {
                  "type": "set_security_user",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "script:agent-id-status": {
                  "type": "script",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "remove": {
                  "type": "remove",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-icmp-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-icmp@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-http-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-http@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "recipe-name_description_custom_steps_custom_ingredients-field-embeddings-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              },
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser.network-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser.network@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-browser-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-browser@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },
          "synthetics-tcp-0.11.8": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "pipeline:synthetics-tcp@custom": {
                  "type": "pipeline",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
Answer 1
Score: 1
If you look at recipe--field-embeddings-pipeline, you can see that there are 1000 documents in the pipeline:
- all 1000 have gone through the first inference processor
- 529 are currently going through the second inference processor
- 471 are currently going through the third inference processor
This means that it is working, but it takes time, probably because requests_per_second=1 makes it really slow.
So nothing is wrong; it is just slow. You can tune requests_per_second and follow the progress via GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines to see if you can get it to go faster.
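For example, one possible way to act on that, sketched here rather than guaranteed to be the fix: lift the throttle on the already running task (task id taken from Edit 2; requests_per_second=-1 disables throttling) and then keep polling its status:

POST _reindex/zn7bFe98Qly-eEd_VrXAMQ:1250/_rethrottle?requests_per_second=-1

GET _tasks/zn7bFe98Qly-eEd_VrXAMQ:1250

The created/updated counters in the task status and the per-processor ingest stats above should then move as each batch finishes the inference processors.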