Elasticsearch reindexing does not progress
I use the Elasticsearch reindex API to copy documents from the source index to the destination index, and while doing that I run a few ingest pipelines. The reindexing is extremely slow: not even a single document was indexed in one hour. I am running Elasticsearch on a 2022 MacBook Pro.
Here are the reindex task details:
{
  "completed": false,
  "task": {
    "node": "zn7bFe98Qly-eEd_VrXAMQ",
    "id": 688,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 2461,
      "updated": 0,
      "created": 0,
      "deleted": 0,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": 1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [source_index] to [destination_index]",
    "start_time_in_millis": 1681734787348,
    "running_time_in_nanos": 1800085540069,
    "cancellable": true,
    "cancelled": false,
    "headers": {}
  }
}
You can see that out of 2461 documents, none were created.
I am using requests_per_second=1.
And the ingest pipeline is:
{
  "document-pipeline": {
    "description": "abc",
    "on_failure": [
      {
        "set": {
          "description": "Index document to 'failed-<index>'",
          "field": "_index",
          "value": "failed-{{{_index}}}"
        }
      },
      {
        "set": {
          "description": "Sazzad pycharm ingestion failed error message",
          "field": "ingest.failure",
          "value": "{{_ingest.on_failure_message}}"
        }
      }
    ],
    "processors": [
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "name_embedding",
          "field_map": {
            "name": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "description_embedding",
          "field_map": {
            "description": "text_field"
          }
        }
      },
      {
        "inference": {
          "model_id": "sentence-transformers__paraphrase-mpnet-base-v2",
          "target_field": "custom_field1_embedding",
          "field_map": {
            "custom_field1": "text_field"
          }
        }
      }
    ]
  }
}
Any clue? What is the problem under the hood?
Edit 2 - Output of GET _cat/tasks/?v&actions=*reindex&detailed:
action                     task_id                     parent_task_id type      start_time    timestamp running_time ip         node         description
indices:data/write/reindex zn7bFe98Qly-eEd_VrXAMQ:1250 -              transport 1681738821297 13:40:21  3.2m         172.19.0.2 4cdda1de1ed0 reindex from [source] to [destination]
Edit 3 - Output of GET _nodes/hot_threads:
::: {4cdda1de1ed0}{zn7bFe98Qly-eEd_VrXAMQ}{nlHIUMF4SnGX8h_vrEYVYA}{4cdda1de1ed0}{172.19.0.2}{172.19.0.2:9300}{cdfhilmrstw}{8.7.0}{ml.max_jvm_size=4118806528, ml.allocated_processors=4, ml.machine_memory=8233017344, xpack.installed=true, ml.allocated_processors_double=4.0}
   Hot threads at 2023-04-17T14:08:51.922Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   0.6% [cpu=0.6%, idle=99.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[4cdda1de1ed0][transport_worker][T#4]'
     3/10 snapshots sharing following 3 elements
       io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
       io.netty.common@4.1.86.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@19.0.2/java.lang.Thread.run(Thread.java:1589)
Edit 4 - Output of GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines:
{
  "nodes": {
    "zn7bFe98Qly-eEd_VrXAMQ": {
      "ingest": {
        "pipelines": {
          "recipe-name_description_custom_steps_custom_ingredients_custom_tags-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-custom_steps-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-description-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe--field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 1000, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 1000, "time_in_millis": 288844826, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 471, "time_in_millis": 645430266, "current": 529, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 471, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-name-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-custom_ingredients-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser.screenshot-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser.screenshot@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          ".fleet_final_pipeline-1": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "date:truncate-subseconds-event-ingested": { "type": "date", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "remove": { "type": "remove", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "set_security_user": { "type": "set_security_user", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "script:agent-id-status": { "type": "script", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "remove": { "type": "remove", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-icmp-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-icmp@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-http-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-http@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "recipe-name_description_custom_steps_custom_ingredients-field-embeddings-pipeline": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } },
              { "inference": { "type": "inference", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser.network-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser.network@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-browser-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-browser@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          },
          "synthetics-tcp-0.11.8": {
            "count": 0, "time_in_millis": 0, "current": 0, "failed": 0,
            "processors": [
              { "pipeline:synthetics-tcp@custom": { "type": "pipeline", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }
            ]
          }
        }
      }
    }
  }
}
Answer 1
Score: 1
If you look at recipe--field-embeddings-pipeline, you can see that there are 1000 documents in the pipeline:
- all 1000 have gone through the first inference processor
- 529 are currently going through the second inference processor
- 471 are currently going through the third inference processor
This means that it is working, but it takes time, probably because requests_per_second=1 makes it really slow.
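To get a feel for the throttling math, the padding formula from the reindex documentation (target time = batch size / requests_per_second; the task sleeps for the target time minus the time spent writing) can be sketched in Python. The default scroll batch size of 1000 is an assumption here, though it is consistent with the task status above showing "batches": 1 for 2461 documents:

```python
def reindex_wait_time(batch_docs: int, requests_per_second: float,
                      write_time_s: float = 0.0) -> float:
    """Seconds of padding the reindex task sleeps per batch:
    target_time = batch_docs / requests_per_second
    wait_time   = target_time - write_time (never negative)."""
    target_time_s = batch_docs / requests_per_second
    return max(target_time_s - write_time_s, 0.0)

# Default batch size of 1000 at requests_per_second=1:
# ~1000 s (almost 17 minutes) of throttle padding per batch.
print(reindex_wait_time(1000, 1))        # 1000.0

# The documentation's own example: 1000 docs at requests_per_second=500
# with 0.5 s spent writing -> 1.5 s of padding.
print(reindex_wait_time(1000, 500, 0.5)) # 1.5
```

At that rate, throttle padding alone for the roughly three batches of 2461 documents can swallow most of the hour observed.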
So nothing is wrong, just slow. You can tune requests_per_second and follow the progress via GET /_nodes/stats?pretty&filter_path=**.ingest.pipelines to see if you can get it to go faster.
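For example, the throttle can be changed on a running task through the rethrottle endpoint (the task id below is the one reported in Edit 2; requests_per_second=-1 disables throttling entirely), and the task API shows the updated counters:

```
POST _reindex/zn7bFe98Qly-eEd_VrXAMQ:1250/_rethrottle?requests_per_second=-1

GET _tasks/zn7bFe98Qly-eEd_VrXAMQ:1250
```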