Duplication in Logstash pipeline (Elasticsearch input and SQL database output)
Question
I am using an Elasticsearch index as the input in my Logstash config, and the output is the Logstash jdbc output plugin, which sends logs to columns of a SQL database table. The problem is that I get duplicate rows in the SQL database. I used the Logstash uuid filter plugin, but nothing changed. What is the reason, and what solution do you suggest?
Here is my config:
input {
  elasticsearch {
    hosts => "ip:9200"
    index => "indexname"
    user => "user"
    password => "elastic"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    schedule => "*/5 * * * *" # how often the query runs; here, every 5 minutes
    size => 1500 # maximum number of documents to retrieve per query
    scroll => "5m" # how long Elasticsearch keeps the search context open; here, 5 minutes
    docinfo => true
  }
}
filter {
  uuid {
    target => "document_id"
    overwrite => true
  }
}
output {
  if "API_REQUEST" in [message] {
    jdbc {
      driver_jar_path => '/usr/share/logstash/vendor/jar/jdbc/mssql-jdbc-12.2.0.jre8.jar'
      connection_string => "jdbc:sqlserver://ip:1433;databaseName=izdb;user=user;password=pass;ssl=false;trustServerCertificate=true"
      enable_event_as_json_keyword => true
      statement => [
        "INSERT INTO Transaction (document_id, logLevel, timestamp) VALUES (?,?,?)",
        "document_id",
        "logLevel",
        "timestamp"
      ]
    }
  }
}
Answer 1
Score: 1
I'm sharing a few ways to solve/detect the problem.

- Add docinfo_target => "[@metadata][doc]" to the input section and restart Logstash.
- Update the uuid filter to use %{[@metadata][doc][_id]} rather than a generated document_id. The uuid filter produces a fresh random value for every event, so each scheduled run of the input re-reads the same documents and inserts them under new keys; reusing the Elasticsearch _id gives every document a stable key (see the combined sketch after this list).
- Add stdout {} to the Logstash output and observe the events to find the root cause.

For details, see https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
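
Putting these suggestions together, the pipeline could look like the sketch below. It is a minimal sketch, assuming the same index, credentials, and table as in the question; since the uuid filter only generates new IDs, it uses a mutate filter to copy the Elasticsearch _id into document_id, so the same document always maps to the same key:

input {
  elasticsearch {
    hosts => "ip:9200"
    index => "indexname"
    user => "user"
    password => "elastic"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    schedule => "*/5 * * * *"
    size => 1500
    scroll => "5m"
    docinfo => true
    docinfo_target => "[@metadata][doc]" # copies _id, _index, _type into event metadata
  }
}
filter {
  # Reuse the Elasticsearch document id instead of generating a random UUID,
  # so repeated scheduled runs produce the same document_id for the same document.
  mutate {
    add_field => { "document_id" => "%{[@metadata][doc][_id]}" }
  }
}
output {
  if "API_REQUEST" in [message] {
    jdbc {
      driver_jar_path => '/usr/share/logstash/vendor/jar/jdbc/mssql-jdbc-12.2.0.jre8.jar'
      connection_string => "jdbc:sqlserver://ip:1433;databaseName=izdb;user=user;password=pass;ssl=false;trustServerCertificate=true"
      enable_event_as_json_keyword => true
      statement => [
        "INSERT INTO Transaction (document_id, logLevel, timestamp) VALUES (?,?,?)",
        "document_id",
        "logLevel",
        "timestamp"
      ]
    }
  }
  stdout { } # temporary, to inspect events while debugging
}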
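Note that a stable document_id alone does not stop duplicates, because each scheduled run still re-reads the same documents and a plain INSERT writes them again. One way to make the write idempotent on the SQL Server side is a unique index plus an upsert-style MERGE; the sketch below is an assumption about your schema, the index name ux_transaction_document_id is made up for illustration, and Transaction is bracketed because it is a reserved word in T-SQL:

-- one-time schema change, run directly against the database
CREATE UNIQUE INDEX ux_transaction_document_id ON [Transaction] (document_id);

-- then replace the INSERT in the jdbc output with a MERGE that only inserts new keys
statement => [
  "MERGE [Transaction] AS t USING (SELECT ? AS document_id, ? AS logLevel, ? AS timestamp) AS s ON t.document_id = s.document_id WHEN NOT MATCHED THEN INSERT (document_id, logLevel, timestamp) VALUES (s.document_id, s.logLevel, s.timestamp);",
  "document_id",
  "logLevel",
  "timestamp"
]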
Comments