How to parse application log without order and structure using Grok
Question
I'm parsing an application log with Grok and testing the pattern at https://grokconstructor.appspot.com/do/match.
The log looks like this:
2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}
2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}
2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}
My Grok pattern:
%{GENERATE_TIME:generateTime}.*?%{DEVICEID:deviceId}.*?%{AGENT:userAgent}
Custom pattern:
GENERATE_TIME \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
DEVICEID deviceid":"(.{10})
AGENT "userAgent":"(.*?)"
Output:
[output](https://i.stack.imgur.com/HOkZf.png)
Expected Output:
[
{
"generateTime": "2023-04-01·02:00:00,007",
"deviceId": "aaaaaaaaaa",
"userAgent": "device"
},
{
"generateTime": "2023-04-01·02:00:01,234",
"deviceId": "bbbbbbbbbb",
"userAgent": "device"
},
{
"generateTime": "2023-04-01·02:00:02,234",
"userAgent": "device"
}
]
It seems there are two problems to solve: how to match `deviceId` and `userAgent` cleanly, and how to parse the log when the fields appear in no fixed order.
Thanks in advance.
Answer 1
Score: 0
Could you try the grok patterns below and report back?
filter {
  # Patterns are tried in order; the first one that matches is used.
  grok {
    match => {
      "message" => [
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{LOGLEVEL:loglevel} %{DATA:deviceid}:%{DATA:id},%{DATA:useragent}:"%{DATA:agentname}"',
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{DATA:loglevel} %{DATA:useragent}:"%{DATA:agentname}"'
      ]
    }
  }
}
Answer 2
Score: 0
I think you can use the following `grok` pattern:
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:generateTime} \[%{DATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:json_data}" }
}
This should match all three lines above. To get the output you specified, add the following right after the `grok` filter:
json {
source => "json_data"
remove_field => ["json_data"]
}
The `json` filter will parse `json_data` into individual fields.
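To see why this handles both the missing `deviceid` and the varying key order, here is a minimal Python sketch of the same two-step idea: match the fixed prefix with a regex, then hand the rest of the line to a JSON parser. It is not Logstash code. The regex is only a rough stand-in for the grok pattern above, and `parse_line` and `SAMPLE_LINES` are names made up for this illustration.

```python
import json
import re

# Rough regex stand-in (an assumption, not the real grok definitions) for:
# %{TIMESTAMP_ISO8601} \[%{DATA:thread}\] %{LOGLEVEL} %{GREEDYDATA:json_data}
LINE_RE = re.compile(
    r"(?P<generateTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]*)\] "
    r"(?P<loglevel>[A-Z]+) "
    r"(?P<json_data>\{.*\})$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Mirrors the json filter with remove_field: expand the payload into
    # top-level fields and drop the raw JSON string.
    payload = json.loads(event.pop("json_data"))
    event.update(payload)
    return event


# The three sample lines from the question.
SAMPLE_LINES = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

if __name__ == "__main__":
    for sample in SAMPLE_LINES:
        print(parse_line(sample))
```

Because the JSON parser reads keys by name, the order of `deviceid` and `userAgent` no longer matters, and a missing key simply produces no field. Note that the parsed field keeps the JSON key's spelling, `deviceid`; to get `deviceId` as in the expected output, a rename step such as Logstash's `mutate` filter with `rename` would presumably be needed, and `thread` and `loglevel` would have to be removed if they are unwanted.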