How to parse unordered, unstructured application logs using Grok

Question

I'm parsing an application log with Grok, using https://grokconstructor.appspot.com/do/match for testing.

The log looks like this:

2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}
2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}
2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}

My Grok pattern:

%{GENERATE_TIME:generateTime}.*?%{DEVICEID:deviceId}.*?%{AGENT:userAgent}

Custom pattern:

GENERATE_TIME \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
DEVICEID deviceid":"(.{10})
AGENT "userAgent":"(.*?)"

Output (screenshot): https://i.stack.imgur.com/HOkZf.png

Expected Output:

[
    {
        "generateTime": "2023-04-01 02:00:00,007",
        "deviceId": "aaaaaaaaaa",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01 02:00:01,234",
        "deviceId": "bbbbbbbbbb",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01 02:00:02,234",
        "userAgent": "device"
    }
]

There seem to be two problems to solve: how to match deviceId and userAgent cleanly, and how to parse the log when the fields appear in no fixed order.

Thanks in advance.

Answer 1

Score: 0

Could you try the grok pattern below and report back?

filter
{
  grok
  {
    # Patterns are tried in order: the first expects two comma-separated
    # key/value pairs, the second a single pair.
    match =>
    {
      "message" => [
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{LOGLEVEL:loglevel} %{DATA:deviceid}:%{DATA:id},%{DATA:useragent}:"%{DATA:agentname}"',
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{DATA:loglevel} %{DATA:useragent}:"%{DATA:agentname}"'
      ]
    }
  }
}
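
A variation on the multi-pattern idea, offered as an untested sketch rather than a verified configuration. It assumes Logstash's grok filter (it has not been checked in Grok Constructor) and makes two changes that are not part of the answer above: `break_on_match => false`, so that every pattern is tried independently and the key order stops mattering, and inline named captures such as `(?<deviceId>...)`, so that only the value ends up in the field instead of the surrounding `deviceid":"` text. As far as I understand the filter, a pattern that does not match (for example the deviceid one on the third log line) simply contributes no field as long as at least one other pattern matches.

```
filter {
  grok {
    # Try every pattern instead of stopping at the first match.
    break_on_match => false
    match => {
      "message" => [
        # Timestamp at the start of the line.
        '^%{TIMESTAMP_ISO8601:generateTime}',
        # Each key is searched for anywhere in the line, in any order.
        '"deviceid":"(?<deviceId>[^"]*)"',
        '"userAgent":"(?<userAgent>[^"]*)"'
      ]
    }
  }
}
```

This keeps everything inside grok, at the cost of treating the JSON payload as plain text. If the payload can contain escaped quotes or nested objects, a real JSON parser is the safer choice.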

Answer 2

Score: 0

I think you can use the grok pattern below:

grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:generateTime} \[%{DATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:json_data}" }
}

This should match all three lines above. To get the output you specified, add a json filter right after grok:

json {
  source => "json_data"
  remove_field => ["json_data"]
}

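The json filter parses json_data into individual fields. Because it reads the payload as JSON rather than matching positions, the key order does not matter, and a line without deviceid simply produces no such field.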
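
Putting the two filters together, a minimal sketch of the whole filter section might look like the following. The `mutate` step is an addition that is not in the answer: it assumes you want the key renamed from `deviceid` to `deviceId` to match the expected output, and that the `thread` and `loglevel` fields are not needed.

```
filter {
  # Split the line into timestamp, thread, level and the raw JSON payload.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:generateTime} \[%{DATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:json_data}" }
  }
  # Expand the JSON payload into top-level fields, then drop the raw copy.
  json {
    source => "json_data"
    remove_field => ["json_data"]
  }
  # Assumption: match the field name used in the expected output and
  # discard the fields that the expected output does not show.
  mutate {
    rename => { "deviceid" => "deviceId" }
    remove_field => ["thread", "loglevel"]
  }
}
```

On lines that have no deviceid key, the rename should have nothing to act on, so the event keeps only generateTime and userAgent (plus Logstash's own fields such as message and @timestamp).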

Posted by huangapple on 2023-04-04 14:14:19