April 4, 2023 14:14:19 | go

How to parse application log without order and structure using Grok

Question

I'm parsing an application log with Grok and testing the pattern at https://grokconstructor.appspot.com/do/match.

The log looks like this:

2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}
2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}
2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}

My Grok pattern:

%{GENERATE_TIME:generateTime}.*?%{DEVICEID:deviceId}.*?%{AGENT:userAgent}

Custom pattern:

GENERATE_TIME \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
DEVICEID deviceid":"(.{10})
AGENT "userAgent":"(.*?)"

Output:
[output](https://i.stack.imgur.com/HOkZf.png)

Expected Output:

[
    {
        "generateTime": "2023-04-01·02:00:00,007",
        "deviceId": "aaaaaaaaaa",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01·02:00:01,234",
        "deviceId": "bbbbbbbbbb",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01·02:00:02,234",
        "userAgent": "device"
    }
]

There seem to be two problems to solve: how to match `deviceId` and `userAgent` cleanly, and how to parse the log when the fields appear in no fixed order.

Thanks in advance.
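
To see why the pattern in the question only handles the first sample line, the same regular expressions can be tried outside Grok. The following is a minimal Python sketch, not Grok itself: it assumes each `%{NAME:field}` behaves like a named group wrapped around the whole custom pattern, and the names `LINES`, `ORDERED` and `ordered_match` are made up for illustration.

```python
import re

# The three sample log lines from the question.
LINES = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

# Rough Python equivalent of
# %{GENERATE_TIME:generateTime}.*?%{DEVICEID:deviceId}.*?%{AGENT:userAgent}
ORDERED = re.compile(
    r'(?P<generateTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})'
    r'.*?(?P<deviceId>deviceid":"(.{10}))'
    r'.*?(?P<userAgent>"userAgent":"(.*?)")'
)


def ordered_match(line):
    """Return the named groups, or None when the fixed field order is not met."""
    m = ORDERED.search(line)
    return m.groupdict() if m else None


results = [ordered_match(line) for line in LINES]
for result in results:
    print(result)
```

Only the first line matches. The second line has `deviceid` after `userAgent`, and the third has no `deviceid` at all, so both return `None`. Even on the first line the named groups include the key text (for example `deviceid":"aaaaaaaaaa`), because the name is attached to the whole custom pattern rather than to the inner capture group.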

Answer 1

Score: 0

Could you try the grok pattern below and report back?

filter
{
  grok
  {
    match =>
    {
      # Patterns are tried in order; the second one covers lines without deviceid.
      "message" => [
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{LOGLEVEL:loglevel} %{DATA:deviceid}:%{DATA:id},%{DATA:useragent}:"%{DATA:agentname}"',
        '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{DATA:loglevel} %{DATA:useragent}:"%{DATA:agentname}"'
      ]
    }
  }
}
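
When `match` is given a list, grok tries the patterns in order and, with the default `break_on_match => true`, stops at the first one that matches. The idea can be sketched in Python as follows. This is only an illustration under assumptions: the regular expressions are simplified stand-ins rather than the grok patterns from this answer, a third pattern for the reversed field order is added, and the names `PREFIX`, `PATTERNS` and `first_match` are hypothetical.

```python
import re

LINES = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

# Timestamp, bracketed thread name and log level shared by every line.
PREFIX = (
    r'(?P<generateTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})'
    r' \[[^\]]*\] \w+ '
)

# One pattern per field layout; the last one is the fallback without deviceid.
PATTERNS = [
    re.compile(PREFIX + r'\{"deviceid":"(?P<deviceId>[^"]*)","userAgent":"(?P<userAgent>[^"]*)"\}'),
    re.compile(PREFIX + r'\{"userAgent":"(?P<userAgent>[^"]*)","deviceid":"(?P<deviceId>[^"]*)"\}'),
    re.compile(PREFIX + r'\{"userAgent":"(?P<userAgent>[^"]*)"\}'),
]


def first_match(line):
    """Try each pattern in order and return the fields of the first match."""
    for pattern in PATTERNS:
        m = pattern.search(line)
        if m:
            return m.groupdict()
    return None


parsed = [first_match(line) for line in LINES]
for fields in parsed:
    print(fields)
```

One pattern is needed per field layout, so the list grows quickly as fields are added or become optional. That is the main drawback of this approach for unordered data.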

Answer 2

Score: 0

I think you can use the grok pattern below:

grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:generateTime} \[%{DATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:json_data}" }
}

This should match all three lines above. To get the output you specified, add a json filter right after grok:

json {
  source => "json_data"
  remove_field => ["json_data"]
}

The json filter parses json_data into individual fields.
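
The two-step idea (match the fixed prefix, then hand the rest to a JSON parser) does not depend on key order or on which keys are present. Below is a minimal Python sketch of the same flow, assuming simplified regular expressions in place of `TIMESTAMP_ISO8601`, `DATA`, `LOGLEVEL` and `GREEDYDATA`; the names `LINE` and `parse` are hypothetical.

```python
import json
import re

LINES = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

# Simplified stand-in for:
# %{TIMESTAMP_ISO8601:generateTime} \[%{DATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:json_data}
LINE = re.compile(
    r'(?P<generateTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) '
    r'\[(?P<thread>.*?)\] (?P<loglevel>[A-Z]+) (?P<json_data>.*)'
)


def parse(line):
    """Split off the fixed prefix, then expand the JSON payload into fields."""
    m = LINE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Like the json filter with remove_field: merge the parsed keys into the
    # event and drop the raw json_data string.
    event.update(json.loads(event.pop("json_data")))
    return event


events = [parse(line) for line in LINES]
for event in events:
    print(event)
```

The parsed keys keep the spelling used in the log, so the field is `deviceid`, not `deviceId` as in the expected output. If the camel-case name matters, renaming it would be a separate step (in Logstash, for instance, with the mutate filter's `rename` option).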
