如何通过KQL从动态列中删除重复项

huangapple go评论134阅读模式
英文:

How to remove duplicates from dynamic column through KQL

问题

我们正在尝试在Azure云中通过kQL查询设置应用程序(容器)日志的警报。我们正在运行以下查询以跟踪错误并删除重复项。

let ContainerIdList = KubePodInventory
| where ContainerName contains "custom-app-logger"
| where Namespace has "devns"
| where ClusterId =~ '/subscriptions/12345-xyzl-4e12-bc3a-5c7859365342636/resourcegroups/rg-aks-dev-dev/providers/Microsoft.ContainerService/managedClusters/aksdevcluster'
| distinct ContainerID;

ContainerLog 
| where ContainerID in (ContainerIdList)
| where LogEntry !has "SRV1174"
| where LogEntry has "| E |" or LogEntry has "| F |"
| where LogEntry !contains "the I/O interface definition of project"
| where LogEntry !contains "the I/O interface definition of cuc"
| project LogEntrySource, LogEntry, TimeGenerated 
| order by TimeGenerated desc
| top 1000 by LogEntry
| render table
| distinct LogEntry

然而,它没有删除重复项。如果您查看下面的日志输出,除了第一个和最后一个之外,其他所有日志都是相同的,只是其中的时间戳不同。请查看下面的日志输出截图。

[![enter image description here][1]][1]

您能帮忙告诉我们如何删除实际错误的重复项吗?如果是相同错误但时间戳不同,需要选择最新的错误。

英文:

we are trying to setup alerts for application(container) logs through kQL Queries in Azure cloud. we are running following query to track error and remove duplicates.

    let ContainerIdList = KubePodInventory
    | where ContainerName contains "custom-app-logger"
    | where Namespace has "devns"
    | where ClusterId =~ '/subscriptions/12345-xyzl-4e12-bc3a-5c7859365342636/resourcegroups/rg-aks-dev-dev/providers/Microsoft.ContainerService/managedClusters/aksdevcluster'
    | distinct ContainerID;
     ContainerLog 
    | where ContainerID in (ContainerIdList)
    | where LogEntry !has "SRV1174"
    | where LogEntry has "| E |" or LogEntry has "| F |"
    | where LogEntry !contains "the I/O interface definition of project" 
    | where LogEntry !contains "the I/O interface definition of cuc" 
    | project LogEntrySource, LogEntry, TimeGenerated 
    | order by TimeGenerated desc
    | top 1000 by LogEntry
    | render table
    | distinct LogEntry

however its not removing the duplicates. if you see the log output below except first and last one, all others are same only with different timestamp in it. please find the log output screenshot below.

[![enter image description here][1]][1]

could you please help me how can we remove duplicates with actual error, if it is same error with different time stamp in it, need to pick the latest one.

答案1

得分: 0

我在我的环境中复现并获得了如下所示的预期结果:

首先,取一个简单的表格如下:

LogEntry

0|2023-07-23 11:40:50|b|abc
0|2023-07-23 11:42:50|b|abc
0|2023-07-23 12:40:50|c|xyz
0|2023-07-23 14:40:50|d|bnm

然后,我使用以下查询来去除重复项:

let x = datatable(LogEntry: string)
[
    "0|2023-07-23 11:40:50|b|abc",
    "0|2023-07-23 11:42:50|b|abc",
    "0|2023-07-23 12:40:50|c|xyz",
    "0|2023-07-23 14:40:50|d|bnm"
]
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3];
x
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)

输出:

如何通过KQL从动态列中删除重复项

Fiddle

LogEntry列以管道字符(|)分割并创建新列。然后,通过UniqueValues(UV)对其进行汇总,并取Timestamp(C2)的最大值,然后移除Unique Value列(这将得到一个额外的列),然后使用管道符号(|)将其拼接回去,就像我所做的那样。请尝试按照上面的示例来获取所需的结果。

编辑:

ContainerIdList
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3]
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)
英文:

I have reproduced in my environment and got expected results as below:

Firstly, taken a simple table like below:

LogEntry

0|2023-07-23 11:40:50|b|abc
0|2023-07-23 11:42:50|b|abc
0|2023-07-23 12:40:50|c|xyz
0|2023-07-23 14:40:50|d|bnm

如何通过KQL从动态列中删除重复项

Then I have used below query to remove duplicates:

let x = datatable(LogEntry: string)
[
    "0|2023-07-23 11:40:50|b|abc",
    "0|2023-07-23 11:42:50|b|abc",
    "0|2023-07-23 12:40:50|c|xyz",
    "0|2023-07-23 14:40:50|d|bnm"
]
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3];
x
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)

如何通过KQL从动态列中删除重复项

Output:

如何通过KQL从动态列中删除重复项

Fiddle.

Split the LogEntry column by pipes(|) and create new columns. Then summarize it by UniqueValues(UV) and take the maximum of Timestamp(C2) and remove Unique Value column(which you will get extra column) then concat it back with Pipe symbol like I have done. Try to follow above example to get desired result as I have got.

EDIT:

ContainerIdList
    | extend SplitLog = split(LogEntry, "|")
    | project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3]
    | summarize arg_max(C2,*) by tostring(UV)
    | project-away UV
    | project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1

huangapple
  • 本文由 发表于 2023年7月3日 18:41:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76603972.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定