英文:
How to remove duplicates from dynamic column through KQL
问题
我们正在尝试在Azure云中通过kQL查询设置应用程序(容器)日志的警报。我们正在运行以下查询以跟踪错误并删除重复项。
let ContainerIdList = KubePodInventory
| where ContainerName contains "custom-app-logger"
| where Namespace has "devns"
| where ClusterId =~ '/subscriptions/12345-xyzl-4e12-bc3a-5c7859365342636/resourcegroups/rg-aks-dev-dev/providers/Microsoft.ContainerService/managedClusters/aksdevcluster'
| distinct ContainerID;
ContainerLog
| where ContainerID in (ContainerIdList)
| where LogEntry !has "SRV1174"
| where LogEntry has "| E |" or LogEntry has "| F |"
| where LogEntry !contains "the I/O interface definition of project"
| where LogEntry !contains "the I/O interface definition of cuc"
| project LogEntrySource, LogEntry, TimeGenerated
| order by TimeGenerated desc
| top 1000 by LogEntry
| render table
| distinct LogEntry
然而,它没有删除重复项。如果您查看下面的日志输出,除了第一个和最后一个之外,其他所有日志都是相同的,只是其中的时间戳不同。请查看下面的日志输出截图。
[![enter image description here][1]][1]
您能帮忙告诉我们如何删除实际错误的重复项吗?如果是相同错误但时间戳不同,需要选择最新的错误。
英文:
we are trying to setup alerts for application(container) logs through kQL Queries in Azure cloud. we are running following query to track error and remove duplicates.
let ContainerIdList = KubePodInventory
| where ContainerName contains "custom-app-logger"
| where Namespace has "devns"
| where ClusterId =~ '/subscriptions/12345-xyzl-4e12-bc3a-5c7859365342636/resourcegroups/rg-aks-dev-dev/providers/Microsoft.ContainerService/managedClusters/aksdevcluster'
| distinct ContainerID;
ContainerLog
| where ContainerID in (ContainerIdList)
| where LogEntry !has "SRV1174"
| where LogEntry has "| E |" or LogEntry has "| F |"
| where LogEntry !contains "the I/O interface definition of project"
| where LogEntry !contains "the I/O interface definition of cuc"
| project LogEntrySource, LogEntry, TimeGenerated
| order by TimeGenerated desc
| top 1000 by LogEntry
| render table
| distinct LogEntry
however its not removing the duplicates. if you see the log output below except first and last one, all others are same only with different timestamp in it. please find the log output screenshot below.
[![enter image description here][1]][1]
could you please help me how can we remove duplicates with actual error, if it is same error with different time stamp in it, need to pick the latest one.
答案1
得分: 0
我在我的环境中复现并获得了如下所示的预期结果:
首先,取一个简单的表格如下:
LogEntry
0|2023-07-23 11:40:50|b|abc
0|2023-07-23 11:42:50|b|abc
0|2023-07-23 12:40:50|c|xyz
0|2023-07-23 14:40:50|d|bnm
然后,我使用以下查询来去除重复项:
let x = datatable(LogEntry: string)
[
"0|2023-07-23 11:40:50|b|abc",
"0|2023-07-23 11:42:50|b|abc",
"0|2023-07-23 12:40:50|c|xyz",
"0|2023-07-23 14:40:50|d|bnm"
]
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3];
x
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)
输出:
将LogEntry
列以管道字符(|)分割并创建新列。然后,通过UniqueValues(UV)对其进行汇总,并取Timestamp(C2)的最大值,然后移除Unique Value列(这将得到一个额外的列),然后使用管道符号(|)将其拼接回去,就像我所做的那样。请尝试按照上面的示例来获取所需的结果。
编辑:
ContainerIdList
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3]
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)
英文:
I have reproduced in my environment and got expected results as below:
Firstly, taken a simple table like below:
LogEntry
0|2023-07-23 11:40:50|b|abc
0|2023-07-23 11:42:50|b|abc
0|2023-07-23 12:40:50|c|xyz
0|2023-07-23 14:40:50|d|bnm
Then I have used below query to remove duplicates:
let x = datatable(LogEntry: string)
[
"0|2023-07-23 11:40:50|b|abc",
"0|2023-07-23 11:42:50|b|abc",
"0|2023-07-23 12:40:50|c|xyz",
"0|2023-07-23 14:40:50|d|bnm"
]
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3];
x
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1)
Output:
Split the LogEntry
column by pipes(|) and create new columns. Then summarize it by UniqueValues(UV) and take the maximum of Timestamp(C2) and remove Unique Value column(which you will get extra column) then concat
it back with Pipe symbol like I have done. Try to follow above example to get desired result as I have got.
EDIT:
ContainerIdList
| extend SplitLog = split(LogEntry, "|")
| project C1 = SplitLog[0], cc=SplitLog[1],C2 = todatetime(SplitLog[1]), C3 = SplitLog[2], UV = SplitLog[3]
| summarize arg_max(C2,*) by tostring(UV)
| project-away UV
| project Log = strcat(tostring(C1), "|", cc, "|", C3, "|", UV1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论