Kusto query to cluster rows in a table and link clusters with original rows
Question
I have a simple question, but have not been able to find a solution yet.
I am using Azure Data Explorer, and I need a Kusto query that groups similar rows together and outputs the clusters along with the original rows. The clustering functions I have tried so far, including autocluster and basket, only provide the row count for each cluster.
For example, what I am after is finding the original 200 rows in segment 0 and the 91 rows in segments 1 and 2, rather than the count in each segment.
I would greatly appreciate it if someone could help me cluster the rows in a table and link the clusters to the original rows.
Cheers!
Answer 1
Score: 1
This can't be done in a single query. A few years ago we considered implementing it, but the cross join with the original table is very expensive and can explode when there are many segments to join with a big original table. If you run the query through an SDK from some client, you can loop over the segments and fetch the record set of each segment one by one.
Regardless, have a look at log_reduce_full_fl() and the other log_reduce functions; they are useful for clustering variable text logs in a single column.
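The client-side loop suggested above could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not a built-in feature: it assumes the segment rows come from autocluster or basket with the SegmentId, Count and Percent columns, that a null or empty string marks a wildcard (the plugins' default), and that column names are plain identifiers. The helper names (kql_literal, segment_query, rows_per_segment) and the run_query callable are hypothetical; run_query stands in for whatever SDK call executes a query and returns its rows.

```python
def kql_literal(value):
    """Render a Python value as a KQL literal (strings, bools, numbers only)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes and double quotes for a double-quoted KQL string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def segment_query(table, segment, meta_columns=("SegmentId", "Count", "Percent")):
    """Build a KQL query that returns the original rows matched by one segment.

    Assumption: None or "" in a segment row means "any value" (wildcard).
    """
    conditions = [
        f"{column} == {kql_literal(value)}"
        for column, value in segment.items()
        if column not in meta_columns and value not in (None, "")
    ]
    if not conditions:
        return table  # the segment places no restriction on any column
    return f"{table} | where " + " and ".join(conditions)


def rows_per_segment(run_query, table, segments):
    """Run one filter query per segment and key the results by SegmentId.

    run_query is a hypothetical callable wrapping the SDK of your choice.
    """
    return {
        segment["SegmentId"]: run_query(segment_query(table, segment))
        for segment in segments
    }
```

Each segment costs one extra query against the original table, which is the expense mentioned above, so this only suits a modest number of segments. Because autocluster patterns may overlap, the same original row can show up under more than one segment.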