Azure Data Explorer: dealing with duplicates and deleted records
Question
I am trying to ingest data into my Azure Data Explorer table from my Cosmos DB for MongoDB collection. I was able to do this using Azure Data Factory, where I can create a pipeline that fetches all records every x hours.
The problem is that when the pipeline fetches these records, it inserts them into the Data Explorer table regardless of whether they already exist. Also, if a record is deleted from my MongoDB collection, it is never deleted from my Azure Data Explorer table.
I tried to create a data flow with Data Explorer as the sink (it has an option to recreate the table before copying the records, which would solve the problem because the table would be dropped and recreated before the copy), but Cosmos DB for MongoDB is not supported as a data flow source.
Is there any way to maintain a table in Azure Data Explorer that fetches data from a MongoDB collection every x hours, without duplicate records?
Answer 1
Score: 1
Since your requirement is to copy the data without creating duplicate records, you can use the Azure Data Explorer Command activity to clear the data in the Kusto table, and then use a Copy activity to copy the data from Azure Cosmos DB (MongoDB) into the Kusto database.
- Add an Azure Data Explorer Command activity and give it the following command, replacing `<table-name>` with the actual sink table name:
.clear table <table-name> data;
- Then add a Copy activity with the MongoDB collection as the source dataset and the Kusto table as the sink dataset.
This way, the data is copied without any duplicates.
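For reference, the Kusto control commands involved can be sketched as follows; the table names `MyTable` and `MyTable_staging` are placeholders, not names from the question. A variant worth considering is to have the Copy activity land the data in a staging table and then replace the target's contents with `.set-or-replace`, which avoids the brief window where the target table is empty between the clear and the copy:

```kusto
// Option 1: the approach above. Run this in the Azure Data Explorer
// Command activity, then let the Copy activity re-ingest everything.
// The table is briefly empty between the two activities.
.clear table MyTable data

// Option 2 (sketch): point the Copy activity at a staging table instead,
// then replace the target's data from it in a single command.
.set-or-replace MyTable <| MyTable_staging
```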
Answer 2
Score: 0
There are several tactics for dealing with duplicate data described here: Handle duplicate data in Kusto.
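One of the tactics from that article, query-time deduplication, can be sketched as follows. This assumes the records carry a unique key column (hypothetically named `_id` here) and that the table's IngestionTime policy is enabled, so that only the most recently ingested row per key is returned:

```kusto
// Query-time deduplication: for each unique _id, keep only the row
// with the latest ingestion time. Table and column names are placeholders.
MyTable
| summarize arg_max(ingestion_time(), *) by _id
```

This leaves the duplicates in storage but hides them from queries, so it trades query cost for avoiding the clear-and-reload step.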