ADF fuzzy left join returns fewer rows than expected
Question
I created a data flow in ADF that uses a fuzzy left outer self join to compare customer names and identify duplicates. I used a sample of the data in the source SQL (only 2 customers, whose names match 100%). When testing, the regular left join on AccountName == AccountName works as expected and returns 4 records, because both names match.
However, when using a fuzzy left outer join at a 65% match threshold, only 2 of the 4 records are returned. I expected both joins to return the same 4 records.
If I am doing this correctly, how does the fuzzy join need to be set up to return the same number of records for this sample? I suspect this is a bug in ADF's fuzzy left outer join.
Here is the data:
Acct_Id | AcctName |
---|---|
5799930 | RENTAL SERVICE |
5799940 | RENTAL SERVICE |
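The following minimal Python sketch (not ADF code) models the expected result. It runs a left outer self join over the two sample rows above, once with exact equality on AcctName and once with a similarity threshold of 0.65.

The fuzzy scorer is an assumption. It uses `difflib.SequenceMatcher.ratio()` as a stand-in, whereas ADF's fuzzy join uses its own matching algorithm and may score or filter pairs differently. Under this model, identical names score 1.0. Each of the 2 left rows therefore matches both right rows, so both joins produce 2 × 2 = 4 pairs.

```python
from difflib import SequenceMatcher

# The two sample rows from the question
rows = [
    {"Acct_Id": 5799930, "AcctName": "RENTAL SERVICE"},
    {"Acct_Id": 5799940, "AcctName": "RENTAL SERVICE"},
]

def left_self_join(data, is_match):
    """Left outer self join: pair every left row with each matching right row,
    or with None when nothing matches."""
    out = []
    for left in data:
        matches = [right for right in data if is_match(left, right)]
        if matches:
            out.extend((left, right) for right in matches)
        else:
            out.append((left, None))
    return out

def exact(left, right):
    # Regular join condition: AcctName == AcctName
    return left["AcctName"] == right["AcctName"]

def fuzzy(left, right, threshold=0.65):
    # Stand-in scorer (assumption): difflib's ratio in [0, 1], not ADF's algorithm
    score = SequenceMatcher(None, left["AcctName"], right["AcctName"]).ratio()
    return score >= threshold

exact_rows = left_self_join(rows, exact)
fuzzy_rows = left_self_join(rows, fuzzy)
print(len(exact_rows), len(fuzzy_rows))  # 4 4
```

If ADF returns fewer rows than this model for identical names, the difference is coming from how ADF's fuzzy join pairs or filters rows, not from the similarity threshold itself.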
Screenshots (not preserved): the data flow, the LeftOuter join results, the FuzzyLeft join results, the LeftOuter join conditions, and the FuzzyLeft join conditions.
Answer 1
Score: 0
I tried the same setup. I have 2 records in the left table and 4 records in the right table.
- When I did a left join, I got 4 records, because the right table contains duplicate records (screenshot not preserved).
- With the fuzzy join, only the records with common values are returned (screenshot not preserved).
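Duplicate right-side rows multiply the output of a left join, as this small Python sketch shows. The tables and values are hypothetical; they only mirror the shape described above (2 left rows, 4 right rows in which each left key appears twice). Removing the duplicates from the right side brings the count back to one row per left record.

```python
# Hypothetical tables: 2 left rows, 4 right rows (each key duplicated)
left = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
right = [
    {"id": 1, "name": "A"}, {"id": 1, "name": "A"},  # duplicates of left row 1
    {"id": 2, "name": "B"}, {"id": 2, "name": "B"},  # duplicates of left row 2
]

def left_join(lhs, rhs, key):
    """Left outer join: one output row per matching right row, or (row, None) if none."""
    out = []
    for l in lhs:
        matches = [r for r in rhs if r[key] == l[key]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

def distinct(rows):
    """Drop exact duplicate rows while keeping the first occurrence's order."""
    seen, out = set(), []
    for r in rows:
        signature = tuple(sorted(r.items()))
        if signature not in seen:
            seen.add(signature)
            out.append(r)
    return out

joined = left_join(left, right, "name")
deduped = left_join(left, distinct(right), "name")
print(len(joined), len(deduped))  # 4 2
```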
Check whether you have any duplicate records, or try first converting the column's data type to string and then using it in the join.
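To illustrate the idea behind the string conversion, here is a plain-Python sketch (not ADF code). Keys that differ only in type or surrounding whitespace never compare equal until they are normalized. The rows are hypothetical: one side stores Acct_Id as an integer and the other as a string, one of them padded with a trailing space.

In a mapping data flow, the equivalent step would presumably be a Derived Column transformation, for example using `toString()` and `trim()`. Whether this changes the fuzzy join's output has not been verified here.

```python
def normalize(rows, key):
    """Return copies of rows with `key` cast to str and stripped of surrounding whitespace."""
    return [{**r, key: str(r[key]).strip()} for r in rows]

def matching_pairs(lhs, rhs, key):
    """Inner-join style pairing on exact key equality."""
    return [(l, r) for l in lhs for r in rhs if l[key] == r[key]]

# Hypothetical rows: int ids on one side, (padded) string ids on the other
lhs = [{"Acct_Id": 5799930}, {"Acct_Id": 5799940}]
rhs = [{"Acct_Id": "5799930 "}, {"Acct_Id": "5799940"}]

raw = matching_pairs(lhs, rhs, "Acct_Id")
clean = matching_pairs(normalize(lhs, "Acct_Id"), normalize(rhs, "Acct_Id"), "Acct_Id")
print(len(raw), len(clean))  # 0 2
```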
If you still see the same behavior, it might be an issue with the matching algorithm. In that case, it's better to raise a support ticket for a deeper investigation.