Java compare two lists efficiently

Question

I need to compare results of two lists coming from two different sources.

List<MyData> baseList = new ArrayList<>();

[image of the baseList data not preserved]

and

List<MyData> externalList = new ArrayList<>();

[image of the externalList data not preserved]

I need to compare the CDCHash records in both lists by UserACCNUM. If the CDCHash has changed for an account, I need to update that particular record in baseList.

I tried the nested loop below, which doesn't seem efficient to me:

for (MyData ext : externalList) {
  for (MyData base : baseList) {
    if (ext.getCDCHash().equals(base.getCDCHash()) && ext.getAccNum().equals(base.getAccNum())) {
      // no change
    } else {
      // changes found - need to update
    }
  }
}

Is list.stream() efficient in this case? I have nearly 100k records to compare.

How do I achieve this efficiently?

Answer 1

Score: 2

You can turn your quadratic algorithm into a linear one by building a fast lookup Map for one of the two lists, then looping over the other list and using the lookup to find the corresponding record by account number.

Here is a JS example, just because we can't run Java here.
Note that, for the sake of the example, we assume both lists are the same length and cover the same accounts.

const listA = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v1', account: 2 }];
const listB = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v2', account: 2 }];

const dirtyRecords = findDirtyRecords(listA, listB);

console.log(dirtyRecords);

function findDirtyRecords(listA, listB) {
  const listAMap = new Map();

  for (const record of listA) listAMap.set(record.account, record);
  
  return listB.filter(r => r.hash !== listAMap.get(r.account).hash);
}
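The same lookup idea can be sketched in Java, the language of the question. This is only a rough sketch: the MyData class below is a hypothetical stand-in (the real class was not posted), the setter setCDCHash and the names HashSync and syncChangedHashes are made up for illustration, and account numbers are assumed to be unique within each list. Accounts that only exist in externalList are simply skipped here, since the question does not say what should happen to them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the asker's MyData; only the two fields used here.
final class MyData {
    private final String accNum;
    private String cdcHash;

    MyData(String accNum, String cdcHash) {
        this.accNum = accNum;
        this.cdcHash = cdcHash;
    }

    String getAccNum() { return accNum; }
    String getCDCHash() { return cdcHash; }
    void setCDCHash(String cdcHash) { this.cdcHash = cdcHash; } // assumed setter
}

final class HashSync {
    /** Copies changed hashes from externalList into baseList and returns the base records it updated. */
    static List<MyData> syncChangedHashes(List<MyData> baseList, List<MyData> externalList) {
        // One pass to index the base records by account number.
        Map<String, MyData> baseByAccNum = new HashMap<>();
        for (MyData base : baseList) {
            baseByAccNum.put(base.getAccNum(), base);
        }

        // One pass over the external records, with an expected constant-time lookup for each.
        List<MyData> updated = new ArrayList<>();
        for (MyData ext : externalList) {
            MyData base = baseByAccNum.get(ext.getAccNum());
            if (base == null) {
                continue; // account only exists externally; handling it is left open here
            }
            if (!base.getCDCHash().equals(ext.getCDCHash())) {
                base.setCDCHash(ext.getCDCHash());
                updated.add(base);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<MyData> baseList = List.of(new MyData("1", "account1v1"), new MyData("2", "account2v1"));
        List<MyData> externalList = List.of(new MyData("1", "account1v1"), new MyData("2", "account2v2"));

        List<MyData> updated = syncChangedHashes(baseList, externalList);
        System.out.println(updated.size() + " record(s) updated");
    }
}
```

With roughly 100k records per list this does about 200k map operations instead of about 10 billion pair comparisons. Whether the loops are written with for or with list.stream() matters far less than replacing the inner scan with a map lookup.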

Answer 2

Score: 1

A little bit of set theory may be beneficial here, if MyData implements:

  • Comparable
  • equals and hashCode

...and you're open to using Google Guava.

If you set up the two lists that you have as Sets instead (and they could be ordered if you really wanted them to be...), then all you would have to do is invoke Sets.difference(baseList, externalList). You could then iterate through that resulting collection of records to update the values you need to in baseList.

Don't concern yourself with doing this in one fell swoop. It's better and more succinct to do this as two separate actions so that it's easier to debug and establish what's going on.
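
For readers without Guava on the classpath, the set-difference step can be approximated with plain java.util. This is a rough sketch that swaps Guava's Sets.difference for HashSet.removeAll; the MyData record is a hypothetical stand-in whose equals and hashCode cover both the account number and the hash (records need Java 16+), and the names SetDifferenceSketch and staleBaseRecords are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for MyData. A record gets value-based equals/hashCode
// over (accNum, cdcHash) automatically.
record MyData(String accNum, String cdcHash) {}

final class SetDifferenceSketch {
    /** Returns the base records that have no identical (accNum, cdcHash) twin in externalList. */
    static Set<MyData> staleBaseRecords(List<MyData> baseList, List<MyData> externalList) {
        Set<MyData> difference = new HashSet<>(baseList);
        // Wrapping externalList in a HashSet keeps each membership check cheap.
        difference.removeAll(new HashSet<>(externalList));
        return difference;
    }

    public static void main(String[] args) {
        List<MyData> baseList = List.of(new MyData("1", "account1v1"), new MyData("2", "account2v1"));
        List<MyData> externalList = List.of(new MyData("1", "account1v1"), new MyData("2", "account2v2"));

        // Step 1: find the base records that differ. Step 2 (not shown): update them.
        System.out.println(staleBaseRecords(baseList, externalList));
    }
}
```

Note that the result contains base records whose hash changed as well as base records whose account is missing from externalList, so the follow-up update step still has to look up the new hash by account number.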

Answer 3

Score: 0

Well first of all, your question might not solve your problem.

Based on the tables you provided, the hash does change, and the values might change too. The unique identifier is most likely the user account number.

Depending on the source of your data, it might make sense to iterate / paginate over both of your sources (if they're ordered by some parameter, e.g. account number) and compare just subsets of the data.

Let's say you query accounts 1-20 (or 1-1000), get the min/max account number, and then run the same query on the second source of data to get the same accounts.

Then sort and iterate both collections (trying to match the IDs) and compare the values on each line.
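
The sort-and-iterate step can be sketched as a two-pointer merge over one page from each source. This is only an illustration under assumptions: the MyData record (Java 16+), the numeric accNum field, and the names SortedMergeCompare and findChanged are hypothetical, account numbers are assumed unique within a page, and the paging queries themselves are left out because the data sources are not described.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for MyData with a numeric account number.
record MyData(long accNum, String cdcHash) {}

final class SortedMergeCompare {
    /** Returns the external records whose hash differs from the base record with the same accNum. */
    static List<MyData> findChanged(List<MyData> basePage, List<MyData> externalPage) {
        // Sort copies so the callers' lists are left untouched.
        List<MyData> base = new ArrayList<>(basePage);
        List<MyData> ext = new ArrayList<>(externalPage);
        Comparator<MyData> byAccNum = Comparator.comparingLong(MyData::accNum);
        base.sort(byAccNum);
        ext.sort(byAccNum);

        List<MyData> changed = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < base.size() && j < ext.size()) {
            MyData b = base.get(i);
            MyData e = ext.get(j);
            if (b.accNum() < e.accNum()) {
                i++; // account missing from the external page
            } else if (b.accNum() > e.accNum()) {
                j++; // account missing from the base page
            } else {
                if (!b.cdcHash().equals(e.cdcHash())) {
                    changed.add(e);
                }
                i++;
                j++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<MyData> basePage = List.of(new MyData(2, "account2v1"), new MyData(1, "account1v1"));
        List<MyData> externalPage = List.of(new MyData(1, "account1v1"), new MyData(2, "account2v2"));
        System.out.println(findChanged(basePage, externalPage).size() + " changed record(s)");
    }
}
```

If both sources already return their pages ordered by account number, the two sort calls can be dropped and the merge is a single linear pass per page.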

huangapple
  • Posted on 2020-10-08 03:33:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64251096.html