October 8, 2020 03:33:05


Java compare two lists efficiently

Question

I need to compare results of two lists coming from two different sources.

List<MyData> baseList = new ArrayList<>();

and

List<MyData> externalList = new ArrayList<>();

I need to compare the CDCHash of the records in both lists w.r.t. the UserACCNUM. If there are any changes in the CDCHash, I need to update that particular record in baseList.

I tried the loop below, but it doesn't seem efficient:

for (MyData ext : externalList) {
  for (MyData base : baseList) {
      // only records with the same account number are comparable
      if (ext.getAccNum().equals(base.getAccNum())) {
          if (ext.getCDCHash().equals(base.getCDCHash())) {
              // no change
          } else {
              // changes found - need to update
          }
      }
  }
}

Is list.stream() efficient in this case? I have nearly 100k records to compare.

How do I achieve this efficiently?

Answer 1

Score: 2

You can transform your quadratic algorithm into a linear one. First build a fast lookup Map for one of the two lists. Then loop over the other list, using the lookup to find the corresponding record by account number.

Here is a JavaScript example, just because we can't run Java snippets here.
Note that, for the sake of the example, we assume both lists contain the same account numbers.

const listA = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v1', account: 2 }];
const listB = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v2', account: 2 }];

const dirtyRecords = findDirtyRecords(listA, listB);

console.log(dirtyRecords);

function findDirtyRecords(listA, listB) {
  // Index listA by account number: one pass, then O(1) average lookups
  const listAMap = new Map();

  for (const record of listA) listAMap.set(record.account, record);

  // Keep the listB records whose hash differs from the listA record with the same account
  // (?. makes accounts missing from listA count as dirty instead of throwing)
  return listB.filter(r => r.hash !== listAMap.get(r.account)?.hash);
}
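Since the question is about Java, here is a rough Java translation of the same map-lookup idea. It is a sketch under stated assumptions, not a drop-in solution:

- `MyData` below is a hypothetical stand-in that exposes only `getAccNum()` and `getCDCHash()`, plus an assumed `setCDCHash()` setter.
- Account numbers are assumed to be unique Strings.

It also touches on the `list.stream()` question. A stream is a convenient way to build the index, but streaming through the nested loops would still be O(n·m). The speed-up comes from the HashMap lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CompareLists {

    // Hypothetical stand-in for the question's MyData (only the fields used here).
    static class MyData {
        private final String accNum;
        private String cdcHash;

        MyData(String accNum, String cdcHash) {
            this.accNum = accNum;
            this.cdcHash = cdcHash;
        }

        String getAccNum() { return accNum; }
        String getCDCHash() { return cdcHash; }
        void setCDCHash(String cdcHash) { this.cdcHash = cdcHash; } // hypothetical setter
    }

    // Updates matching records in baseList in place and returns the ones that changed.
    static List<MyData> syncChanges(List<MyData> baseList, List<MyData> externalList) {
        // One pass over baseList: account number -> record.
        // Collectors.toMap throws IllegalStateException if an account number repeats.
        Map<String, MyData> baseByAcc = baseList.stream()
                .collect(Collectors.toMap(MyData::getAccNum, Function.identity()));

        List<MyData> updated = new ArrayList<>();
        // One pass over externalList, with an average O(1) hash lookup per record
        for (MyData ext : externalList) {
            MyData base = baseByAcc.get(ext.getAccNum());
            // Accounts that are missing from baseList are skipped here
            if (base != null && !Objects.equals(base.getCDCHash(), ext.getCDCHash())) {
                base.setCDCHash(ext.getCDCHash()); // copy any other changed fields as needed
                updated.add(base);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<MyData> baseList = List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v1"));
        List<MyData> externalList = List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v2"));

        List<MyData> updated = syncChanges(baseList, externalList);
        System.out.println(updated.size() + " record(s) updated"); // 1 record(s) updated
        System.out.println(baseList.get(1).getCDCHash());          // account2v2
    }
}
```

With ~100k records per list, this amounts to two linear passes plus hashing. The nested loop, by contrast, performs roughly 10^10 comparisons.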

Answer 2

Score: 1

A little bit of set theory may be beneficial here, if MyData implements:

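Comparable (only needed if you want the sets ordered, e.g. a TreeSet)
equals and hashCode (based on both the account number and the CDCHash, so that a changed hash makes two records unequal)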

...and you're open to using Google Guava.

If you hold the two collections as Sets instead of Lists (they can still be ordered if you really want them to be), then all you have to do is call Sets.difference(baseSet, externalSet). It returns the records in the base set that have no equal counterpart in the external set. You can then iterate over that result to update the corresponding records in baseList.

Don't worry about doing this in one fell swoop. It's cleaner to do it as two separate steps (compute the difference, then apply the updates), which makes it easier to debug and see what's going on.
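The two-step approach could look roughly like the sketch below. Guava's `Sets.difference(set1, set2)` returns an unmodifiable view of the elements of `set1` that are not in `set2`. To keep the example dependency-free, it computes the same elements with plain JDK `HashSet.removeAll`.

The assumptions are:

- `MyData` is a hypothetical Java 16+ record, so its generated `equals`/`hashCode` cover both the account number and the hash.
- Account numbers are unique within each set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SetDifferenceSketch {

    // Hypothetical MyData as a record: the generated equals/hashCode cover BOTH
    // fields, so two records are equal only if accNum AND cdcHash match.
    record MyData(String accNum, String cdcHash) {}

    // Step 1: base records with no identical counterpart in external.
    // Same elements as Guava's Sets.difference(baseSet, externalSet), copied eagerly.
    static Set<MyData> staleRecords(Set<MyData> baseSet, Set<MyData> externalSet) {
        Set<MyData> stale = new HashSet<>(baseSet);
        stale.removeAll(externalSet);
        return stale;
    }

    // Step 2: replace each stale base record with the external record for the same account.
    static Set<MyData> applyUpdates(Set<MyData> baseSet, Set<MyData> externalSet) {
        Map<String, MyData> externalByAcc = new HashMap<>();
        for (MyData ext : externalSet) {
            externalByAcc.put(ext.accNum(), ext);
        }

        Set<MyData> result = new HashSet<>(baseSet);
        for (MyData stale : staleRecords(baseSet, externalSet)) {
            MyData fresh = externalByAcc.get(stale.accNum());
            if (fresh != null) { // accounts absent from externalSet are left as they are
                result.remove(stale);
                result.add(fresh);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<MyData> base = new HashSet<>(List.of(new MyData("1", "v1"), new MyData("2", "v1")));
        Set<MyData> external = new HashSet<>(List.of(new MyData("1", "v1"), new MyData("2", "v2")));

        System.out.println(staleRecords(base, external)); // [MyData[accNum=2, cdcHash=v1]]
        System.out.println(applyUpdates(base, external).contains(new MyData("2", "v2"))); // true
    }
}
```

Note that the difference also contains base records whose account no longer exists in the external set. That is why step 2 skips accounts with no external match.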

Answer 3

Score: 0

Well first of all, your question might not solve your problem.

From the tables you provided, I can see that the hash does change, and other values might change too. The unique identifier is most likely the user account number.

Depending on the source of your data, it might make sense to iterate/paginate over both sources (if they're ordered by some parameter, e.g. account number) and compare only subsets of the data at a time.

For example, query accounts 1-20 (or 1-1000), take the min/max account number of that page, and then run the same query against the second data source to get the same accounts.

Then sort both collections, iterate over them together (matching the IDs), and compare the values on each line.
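The sort-and-match step for one page could look roughly like the two-pointer merge below. It assumes:

- Both pages cover the same account range.
- Account numbers are unique Strings.
- `MyData` is a hypothetical Java 16+ record.

Sorting uses String order. That works as long as both sides are sorted the same way, which they are here, because the method sorts both copies itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class SortedMergeCompare {

    // Hypothetical MyData with the two fields the question uses.
    record MyData(String accNum, String cdcHash) {}

    // Returns the external records whose hash differs from the base record
    // with the same account number. Both inputs are copied and sorted by accNum.
    static List<MyData> changedRecords(List<MyData> base, List<MyData> external) {
        Comparator<MyData> byAcc = Comparator.comparing(MyData::accNum);
        List<MyData> a = new ArrayList<>(base);
        List<MyData> b = new ArrayList<>(external);
        a.sort(byAcc);
        b.sort(byAcc);

        List<MyData> changed = new ArrayList<>();
        int i = 0, j = 0;
        // Single linear pass: advance whichever side has the smaller account number
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).accNum().compareTo(b.get(j).accNum());
            if (cmp < 0) {
                i++; // account only present in base
            } else if (cmp > 0) {
                j++; // account only present in external
            } else {
                if (!Objects.equals(a.get(i).cdcHash(), b.get(j).cdcHash())) {
                    changed.add(b.get(j));
                }
                i++;
                j++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<MyData> base = List.of(new MyData("2", "v1"), new MyData("1", "v1"), new MyData("3", "v1"));
        List<MyData> external = List.of(new MyData("1", "v1"), new MyData("2", "v2"), new MyData("4", "v1"));
        System.out.println(changedRecords(base, external)); // [MyData[accNum=2, cdcHash=v2]]
    }
}
```

Each page costs O(n log n) for sorting plus one linear merge. Only one page per source needs to be in memory at a time. If the queries already return rows ordered by account number, the sorts can be dropped, provided both sources use the same ordering.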
