Memory efficient way to traverse all rows of a table with Entity Framework (7)
Question
I am using Entity Framework 7 and regularly have to merge multiple tables containing large amounts of data into a common table.
Is there a memory-efficient and easy way to traverse each table in batches with Entity Framework?
Currently I count the rows, calculate the number of skips/pages, and then traverse by hand.
Answer 1
Score: 1
If this can be done in SQL, it should be done in SQL. If it really has to be done in the application, I can suggest the following considerations:
- Use a bounded DbContext set up with only the entity configurations involved, declaring only the fields from each entity that the merge needs (if you don't need everything).
- Ensure that you are using a "fresh" scoped DbContext. If you are using DI and the context in question has already loaded or added other tracked entity references, consider scoping a new DbContext instance for this operation rather than using the injected one. To keep DI patterns happy, inject something like a DbContextScopeFactory for cases like this where you need an isolated, clean DbContext.
- Build your query for the two entity sets and use `Skip` and `Take` within a `while` loop to extract a reasonable set of rows from each table, do your processing, and insert.
- If the source rows do not need to be updated themselves, use `AsNoTracking` when reading those source entities. This keeps DB reads fast: the more entities the DbContext tracks, the more time it spends checking existing references when asked for more data and when saving new data.
- After adding the new entity into the merged results and saving, detach it from the DbContext to keep further DbContext operations quick: `context.Entry(newMergedEntity).State = EntityState.Detached;`

If you need to update the source entities, or want to insert entities in smaller batches between `SaveChanges` calls, then instead of detaching individual entities via `State`, use `context.ChangeTracker.Clear()` to remove all current tracking.
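To make the points above concrete, here is a minimal sketch of the batch loop, not a definitive implementation. The entity and context names (`SourceRow`, `MergedRow`, `MergeDbContext`, `TableMerger`) and the batch size are hypothetical and only serve the illustration. It assumes EF Core 5 or later (for `ChangeTracker.Clear()`), a configured database provider, and an `Id` key to order by, since `Skip`/`Take` only produce stable pages over an ordered query.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities, trimmed to the fields the merge needs.
public class SourceRow
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class MergedRow
{
    public int Id { get; set; }
    public string Origin { get; set; } = "";
    public int SourceId { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical bounded context that maps only the tables involved in the merge.
public class MergeDbContext : DbContext
{
    public MergeDbContext(DbContextOptions<MergeDbContext> options) : base(options) { }

    public DbSet<SourceRow> SourceRows => Set<SourceRow>();
    public DbSet<MergedRow> MergedRows => Set<MergedRow>();
}

public static class TableMerger
{
    // Copies one source table into the merged table, one batch at a time.
    // Returns the number of source rows processed.
    public static async Task<int> MergeAsync(MergeDbContext context, int batchSize = 1000)
    {
        var processed = 0;

        while (true)
        {
            // Read one page without tracking; the source rows are not modified.
            var batch = await context.SourceRows
                .AsNoTracking()
                .OrderBy(r => r.Id)   // a stable order is required for paging
                .Skip(processed)
                .Take(batchSize)
                .ToListAsync();

            if (batch.Count == 0)
            {
                break;
            }

            // Map the page to the merged shape and insert it.
            context.MergedRows.AddRange(batch.Select(r => new MergedRow
            {
                Origin = nameof(MergeDbContext.SourceRows),
                SourceId = r.Id,
                Name = r.Name
            }));
            await context.SaveChangesAsync();

            // Drop the just-saved entities so the change tracker stays small.
            context.ChangeTracker.Clear();

            processed += batch.Count;
        }

        return processed;
    }
}
```

Only one page of source rows and one page of new rows are held in memory at a time, so no up-front row count is needed; the loop simply stops on the first empty page. On very large tables, `Skip` tends to get slower as the offset grows, so filtering on the last seen key (for example `Where(r => r.Id > lastId)`) may be worth considering instead.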