英文:
.ToList() is not saving virtual Icollection to memory, which is causing multiple calls to the db
问题
我目前正在进行的项目需要进行延迟加载。我正在尝试创建一个方法,该方法接收一个已加载的列表属性作为参数。然而,似乎传递的列表项是代理而不是内存中的项。由于我无法在此方法内调用数据库,我需要这个列表在内存中。
看着输出,第一个 test
变量通过 EF Core 生成了一次数据库调用。然而,当对 test
使用 .Any()
时,会创建更多的 EF 查询!
我假设 ToList();
创建了一个单一的查询,然后将结果保存在内存中的 test
变量下。为什么 EF 会继续查询它已经获取的信息呢?
英文:
The project I am currently working on requires lazy loading to be done. I am attempting to create a method that gets passed in a loaded list property. Instead, it seems that the list items are passed as a proxy instead of in-memory items. I need this list to be in-memory as I am unable to call the database from within this method.
var test = loan.LoanStatuses.ToList();
if (test.Any(ls => ls.Status.Code == CodeTypes.LoanStatusType.Settled && ls.Active))
messages.Add("Loan has already been settled.");
looking at the output the first test
variable generates one call to the DB through EF Core. However when test
is acted on with .Any()
more ef queries are created!
I am under the assumption toList();
creates one single query and then saves the result in memory under that test variable. why would EF continue to query on information its already got.
答案1
得分: 1
懒加载应该被视为一种保险措施,而不是依赖于加载数据的方式。你可能会遇到问题的部分是这个语句:
ls.Status.Code == CodeTypes.LoanStatusType.Settled
这是由于懒加载的工作方式决定的。比如我们加载了一个贷款:
var loan = context.Loans.Single(x => x.LoanId == loanId);
这会生成一个SQL语句来加载贷款,没有其他内容被急加载。稍后,你执行以下操作:
var test = loan.LoanStatuses.ToList();
由于我们首次访问贷款上的LoanStatuses集合,这将生成第二个SQL语句,实际上执行类似以下操作:
SELECT * FROM LoanStatuses WHERE LoanId = @loanId;
这本身没问题,只是多了一个查询来获取LoanStatuses。然而,假设加载了10个LoanStatus记录,ID为1-10,都关联到我们单个Loan ID为1的贷款。现在我们想要访问每个LoanStatus上的相关Status实体。这将导致10次懒加载调用,为每个我们访问的LoanStatus调用了Any
方法:
SELECT * FROM Statuses WHERE StatusID = @loanStatus(1).StatusId;
SELECT * FROM Statuses WHERE StatusID = @loanStatus(2).StatusId;
SELECT * FROM Statuses WHERE StatusID = @loanStatus(3).StatusId;
...
SELECT * FROM Statuses WHERE StatusID = @loanStatus(10).StatusId;
如果你打算基于实体进行业务逻辑处理,那么在调用依赖于这些相关实体的代码之前,代码中需要有提前加载所有必需信息的预见。
例如,如果你要加载一个贷款并需要根据其关联的状态来确定逻辑:
var loan = context.Loans
.Include(x => x.LoanStatuses)
.ThenInclude(x => x.Status)
.Single(x => x.LoanId == loanId);
从这里将贷款传递到方法中将确保所有适用的状态都已经急加载,不应该需要懒加载。懒加载是一个安全网,可以确保任何可能没有加载的内容都可以被加载,而不会导致NullReferenceException
,但你应该定期对数据库进行性能分析,以查看是否存在懒加载的迹象,并努力消除任何懒加载调用。 "只需记住急加载" 方法的问题在于,随着你决定需要添加更多关系,这会产生一个笛卡尔积,逐渐开发人员开始"包含"所有东西,性能开始恶化回到懒加载一切时的水平。
比起传递实体,更好的方法是在从DbContext读取数据时,将业务逻辑需要的信息投影出来,以便EF可以构建有效的查询,确保从数据库中检索所有需要的数据,只检索需要的数据。例如,如果我们需要从贷款中获取一些值,以及指示贷款是否结清的标志:
var loanDetails = context.Loans
.Where(x => x.LoanId == loanId)
.Select(x => new LoanDetail
{
LoanId = x.LoanId,
// ... 其他细节。
IsSettled = x.LoanStatuses.Any(s => s.IsActive
&& s.Status.Code == CodeTypes.LoanStatusType.Settled)
})
.Single();
这只加载我们需要的信息,可以安全地传递给方法,因为从DTO类的定义中清楚地表明了加载了哪些信息。不存在懒加载或DbContext可能被释放无法懒加载信息的风险。理想情况下,实体不应该在加载它们的DbContext的范围之外传递,因为这会导致诸多问题,如旧数据、懒加载成本以及潜在的多个跟踪引用问题。这会带来一个成本,需要为业务逻辑需要的字段定义一个类结构,但这是一种权衡。只有在服务的业务逻辑发生变化时,这个结构才需要更改。 "只需传递实体,因为它包含了一切" 可能很诱人,但确保"一切" 可以访问会带来显著的成本。
英文:
Lazy loading should be treated as a failsafe rather than relied on for loading data like this. The issue you will be running into will be this part of the statement:
ls.Status.Code == CodeTypes.LoanStatusType.Settled
This is due to the nature of how lazy loading works. say we loaded a loan like:
var loan = context.Loans.Single(x => x.LoanId == loanId);
this builds an SQL Statement to load the loan, nothing else is eager loaded. Later you do the following:
var test = loan.LoanStatuses.ToList();
Since we access the LoanStatuses collection on the loan for the first time, this will generate a second SQL statement effectively doing something like:
SELECT * FROM LoanStatuses WHERE LoanId = @loanId;
And that in itself is fine, it is just one extra query to get the LoanStatuses. However, lets say this loaded 10 LoanStatus records, IDs 1-10 all associated to our single Loan ID of 1. We now want to access the related Status entity on each LoanStatus. This will result in 10 lazy load calls to get the status for each LoanStatus we touch due to the `Any' call:
SELECT * FROM Statuses WHERE StatusID = @loanStatus(1).StatusId;
SELECT * FROM Statuses WHERE StatusID = @loanStatus(2).StatusId;
SELECT * FROM Statuses WHERE StatusID = @loanStatus(3).StatusId;
...
SELECT * FROM Statuses WHERE StatusID = @loanStatus(10).StatusId;
If you're going to have business logic based on entities then you need the foresight in the code to ensure that all required information is eager loaded before calling code that is dependent on those related entities.
For example, if you are going to load a loan and will need to determine logic based on it's associated statuses then:
var loan = context.Loans
.Include(x => x.LoanStatuses)
.ThenInclude(x => x.Status)
.Single(x => x.LoanId == loanId);
From here passing the loan into the method will ensure that the applicable statuses are all eager loaded and nothing should need to lazy load. Lazy Loading is a safety net to ensure anything that might not have been loaded can be loaded rather than result in a NullReferenceException
, but you should be profiling the database regularly for evidence of lazy loading getting tripped and working to remove any lazy loading calls. The problem with the approach of "just remember to eager load" is that this produces a Cartesian Product with the more relationships you decide you need to add to the eager loading. Gradually developers just start "including" everything all of the time and watch as performance starts to degrade back towards what it was when lazy loading everything.
A better approach than passing entities around is to project the information your business logic will need at the time of reading from the DbContext so that EF can build an efficient query to ensure that all data needed, and only the data that is needed is retrieved from the database. For instance, if we need a few values from the loan, and a flag to indicate whether the loan is settled or not:
var loanDetails = context.Loans
.Where(x => x.LoanId == loanId)
.Select(x => new LoanDetail
{
LoanId = x.LoanId,
// ... other details.
IsSettled = x.LoanStatuses.Any(s => s.IsActive
&& s.Status.Code == CodeTypes.LoanStatusType.Settled)
)).Single();
This loads just the information we need and this can safely be passed around to methods because it is clear from the definition of this DTO class what information is loaded. There is no risk of tripping lazy loading or situations where the DbContext might be disposed so information cannot be lazy loaded. Ideally entities should never be passed around beyond the scope of the DbContext that loaded them as this leads to all kinds of problems with stale data, lazy loading costs and issues with potentially multiple tracked references. This does come at a cost of needing to define a class structure for the fields the business logic is going to need, but this is the trade off. This structure should only need to change if the business logic it serves changes. It may be tempting to "just pass the entity since it has everything" but ensuring that "everything" is accessible attracts a significant cost.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论