Is there a performance penalty when using a Select() inside a lambda Where() clause?

Question

I stumbled upon something while trying to work out which events I need to create.

I had code like this:

var eventsToBeCreated =
            requiredEventDates.Where(d => !events.Select(e => e.eventDay).Contains(d));

But it made me wonder whether this is a bad idea performance-wise, because I believe (I am not sure) that the Select() gets evaluated individually for every element. So I changed it to:

var existingEventDays =
            events.Select(e => e.eventDay);

var eventsToBeCreated =
            requiredEventDates.Where(d => !existingEventDays.Contains(d));

But I was not sure about this either. Since existingEventDays is an IEnumerable<DateTime>, I guess the enumerable would still be resolved multiple times? So I changed it to:

var existingEventDays =
            events.Select(e => e.eventDay).ToList();

var eventsToBeCreated =
            requiredEventDates.Where(d => !existingEventDays.Contains(d));

...to make sure that existingEventDays is calculated only once.

Are my assumptions correct, or is this unnecessary and the first version would perform the same as the third?

Answer 1

Score: 3

I'll assume you actually consume the whole query created with Where, for example by calling ToList() on it. If you never consume it, nothing in the Where lambda is executed; you are just creating a bunch of IEnumerable<T>s. See Deferred Execution.
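As a minimal sketch of deferred execution (the class DeferredExecutionDemo, the PredicateCalls counter and the sample numbers are made up for illustration and are not from the question), the following counts how often the Where predicate runs before and after the query is consumed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical counter: how often the Where predicate has actually run.
    public static int PredicateCalls;

    // Builds the query only; nothing is enumerated here.
    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        return source.Where(x =>
        {
            PredicateCalls++;
            return x % 2 == 0;
        });
    }

    public static void Main()
    {
        var query = BuildQuery(new[] { 1, 2, 3, 4 });
        Console.WriteLine(PredicateCalls); // 0: the predicate has not run yet

        var evens = query.ToList();        // enumerating the query runs the predicate
        Console.WriteLine(PredicateCalls); // 4: once per source element
        Console.WriteLine(evens.Count);    // 2
    }
}
```

The same holds for the lambda in the question: events.Select(...).Contains(d) only runs once something enumerates eventsToBeCreated.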

Regarding the second snippet: you extracted the Select call into a variable. This does cause events.Select to be called only once, instead of once for every element in requiredEventDates. But again, due to deferred execution, calling Select itself is not very expensive, because it only builds the query. What is usually expensive is the looping that Contains does, and that still happens once per element of requiredEventDates.

Regarding the third snippet: you first made a list out of the dates from the events, which loops through the entirety of events once. On top of that, Contains still loops through the list for each element in requiredEventDates. So you essentially looped through the whole list one more time than necessary.
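One way to observe this is to count how often the Select projection runs in the lazy variants (snippets 1 and 2) versus the ToList variant (snippet 3). This is only a sketch: the Event class, SelectorCallCounter, CountLazy, CountMaterialized and the sample dates are hypothetical, and only the eventDay member name is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; only the eventDay member name comes from the question.
public class Event
{
    public DateTime eventDay { get; set; }
}

public static class SelectorCallCounter
{
    // Snippets 1 and 2: Contains re-enumerates the lazy Select for every required date.
    public static int CountLazy(List<Event> events, List<DateTime> requiredEventDates)
    {
        int selectorCalls = 0;
        var existingEventDays = events.Select(e => { selectorCalls++; return e.eventDay; });
        requiredEventDates.Where(d => !existingEventDays.Contains(d)).ToList();
        return selectorCalls;
    }

    // Snippet 3: ToList runs the projection once per event, up front.
    public static int CountMaterialized(List<Event> events, List<DateTime> requiredEventDates)
    {
        int selectorCalls = 0;
        var existingEventDays = events.Select(e => { selectorCalls++; return e.eventDay; }).ToList();
        requiredEventDates.Where(d => !existingEventDays.Contains(d)).ToList();
        return selectorCalls;
    }

    public static void Main()
    {
        var events = Enumerable.Range(1, 3)
            .Select(day => new Event { eventDay = new DateTime(2023, 1, day) })
            .ToList();
        var requiredEventDates = Enumerable.Range(1, 5)
            .Select(day => new DateTime(2023, 1, day))
            .ToList();

        Console.WriteLine($"lazy Select:  {CountLazy(events, requiredEventDates)} selector calls");
        Console.WriteLine($"after ToList: {CountMaterialized(events, requiredEventDates)} selector calls");
    }
}
```

With this data the ToList variant runs the projection once per event, while the lazy variant runs it again for every required date. In both cases Contains still performs a linear scan per required date; ToList only removes the repeated projection work.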

To avoid all this looping, you can instead put the dates into a HashSet:

var existingEventDays =
            events.Select(e => e.eventDay).ToHashSet();
        
var eventsToBeCreated = 
            requiredEventDates.Where(d => !existingEventDays.Contains(d));

Now you only loop through events once, to create the set. Contains then looks d up in the set, which is a hash lookup (constant time on average) and can be a lot faster than scanning a list.
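For completeness, here is a small end-to-end sketch of the HashSet variant. The Event class, the MissingEventDays.Find helper and the sample dates are assumptions made for illustration; only eventDay, events and requiredEventDates come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical event type; only the eventDay member name comes from the question.
public class Event
{
    public DateTime eventDay { get; set; }
}

public static class MissingEventDays
{
    public static List<DateTime> Find(IEnumerable<Event> events, IEnumerable<DateTime> requiredEventDates)
    {
        // One pass over events builds the set.
        HashSet<DateTime> existingEventDays = events.Select(e => e.eventDay).ToHashSet();

        // Each Contains call is a hash lookup instead of a scan.
        return requiredEventDates.Where(d => !existingEventDays.Contains(d)).ToList();
    }

    public static void Main()
    {
        var events = new List<Event>
        {
            new Event { eventDay = new DateTime(2023, 1, 1) },
            new Event { eventDay = new DateTime(2023, 1, 3) },
        };
        var requiredEventDates = Enumerable.Range(1, 4).Select(day => new DateTime(2023, 1, day));

        foreach (var day in Find(events, requiredEventDates))
        {
            // Prints 2023-01-02, then 2023-01-04.
            Console.WriteLine(day.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
    }
}
```

As far as I know, the ToHashSet extension method is available from .NET Framework 4.7.2 and .NET Core 2.0 onward; on older targets, new HashSet<DateTime>(events.Select(e => e.eventDay)) should be an equivalent way to build the set.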


huangapple
  • Posted on June 8, 2023 at 19:25:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76431363.html