With Task.WhenAll(), tasks are not completed concurrently
Question
I have to make 10 database calls to get 10 entries based on different parameters. I'm using Task.WhenAll to avoid running them in series, but the runtime indicates they're effectively still running in series: a single entry takes 100 ms to pull, and ten entries take 1000 ms. The time is unchanged from awaiting them one after another. I've simplified the code, but will try to capture the essence of the structure. I've also removed my business logic and data structure.
private async Task<IEnumerable<Entry>> GetEntriesAsync(List<long> entryIds)
{
    // Start every lookup before awaiting any of them.
    var tasks = new List<Task<Entry>>();
    foreach (var entryId in entryIds)
    {
        var task = this.GetEntryAsync(entryId);
        tasks.Add(task);
    }
    await Task.WhenAll(tasks);
    var entries = tasks.Select(x => x.Result).ToList();
    return entries;
}
I'd expect this to run in about the same amount of time regardless of the length of entryIds. Instead, the time taken scales with the number of entries.
The next question is: what does GetEntryAsync actually do? It instantiates a DbContext, and then runs a Select against an IQueryable in that context. A context is instantiated for each pass through the loop (i.e. 10 entries = 10 contexts), so that I can make the multiple database requests concurrently.
public virtual async Task<Entry> GetEntryAsync(long entryId)
{
    // unitOfWorkFactory.Create() instantiates a repository with a DbContext
    // instantiated using DBContextFactory.CreateDBContext()
    var entryRepo = this.unitOfWorkFactory.Create().GetWordsRepository(language);
    Entry entry = await entryRepo.GetEntryAsync(entryId);
    return entry;
}
Within the entryRepo class I have:
public async Task<Entry> GetEntryAsync(long entryId)
{
    // Entries is a DbSet<Entry>
    IQueryable<Entry> entries = this.dbContext.Entries
        .AsNoTracking()
        .Where(x => x.Id == entryId);
    var entry = await this.GetEntryAsync(entries);
    return entry;
}
private async Task<Entry> GetEntryAsync(IQueryable<Entry> entries)
{
    return await entries.Select(x => new Entry()
    {
        Id = x.Id
    }).FirstOrDefaultAsync();
}
I need the whole operation to run in closer to 100 ms. I've also tried using the Select method and Parallel.ForEachAsync as described here: https://stackoverflow.com/questions/15136542/parallel-foreach-with-asynchronous-lambda. All have the same result.
I've put console writes in my functions, so I know the tasks are all starting before the first finishes. So the tasks are not actually in series, but for some reason the time taken is as if they are in series.
Edit: important clarification: the where clause is not based on just an entryId, and entryId is not the key Id in the table I'm hitting. There can be multiple rows with that entryId. I'm also using several more parameters to get the specific entry (language, version, secondaryId1, secId2, secId3, groupNumber, type, etc.).
I'm not sure how to write a single query, because I need, say, 10 entries where each has a different combination of the where parameters.
Answer 1
Score: 2
Parallelization might not increase performance in this case; as @Charlieface said, databases don't like that. Querying all entities in one query, however, helps a lot. If the number of entryIds you want to query is small (say, fewer than 100), using List<T>.Contains() can help a lot.
private async Task<List<Entry>> GetEntriesAsync(List<long> entryIds)
{
    return await this.dbContext.Entries.AsNoTracking()
        .Where(x => entryIds.Contains(x.Id))
        .Select(x => new Entry()
        {
            Id = x.Id
        }).ToListAsync();
}
LINQ sends this to SQL Server in an optimized way, because the query provider understands what List<T>.Contains() does (it is typically translated into a SQL IN clause). This doesn't work for all methods, but Contains() is special.
Look out for IQueryable methods that start with 'To', like ToList() or ToDictionary(). They cause the query to be executed and pull the results into memory, so calling First() or FirstOrDefaultAsync() after them does nothing for performance.
That's why I chose the async version of ToList(): that is where the querying actually happens.
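The point about 'To' methods can be illustrated with an in-memory analogy. This is only a sketch using LINQ to Objects rather than a database: DeferredExecutionDemo and CountItemsRead are hypothetical names, and the counter stands in for rows pulled from the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many source items were read before the first match was found.
    public static int CountItemsRead(bool materializeFirst)
    {
        int read = 0;

        // Lazy source: an item is only "fetched" when the consumer asks for it.
        IEnumerable<int> Source()
        {
            for (int i = 1; i <= 1000; i++)
            {
                read++;
                yield return i;
            }
        }

        var query = Source().Where(x => x % 2 == 0);

        // ToList() forces the whole sequence to be read before First() runs;
        // calling First() directly stops at the first matching item.
        _ = materializeFirst ? query.ToList().First() : query.First();

        return read;
    }
}
```

Calling CountItemsRead(true) reads all 1000 items, while CountItemsRead(false) stops after 2. Against a real database the difference shows up as rows transferred, not as a counter.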
Answer 2
Score: 1
Remove ToList in GetEntryAsync:
private async Task<Entry> GetEntryAsync(
    IQueryable<Entry> entries)
{
    return await entries.Select(x => new Entry()
    {
        Id = x.Id
    }).FirstOrDefaultAsync();
}
Not only are you making the method synchronous, but you are also fetching everything into memory when you need only one entry.
But in general, you should just write the query so that it returns all the needed entries.
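When each entry is identified by a different combination of parameters, one possible way to write such a single query is to OR together one group of conditions per wanted entry using an expression tree. The following is a minimal sketch, not the asker's actual schema: Row, Key, CompositeFilter and MatchAny are hypothetical names, and only three of the filter columns mentioned in the question are modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical shapes; the real table has more filter columns.
public record Row(long EntryId, string Language, int Version);
public record Key(long EntryId, string Language, int Version);

public static class CompositeFilter
{
    // Builds: r => (r.EntryId == k1.EntryId && r.Language == k1.Language && r.Version == k1.Version) || (...k2...) || ...
    public static Expression<Func<Row, bool>> MatchAny(IEnumerable<Key> keys)
    {
        var row = Expression.Parameter(typeof(Row), "r");
        Expression body = null;

        foreach (var key in keys)
        {
            // One AND-group per requested key.
            Expression match = Expression.AndAlso(
                Expression.AndAlso(
                    Expression.Equal(
                        Expression.Property(row, nameof(Row.EntryId)),
                        Expression.Constant(key.EntryId, typeof(long))),
                    Expression.Equal(
                        Expression.Property(row, nameof(Row.Language)),
                        Expression.Constant(key.Language, typeof(string)))),
                Expression.Equal(
                    Expression.Property(row, nameof(Row.Version)),
                    Expression.Constant(key.Version, typeof(int))));

            // OR the group onto the groups built so far.
            body = body is null ? match : Expression.OrElse(body, match);
        }

        // No keys means nothing should match.
        return Expression.Lambda<Func<Row, bool>>(body ?? Expression.Constant(false), row);
    }
}
```

The resulting predicate can be passed to Where on any IQueryable<Row>, for example rows.AsQueryable().Where(CompositeFilter.MatchAny(keys)). Whether a given EF provider translates it into one efficient SQL statement is an assumption to check against the generated SQL and the query plan.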