Long-running parallel Tasks with Entity Framework cause high CPU peak and memory usage

Question

I am migrating a C# ASP.NET Core 7 project from SqlClient with plain SQL queries to Entity Framework. In one place the application runs multiple long-running tasks: a simulation with a big for loop whose progress the user can follow, so each task writes to the database dozens of times. The old SqlClient solution ran smoothly with minimal CPU and memory usage, but with EF, once the threads start working, everything halts and freezes.

I know that DbContext is not thread-safe, so each task creates its own DbContext right where the database inserts occur and disposes it as soon as it is no longer needed. Even so, the for loop completely freezes the computer and everything stops. The web application does not even respond anymore.

The simplified controller:

    public SmContext db { get; set; }

    public SimulateRoundModel(SmContext db)
    {
        this.db = db;
    }

    public async Task<IActionResult> OnPost()
    {
        List<Match> matches = new CollectorClass(db).Collect();
        MyClass.Wrapper(matches);
        return Page();
    }

The simplified code:

public static void Wrapper(List<Match> matches)
{
    Parallel.For(0, matches.Count,
           index =>
           {
               matches[index].longSim();
           });
}

Match class:


private SmContext db { get; set; }

public Match(SmContext db)
{
    this.db = db;
}

public void longSim()
{
    db.Dispose(); // disposing the main DbContext that the constructor receives, we don't want to use that

    using (SmContext db = new SmContext())
    {
        // some initial query and insert
    }

    for (int i = 0; i < 100; i++)
    {
        Thread.Sleep(5000);

        // some simulation

        db = new SmContext();

        SomeInsert(); // these are using the db for the insert
        SomeInsert();
        SomeInsert();

        db.Dispose();
    }
}

We are talking about 5-50 matches, and Parallel.For handled them very well with the old SqlClient solution; I have run it with 200 matches without an issue. These are not intensive tasks, just simple logic and some queries, but they run for a long time. Ideally, I would like to keep saving progress to the database without a major rewrite.

The ultimate question: is there a conceptual issue here that I am too inexperienced to recognize, or should this approach work fine, with something fishy going on in the parts of the code that are not shown?

Answer 1

Score: 1

This is more of a guess than something I can prove, but in my experience multiple SomeInsert calls on the same context look a bit suspicious. EF Core performs insert/update operations by relying on change tracking, and even if you use AsNoTracking, new entries are still handled by the change tracker. So if you are actually inserting a lot of data (and note that EF has never been very well suited to batch inserts), the change tracker ends up holding a lot of entities, which can slow EF down considerably. I would suggest one of the following options:

  • Call ChangeTracker.Clear after inserting a considerable number of entities* (this can also be used instead of recreating the context, with a single context created outside the loop)
  • Recreate the context after inserting a considerable number of entities*
  • Use another technology or an extension library that supports bulk inserts (EFCore.BulkExtensions, for example)

* You will need to determine the optimal number of inserted entities after which to recreate the context or clear the tracker and call SaveChanges, as was done for an older version of EF in this answer.
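
The first option might look roughly like the sketch below. It is only an illustration under stated assumptions: `ProgressRow`, `SketchContext` and `BatchedInsert` are hypothetical names that do not appear in the question, the real `SmContext` and entities will differ, and the right `batchSize` has to be measured. `ChangeTracker.Clear()` is available from EF Core 5.0.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity standing in for whatever the simulation writes.
public class ProgressRow
{
    public int Id { get; set; }
    public int MatchId { get; set; }
    public int Step { get; set; }
}

// Hypothetical context; the question's SmContext is not shown in full.
public class SketchContext : DbContext
{
    public SketchContext(DbContextOptions<SketchContext> options) : base(options) { }

    public DbSet<ProgressRow> Progress => Set<ProgressRow>();
}

public static class BatchedInsert
{
    // Adds the items to the context, saving and clearing the change tracker
    // after every batchSize entities. Returns the total reported by SaveChanges.
    public static int Save<T>(DbContext db, IEnumerable<T> items, int batchSize)
        where T : class
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int pending = 0;
        int written = 0;
        foreach (T item in items)
        {
            db.Set<T>().Add(item);
            pending++;
            if (pending == batchSize)
            {
                written += Flush(db);
                pending = 0;
            }
        }

        // Flush whatever is left over from the last, partial batch.
        if (pending > 0) written += Flush(db);
        return written;
    }

    private static int Flush(DbContext db)
    {
        int rows = db.SaveChanges();
        db.ChangeTracker.Clear(); // stop tracking the entities that were just saved
        return rows;
    }
}
```

A call such as `BatchedInsert.Save(db, rows, batchSize: 100)` would then keep the tracker small; the value 100 is arbitrary and only a starting point for measurement.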

P.S.

> Parallel.For
> public void longSim()
> Thread.Sleep(5000);

I would strongly advise making longSim asynchronous by using await Task.Delay(5000) and switching to Parallel.ForEachAsync, which supports async methods. This would also allow you to use the async versions of EF Core methods.
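
A minimal sketch of that suggestion, assuming .NET 6 or later. `SketchMatch`, `LongSimAsync` and `SimulationRunner` are hypothetical stand-ins for the question's classes, and the database work is reduced to a comment:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Match class.
public sealed class SketchMatch
{
    public int Id { get; }
    public int StepsCompleted { get; private set; }

    public SketchMatch(int id) => Id = id;

    public async Task LongSimAsync(int steps, TimeSpan pause, CancellationToken ct)
    {
        for (int i = 0; i < steps; i++)
        {
            // Task.Delay frees the thread while waiting; Thread.Sleep would block it.
            await Task.Delay(pause, ct);

            // Simulation and inserts would go here, for example
            // await db.SaveChangesAsync(ct) on a context owned by this match.
            StepsCompleted++;
        }
    }
}

public static class SimulationRunner
{
    // Runs all matches concurrently, with at most maxParallel in flight at once.
    public static Task RunAllAsync(
        IEnumerable<SketchMatch> matches,
        int steps,
        TimeSpan pause,
        int maxParallel,
        CancellationToken ct = default)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            CancellationToken = ct
        };

        return Parallel.ForEachAsync(
            matches,
            options,
            async (match, token) => await match.LongSimAsync(steps, pause, token));
    }
}
```

Because a waiting match no longer occupies a thread, `maxParallel` can be set to the number of matches if all of them should progress at the same time. When `MaxDegreeOfParallelism` is left unset, `Parallel.ForEachAsync` limits itself to the processor count, which may be fewer matches than expected.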

One more thing worth considering is thread pool starvation, which can sometimes have similar side effects. But if the only change you made was switching from SqlClient to EF Core, and that alone led to the observed behaviour, then thread pool starvation should not be the cause.
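
To rule starvation in or out rather than guess, one option is to log a snapshot of the thread pool while the simulation runs. The helper below is a hypothetical sketch (`ThreadPoolProbe` is not part of the question); it only uses `System.Threading.ThreadPool` members available since .NET Core 3.0:

```csharp
using System;
using System.Threading;

public static class ThreadPoolProbe
{
    // Returns a one-line summary of the current thread pool state.
    public static string Snapshot()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        ThreadPool.GetAvailableThreads(out int freeWorker, out _);

        return $"threads={ThreadPool.ThreadCount} " +
               $"queued={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorker - freeWorker} " +
               $"minWorkers={minWorker}";
    }
}
```

A queue length that keeps growing while the thread count climbs slowly would point towards starvation; a short queue with few busy workers would point elsewhere, for example at the change tracker.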

huangapple
  • Posted on March 4, 2023, 06:11:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632281.html