
How can I run the code below in the fastest way? What is the best practice?

public ActionResult ExampleAction()
{
    // 200K items
    var results = dbContext.Results.ToList();

    foreach (var result in results)
    {
        // 10 - 40 items per result
        result.Kazanim = JsonConvert.SerializeObject(
            dbContext.SubTables // 2.5M items
                .Where(x => x.FooId == result.FooId)
                .Select(s => new
                {
                    BarId = s.BarId,
                    State = s.State,
                }).ToList());

        dbContext.Entry(result).State = EntityState.Modified;
        dbContext.SaveChanges();
    }

    return Json(true, JsonRequestBehavior.AllowGet);
}

Each iteration takes an average of 500 ms when run synchronously. I have about 2M records in the sub-table, and the operation is performed 200K times.

How should I code this asynchronously? How can I make it faster and easier with an async method?

  • How can I do it faster and easier with an async method - Using async won't make it faster. A single run of this method will actually end up slightly slower. However, async allows your application to handle more requests at the same time, making your application more responsive overall. Commented Dec 1, 2022 at 15:11
  • Your question seems to suggest that you haven't done much of your own research into how to use asynchronous programming. So start by reading Microsoft's documentation on it, and come back if you have a specific question: Asynchronous programming with async and await Commented Dec 1, 2022 at 15:13
  • In addition to the excellent comments already given, look into doing some of the data filtering on the database side, e.g. in a stored procedure. Commented Dec 1, 2022 at 15:47
  • You can't make slow code run faster by running it in yet another thread. The question's code executes 2-4M loads and updates. That's most definitely not a good case for ORMs. This is a pure ETL job, best done in SQL. If you have to use client code, don't use ORMs, or use a lightweight micro-ORM like Dapper. Commented Dec 1, 2022 at 15:49
  • Which database are you using? You could replace all of this with a single UPDATE that calculates the JSON string and stores it. That assumes there's any benefit to generating such a string: it won't make querying easier or faster than joining with the related table, and few databases can index the contents of JSON columns. Commented Dec 1, 2022 at 15:54
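
For illustration: if the database were SQL Server 2016+ (an assumption; the question never says which engine is used), the single UPDATE suggested above could be issued through EF6's raw-SQL API, and FOR JSON PATH would produce the same [{"BarId":...,"State":...}] shape that JsonConvert.SerializeObject builds in the loop. A minimal sketch, not the asker's confirmed setup:

    // Hypothetical sketch: assumes SQL Server 2016+ and the table/column
    // names from the question. One set-based statement replaces the
    // entire 200K-iteration loop.
    dbContext.Database.ExecuteSqlCommand(@"
        UPDATE r
        SET r.Kazanim =
            (SELECT s.BarId, s.State
             FROM SubTables s
             WHERE s.FooId = r.FooId
             FOR JSON PATH)
        FROM Results r;");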

4 Answers


Here are two suggestions that can improve the performance by multiple orders of magnitude:

  1. Do work in batches (see the sketch after this list):

    1. Make the client send a page of data to process; and/or
    2. In the web-server code, add items to a queue and process them separately.
  2. Use SQL instead of EF:

    1. Write efficient SQL directly; and/or
    2. Use a stored procedure to do the work inside the database rather than moving data between the database and the code.
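
A minimal sketch of the batching idea, reusing the question's names; the page size of 1,000 is an arbitrary assumption, and suggestion 2 would look like the FOR JSON sketch under the comments above:

    // Hedged sketch: process the 200K rows in pages so each database
    // write round trip carries many updates instead of one.
    const int pageSize = 1000;
    var total = dbContext.Results.Count();

    for (var skip = 0; skip < total; skip += pageSize)
    {
        var page = dbContext.Results
            .OrderBy(r => r.FooId) // a stable order is required for paging
            .Skip(skip)
            .Take(pageSize)
            .ToList();

        foreach (var result in page)
        {
            result.Kazanim = JsonConvert.SerializeObject(
                dbContext.SubTables
                    .Where(x => x.FooId == result.FooId)
                    .Select(s => new { s.BarId, s.State })
                    .ToList());
        }

        dbContext.SaveChanges(); // one write round trip per page
    }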



There's nothing asynchrony can do to improve this code's performance, but there is something that can certainly make it faster.

If you call dbContext.SaveChanges() inside the loop, EF writes the changes back to the database for every single entity as a separate transaction. Move dbContext.SaveChanges() after the loop; that way EF writes back all your changes at once, in one single transaction.

Always try to have as few calls to .SaveChanges() as possible. One call with 50 changes is much better, faster, and more efficient than 50 calls with 1 change each.
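
A minimal reshuffle of the question's loop along those lines; the entities returned by ToList() are already change-tracked, so setting the property is enough and the explicit Entry(...).State line becomes redundant:

    var results = dbContext.Results.ToList();

    foreach (var result in results)
    {
        // The tracked entity is mutated in memory only; nothing is
        // written to the database inside the loop.
        result.Kazanim = JsonConvert.SerializeObject(
            dbContext.SubTables
                .Where(x => x.FooId == result.FooId)
                .Select(s => new { s.BarId, s.State })
                .ToList());
    }

    dbContext.SaveChanges(); // one call, one transaction, all updates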



Hi, and welcome.

There's quite a lot I see that's incorrect in terms of asynchronicity, but I guess it only matters if there are concurrent users calling your server. This has to do with scalability, and with the thread pool in charge of spinning up threads to take care of your incoming HTTP requests.

You see, if you occupy a thread pool thread for a long time, that thread will not contribute to dequeueing incoming HTTP requests. This pretty much puts you in a position where you can spin up a maximum of around 2 new thread pool threads per second. If your incoming HTTP request rate is faster than the pool's ability to produce threads, all of your HTTP requests will start seeing increased response times (slowness).

So as a general rule, when doing I/O-intensive work, always go async. There are asynchronous versions of most (or all) of the materializing methods like .ToList(): ToListAsync(), CountAsync(), AnyAsync(), etc. There is also SaveChangesAsync(). Under normal circumstances, using these would be the first thing I'd do; your circumstances don't seem normal, so I mention this for completeness only.
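
For completeness, then, a sketch of what that async-for-scalability version would look like, assuming EF6 (System.Data.Entity) and MVC 5 as the question's Json(..., JsonRequestBehavior.AllowGet) suggests. Per request it is no faster; it just frees the pool thread while the database works:

    public async Task<ActionResult> ExampleActionAsync()
    {
        var results = await dbContext.Results.ToListAsync();

        foreach (var result in results)
        {
            // Each query awaits without blocking a thread-pool thread.
            result.Kazanim = JsonConvert.SerializeObject(
                await dbContext.SubTables
                    .Where(x => x.FooId == result.FooId)
                    .Select(s => new { s.BarId, s.State })
                    .ToListAsync());
        }

        await dbContext.SaveChangesAsync();
        return Json(true, JsonRequestBehavior.AllowGet);
    }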

I think that you must, at the very least, run this heavy process outside the thread pool. Use Task.Factory.StartNew() with TaskCreationOptions.LongRunning, but run synchronous code inside it so you don't fall into the trap of awaiting the returned task in vain.
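
A sketch of that skeleton; RunTheHeavyLoop is a hypothetical wrapper around the synchronous code from the question:

    // TaskCreationOptions.LongRunning hints the scheduler to use a
    // dedicated thread, keeping pool threads free for HTTP requests.
    // The delegate body stays synchronous on purpose.
    Task.Factory.StartNew(
        () => RunTheHeavyLoop(),           // hypothetical wrapper method
        TaskCreationOptions.LongRunning);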

Now, all that just to have a "proper" skeleton. We haven't really talked about how to make this run faster. Let's do that.

Personally, I think you need some benchmarking of the different methods. It looks like you have already benchmarked this code; now listen to @tymtam and see if a stored-procedure version runs faster. My hunch, just like @tymtam's, is that it definitely will be.

If for whatever reason you insist on running this in C#, I would parallelize the work. The problem with that is Entity Framework: as per usual, this very popular yet unfriendly ORM gives us a big "but". EF's DbContext works over a single connection and disallows multiple simultaneous queries, so you cannot parallelize this with EF. I would then move to my good, amazing friend Dapper. Using Dapper, you can divide the workload among threads; each thread opens an independent DB connection and, through that connection, takes care of a portion of the 200K result set you obtain at the beginning.
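
A hedged sketch of that partitioning, assuming SQL Server, the Dapper package, and that FooId uniquely identifies a Results row; SubDto, the method name, and the degree of parallelism are illustrative assumptions:

    // Requires: using Dapper; using System.Collections.Concurrent;
    // using System.Data.SqlClient; using System.Threading.Tasks;

    // Hypothetical DTO matching the two columns the loop selects;
    // adjust the types to the real schema.
    class SubDto { public int BarId { get; set; } public int State { get; set; } }

    void UpdateAllInParallel(List<int> fooIds, string connectionString)
    {
        Parallel.ForEach(
            Partitioner.Create(0, fooIds.Count),
            new ParallelOptions { MaxDegreeOfParallelism = 4 }, // tune to taste
            range =>
            {
                // One connection per partition: Dapper shares no context,
                // so threads never contend the way a single DbContext would.
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    for (var i = range.Item1; i < range.Item2; i++)
                    {
                        var subs = conn.Query<SubDto>(
                            "SELECT BarId, State FROM SubTables WHERE FooId = @id",
                            new { id = fooIds[i] });

                        conn.Execute(
                            "UPDATE Results SET Kazanim = @json WHERE FooId = @id",
                            new { json = JsonConvert.SerializeObject(subs), id = fooIds[i] });
                    }
                }
            });
    }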



Thanks for the valuable information you provided.

I decided to use Hangfire, in line with your suggestions.

I used it with Hangfire's in-memory storage. I prepared a function that enqueues a Hangfire job for each iteration of the foreach. Before starting the foreach I fetch the relevant values, then pass them in as parameters so the function can do the calculation and save to the database; I won't drag this out (rough sketch below).
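
A rough sketch of that wiring, assuming the Hangfire and Hangfire.MemoryStorage packages; ResultJobs and UpdateKazanim are hypothetical names standing in for the real job class and method:

    // Configure once at startup; a BackgroundJobServer must be running
    // somewhere in the app to drain the queue.
    GlobalConfiguration.Configuration.UseMemoryStorage();

    var fooIds = dbContext.Results.Select(r => r.FooId).ToList();

    foreach (var fooId in fooIds)
    {
        // Each iteration of the old loop becomes an independent queued job.
        BackgroundJob.Enqueue<ResultJobs>(j => j.UpdateKazanim(fooId));
    }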

A job that took 30 minutes on average dropped to 3 minutes with Hangfire. Maybe it's still not ideal, but it works for me for now. Instead of making the user wait, I can show the action as currently in progress, and once the last job's thread finishes I end the process with a notification that it completed successfully.

I haven't used Dapper here for now, but I have used it elsewhere, and it really does have tremendous performance compared to Entity Framework.

Thanks again.

