Here's some code to illustrate the situation I'm asking about:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Processor
{
    private readonly IRepository _repo;
    private readonly IApiService _apiService;
    private readonly IMapper _mapper;

    public Processor(IRepository repo, IApiService apiService, IMapper mapper)
    {
        _repo = repo;
        _apiService = apiService;
        _mapper = mapper;
    }

    public async Task<IEnumerable<Thing>> ProcessStuff(IEnumerable<MyDto> dtos)
    {
        var people = await _apiService.GetPeople();

        // Holds Thing instances, matching what is added below and returned.
        ConcurrentBag<Thing> things = new();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };
        await Parallel.ForEachAsync(people, options, async (person, token) =>
        {
            var locations = await _apiService.GetLocations(person.Id);

            IEnumerable<Thing> newThings = _mapper.Map(locations);

            // maybe there's a repo call in here somewhere
            // _repo.AddThings(newThings);

            foreach (var thing in newThings)
            {
                things.Add(thing);
            }
        });

        return things;
    }
}

I think that, because interfaces hide their implementations, calling any interface method from within a Parallel loop is a bad idea: an implementation might have methods that aren't thread-safe.

If so, how can I call methods on interfaces from a parallel loop? I've done quite a bit of testing, both with Parallel.ForEachAsync() and a standard foreach loop, and I get identical results, but I'm not sure this is something I can count on. Running with the Parallel loop and 6 degrees of parallelism takes significantly less time, though.

  • I don't think there is a 100% right or wrong answer - it is contextual. However, in the general case, when you don't know what might be going on behind the abstraction, or how things interact, yes: I'd agree it is a bad idea to introduce concurrency - this could cause objects that are reachable via multiple paths to get touched by two threads in ways that are unpredictable, or it could just have weird unpredictable performance; but if you know that the operations are isolated: yes, it can make perfect sense to go concurrent. Commented Oct 9, 2023 at 14:51
  • What matters is what you do, not if you do it through an interface. Using parallel execution to speed up bad database queries will cause more delays for example. Making 6 concurrent HTTP requests is faster assuming the server/service can handle the extra load. Too high a DOP may get you throttled. Commented Oct 9, 2023 at 15:12
  • For example, executing 6 times the query SELECT * FROM Person where UnindexedID=@id will cause a full table scan 6 times, taking shared locks on the entire table, blocking modifications and probably causing deadlocks. SELECT ... WHERE unindexedid in (@id1, @id2,...,@id6) will only do so once. If the ID field is indexed, IN is still better because the overhead of 6 connections is probably higher than the cost of the query itself. Commented Oct 9, 2023 at 15:18
  • Is your question more focused on the thread-safety aspect of calling APIs defined on interfaces instead of concrete classes, or on the performance aspect? Commented Oct 9, 2023 at 15:21
  • @TheodorZoulias: I meant for my question to be more focused on the thread safety aspects of using interfaces rather than the performance aspects. I only mentioned the performance info to head off any "do you actually get any benefit from parallelism?" questions. Commented Oct 9, 2023 at 18:33

1 Answer

Interfaces are just a way to express a contract. Like any abstraction, an interface can leak, so you might need to dig deeper into the implementation.

In this particular case they are not that different from any other form of encapsulation. It does not matter whether you are calling a method defined on an interface or on a class: if you want to use it in a potentially multithreaded context (i.e. Parallel.ForEachAsync in this case), you still need to understand what it does and how it works, or at least what concurrency guarantees it provides. Also, assuming you want performance gains, you will definitely need to know how the actual implementation works to understand how much can be gained from parallelization.
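As a hedged illustration of the point about concurrency guarantees: if one dependency makes no thread-safety promise, one option is to serialize only the calls to it while the rest of the loop body still runs concurrently. This is a minimal sketch, not the question's code; `IThingStore`, `ListThingStore`, and the doubled-number workload are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var store = new ListThingStore();
using var gate = new SemaphoreSlim(1, 1); // admits one caller at a time

var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };
await Parallel.ForEachAsync(Enumerable.Range(1, 100), options, async (id, token) =>
{
    // Simulated I/O that is safe to run concurrently.
    await Task.Delay(1, token);
    int result = id * 2;

    // The store makes no thread-safety promise, so serialize access to it.
    await gate.WaitAsync(token);
    try
    {
        store.Add(result);
    }
    finally
    {
        gate.Release();
    }
});

Console.WriteLine(store.Count);

// Hypothetical dependency with no documented thread-safety guarantee.
public interface IThingStore
{
    void Add(int value);
    int Count { get; }
}

public sealed class ListThingStore : IThingStore
{
    // List<T> does not support concurrent writers.
    private readonly List<int> _items = new();
    public void Add(int value) => _items.Add(value);
    public int Count => _items.Count;
}
```

The trade-off is that the guarded call becomes a bottleneck, so this only pays off when most of the time per iteration is spent outside the gate.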

One option that lets you care a bit less about the internal workings of an interface implementation is to create a DI scope per handler (assuming you are using DI). For example, inject IServiceScopeFactory and use it to create a scope and resolve the dependencies inside each iteration; this can also be encapsulated in some "iteration handler". In general, though, it is still recommended to understand what the implementation does.
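The scope-per-iteration idea could look roughly like the sketch below. It assumes the Microsoft.Extensions.DependencyInjection package; `IWorker` and `Worker` are hypothetical names standing in for the question's services. Note that this only isolates services registered as scoped or transient; a singleton is still shared by every iteration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
// Scoped: each scope gets its own instance, so iterations do not share state.
services.AddScoped<IWorker, Worker>();
using var provider = services.BuildServiceProvider();

var scopeFactory = provider.GetRequiredService<IServiceScopeFactory>();
var results = new ConcurrentQueue<string>();

var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };
await Parallel.ForEachAsync(Enumerable.Range(1, 10), options, async (id, token) =>
{
    // One scope per iteration, disposed when the iteration finishes.
    using var scope = scopeFactory.CreateScope();
    var worker = scope.ServiceProvider.GetRequiredService<IWorker>();
    results.Enqueue(await worker.HandleAsync(id));
});

Console.WriteLine(results.Count);

// Hypothetical service with mutable per-instance state.
public interface IWorker
{
    Task<string> HandleAsync(int id);
}

public sealed class Worker : IWorker
{
    private int _calls; // never shared, because each scope has its own Worker

    public async Task<string> HandleAsync(int id)
    {
        await Task.Delay(1); // stands in for real I/O
        _calls++;
        return $"item {id}, call #{_calls}";
    }
}
```

In a real application the IServiceScopeFactory would be injected into the class that runs the loop rather than resolved from a locally built provider.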

P.S.

  • ConcurrentBag might not be the best option to use here.

  • "I've done quite a bit of testing, both with Parallel.ForEachAsync() and a standard foreach loop and get identical results..."

    To be honest, I would expect performance gains even with MaxDegreeOfParallelism set to 3, but without seeing the actual implementation it is hard to tell.
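The answer does not say which collection it has in mind instead of ConcurrentBag. One common candidate is ConcurrentQueue<T>: ConcurrentBag<T> is documented as optimized for cases where the same thread both adds and removes items, whereas here the worker threads only add and the caller reads afterwards. The sketch below is an assumption about what such a swap might look like, with integers standing in for the mapped things.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var things = new ConcurrentQueue<int>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };

await Parallel.ForEachAsync(Enumerable.Range(1, 20), options, async (id, token) =>
{
    await Task.Delay(1, token); // stands in for the API call

    // Two hypothetical results per input, enqueued from whichever thread runs this.
    foreach (var value in new[] { id * 10, id * 10 + 1 })
    {
        things.Enqueue(value);
    }
});

Console.WriteLine(things.Count);
```

The order in which different iterations enqueue their items is still nondeterministic, so code that needs a stable order has to sort or index the results itself.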
