Parallelizing recursive algorithms is not easy: it's not obvious how to partition the workload when the function that does the work can recursively create more work. In this particular case it is possible to partition the work by splitting the numbers into two parts, and using the subsets of the first part as the partitions. Then for each partition, search the other part for combinations whose sum equals the target minus the sum of the partition. Below is an implementation of this idea. The PLINQ library is used for the parallelization:
static IEnumerable<IList<int>> FindSubsets(IEnumerable<int> source, int target)
{
    const int leftSize = 8; // Results in 2^8 = 256 partitions
    int[] sourceArray = source.ToArray();
    int[] left = sourceArray.Take(leftSize).ToArray();
    int[] right = sourceArray.Skip(left.Length).ToArray();
    return Partitioner
        .Create(Combinations(left), EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .SelectMany(leftPart =>
        {
            int leftPartSum = leftPart.Sum();
            return FindSubsets_Internal(right, target - leftPartSum)
                .Select(rightPart => leftPart.Concat(rightPart).ToArray());
        });

    // Yields every subset of the remaining elements (including the empty one).
    static IEnumerable<int[]> Combinations(IEnumerable<int> remaining,
        IEnumerable<int> partial = null)
    {
        partial ??= Enumerable.Empty<int>();
        yield return partial.ToArray();
        foreach (var (n, i) in remaining.Select((n, i) => (n, i)))
        {
            var innerRemaining = remaining.Skip(i + 1);
            var inner = Combinations(innerRemaining, partial.Append(n));
            foreach (var subset in inner) yield return subset;
        }
    }

    // Yields every subset of the remaining elements whose sum equals the target.
    // Assumes positive numbers (branches whose partial sum exceeds the target
    // are pruned).
    static IEnumerable<int[]> FindSubsets_Internal(IEnumerable<int> remaining,
        int target, IEnumerable<int> partial = null, int partialSum = 0)
    {
        partial ??= Enumerable.Empty<int>();
        if (partialSum == target) yield return partial.ToArray();
        if (partialSum >= target) yield break;
        foreach (var (n, i) in remaining.Select((n, i) => (n, i)))
        {
            var innerRemaining = remaining.Skip(i + 1);
            var inner = FindSubsets_Internal(
                innerRemaining, target, partial.Append(n), partialSum + n);
            foreach (var subset in inner) yield return subset;
        }
    }
}
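A minimal usage sketch, assuming the FindSubsets method above is in scope (the numbers and the target are arbitrary, for illustration only; the order of the results is nondeterministic because of the parallelism):

```csharp
// Find all subsets of the numbers 1-12 that sum to 15, and print them.
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
foreach (IList<int> subset in FindSubsets(numbers, 15))
    Console.WriteLine($"[{string.Join(", ", subset)}]");
```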
This implementation is not much faster than the original single-threaded method, because the recursive algorithm is quite memory-hungry and inefficient. Comparing the value of GC.GetTotalAllocatedBytes(true) before and after the execution reveals that the memory allocation is massive: around 700 MB per second on my PC. So instead of the bottleneck being the CPU, it's the bandwidth of the memory bus: the CPU cores spend most of their time waiting for data to be stored in or fetched from main memory, instead of calculating. At least that's my theory. For comparison, parallelizing the non-recursive algorithm that I posted in my other answer yields a reasonable performance improvement (around 2.4x on my quad-core PC).
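The measurement mentioned above can be taken with a snippet along these lines, assuming the FindSubsets method is in scope (the input size and target are arbitrary; GC.GetTotalAllocatedBytes requires .NET Core 3.0 or later):

```csharp
// Measure the total bytes allocated while enumerating all the results.
long bytesBefore = GC.GetTotalAllocatedBytes(precise: true);
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
int count = FindSubsets(Enumerable.Range(1, 24), 150).Count();
stopwatch.Stop();
long bytesAllocated = GC.GetTotalAllocatedBytes(precise: true) - bytesBefore;
double seconds = Math.Max(stopwatch.Elapsed.TotalSeconds, 0.001);
Console.WriteLine($"Found {count:#,0} subsets in {seconds:F1} sec, " +
    $"allocating {bytesAllocated / 1_000_000} MB " +
    $"({bytesAllocated / 1_000_000 / seconds:F0} MB/sec)");
```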
Note also that every time Skip() is issued it does an internal loop over the skipped elements, which is one source of the inefficiency. Fix the single-threaded version first and then look into parallelizing it.