0

I have an array of arrays and a matching array. Each array has unique id values.

MatchingArray = [1,2,3,4,5,6]

A1 = [1, 4, 6]

A2 = [2,3,5]

A3 = [1,5]

A4 = [4]

A5 = [1, 6]

Need to find "optimal matchings". An optimal matching is an array of subsets from A1-A5 with minimal length, which should have a maximum possible intersection with MatchingArray.

For this example there are 2 possible matchings with a maximum intersection: M1 = [[2,3,5], [1, 4, 6]] and M2 = [[1,5], [4], [1, 6]]. But M1.length < M2.length, so the algorithm should output M1.

6
  • If need more description, please leave comment under answer Commented Apr 15, 2017 at 17:22
  • Maybe question was bad described, but the whole meaning of algorithm is find M1. I define M1 and M2 for explaining optimal solution., not because they are known in algorithm. Commented Apr 15, 2017 at 17:31
  • You need to find to more intersects array from A1-A5 with MatchingArray ? Commented Apr 15, 2017 at 17:34
  • I need best intersection with MatchingArray from given arrays. Best intersection is array of subarrays from A1-A5 arrays. Commented Apr 15, 2017 at 17:50
  • I implement desired by you algorithm. Check it :) Commented Apr 15, 2017 at 18:52

2 Answers 2

1

You could use sets (or hashes, whatever the language calls them) to optimise the time efficiency.

Convert the target array to a set, and then subtract the selected source from it (i.e. removing common values). Keep doing this recursively until the target set is empty. Keep track of the best result (using the fewest source arrays as possible). Backtrack if the number of source arrays being used gets past the length of the best solution already found at that moment.

Here is the code in Python:

def find_optimal_coverage(target, sources):
    max_size = len(target)
    best = None

    def recurse(target, sources, selected):
        nonlocal max_size, best
        if len(target) == 0:
            best = selected
            max_size = len(best) - 1
            return True
        if len(selected) == max_size:
            return None
        for i, source in enumerate(sources):
            result = recurse(target - set(source), sources[i+1:], 
                             selected + [list(source)])
            if result:
                return True

    target = set(target) # convert to set for faster lookup
    # limit the source lists to elements that occur in the target
    sources = list(map(target.intersection, sources))
    # limit target to elements that occur in at least one source
    target = set.union(*sources)
    # sort sources by decreasing length to maximise probability of 
    # finding optimal solution sooner
    sources.sort(key = len, reverse = True)
    if recurse(target, sources, []):
        return best

result = find_optimal_coverage(
    [1, 2, 3, 4, 5, 6, 8], 
    [
        [1, 4, 6, 7],
        [2, 3, 5],
        [1, 5],
        [4],
        [1, 6]
    ]
)
print(result)

See it run on repl.it

In JavaScript:

function subtractArray(s, arr) {
    return arr.reduce( (s, v) => (s.delete(v), s), new Set(s) );
}

function findOptimalCoverage(target, sources) {
    var maxSize = target.size;
    var best = null;
    
    function recurse(target, sources, selected) {
        if (target.size == 0) {
            best = selected;
            maxSize = best.length - 1;
            return true;
        }
        if (selected.length == maxSize) return;
        return sources.some( (source, i) =>
            recurse(subtractArray(target, source), sources.slice(i+1), 
                    selected.concat([source]))
        );
    }
    target = new Set(target) // convert to set for faster lookup
    // limit the source arrays to elements that occur in the target
    sources = sources.map( source => source.filter(target.has.bind(target)));
    // limit target to elements that occur in at least one source
    target = new Set([].concat(...sources)); 
    // sort sources by decreasing length to maximise probability of 
    // finding optimal solution sooner
    sources.sort( (a,b) => b.length - a.length );
    if (recurse(target, sources, [])) return best;
}

var result = findOptimalCoverage(
    [1, 2, 3, 4, 5, 6, 8], 
    [
        [1, 4, 6, 7],
        [2, 3, 5],
        [1, 5],
        [4],
        [1, 6]
    ]
);
console.log(result);
.as-console-wrapper { max-height: 100% !important; top: 0; }

Sign up to request clarification or add additional context in comments.

2 Comments

Great answer, after some adjustments this algorithm gives what I really want. Thanks.
@VahagnNikoghosian That algrorithm doesn't work correctly on arrays, that contain non intersections elements and have the same count of intesection elements, does it? For example if you try test array [1, 4, 6, 7]
1

Implemented algorithm in javascript:

var matchingArray = [1, 2, 3, 4, 5, 6];

var A1 = [1, 4, 6],
  A2 = [2, 3, 5],
  A3 = [1, 5],
  A4 = [4],
  A5 = [1, 6];


var M = [A1, A2, A3, A4, A5];

function compareArrays(M, machingArray) {
  var intersections = []
  M.forEach(function(A) {
    var partOfItersections;
    if (A.length > 0) {
      var intersectionsCount = getIntersectionCount(A, machingArray);
      partOfItersections = intersectionsCount / A.length;
    } else {
      partOfItersections = 0
    }

    intersections.push({
      length: A.length,
      partOfItersections: partOfItersections
    });
  });


  //alert(JSON.stringify(intersections));

  var maxLength = 0,
    maxPartOfItersections = 0,
    optimalArrays = [];

  intersections.forEach(function(arrayData, index) {
    var currentArr = M[index];
    var currentArrLength = currentArr.length;
    if (maxPartOfItersections < arrayData.partOfItersections) {

      setCurrentOptimalArr(arrayData.partOfItersections, currentArr);
    } else if (maxPartOfItersections === arrayData.partOfItersections) {
      if (maxLength < currentArrLength) {
        setCurrentOptimalArr(arrayData.partOfItersections, currentArr);
      } else if (maxLength === currentArrLength) {
        optimalArrays.push(currentArr);
      }
    }
  });

  //alert(JSON.stringify(optimalArrays));

  return optimalArrays;

  function setCurrentOptimalArr(intersectionsCount, currentArr) {
    optimalArrays = [currentArr];
    maxLength = currentArr.length;
    maxPartOfItersections = intersectionsCount;
  }

  function getIntersectionCount(A, machingArray) {
    var intersectionCount = 0;

    A.forEach(function(elem) {
      if (machingArray.indexOf(elem) != -1) {
        intersectionCount++;
      }
    });


    return intersectionCount;
  }
}

alert(JSON.stringify(compareArrays(M, matchingArray)));

  1. Count intersection of arrays separately.
  2. Return arrays which contain more part of intersections.

Code updated

2 Comments

Try using a spelling checker.
Sorry for my separatly. I forget how it write correctly. You can edit it and fix.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.