
I'm looking for a way to implement a diversified sort. Each element has a weight value and an enum type. I would like the sort to adjust each weight dynamically according to the types of the elements already chosen, giving priority to the types chosen least so far. I would also like to control a diversity factor: a high value should produce a fully diverse result array, and a low value should produce an almost 'regular' sorted array.

This doesn't sound like a very specific use case, so if there are any references to known algorithms, that will also be great.

Update: Following Ophir's suggestion, this might be a basic wrapper:

    // these will be the three arrays, one per type
    $contentTypeA, $contentTypeB, $contentTypeC;

    // sort each by value
    sort($contentTypeA);
    sort($contentTypeB);
    sort($contentTypeC);

    // loop until I have the amount I want or there are no more options to choose from
    while ($amountChosen < 100 && (count($contentTypeA) + count($contentTypeB) + count($contentTypeC) > 0)) {

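        // the arrays are taken by reference in selectBest's signature; call-time & is a fatal error since PHP 5.4
        $diversifiedContent[] = selectBest($bestA, $bestB, $bestC, $contentTypeA, $contentTypeB, $contentTypeC);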

        $amountChosen++;
    }

    $diversifiedContent = array_slice($diversifiedContent, 0, 520);

    return $diversifiedContent;
}

function selectBest($bestA, $bestB, $bestC, &$contentTypeA, &$contentTypeB, &$contentTypeC) {
    static $typeSelected;
    $diversifyFactor = 0.5;

    if (?) {
        $typeSelected['A']++;
        array_shift($contentTypeA);
        return $bestA;
    }
    else if (?) {
        $typeSelected['B']++;
        array_shift($contentTypeB);
        return $bestB;
    }
    else if (?) {
        $typeSelected['C']++;
        array_shift($contentTypeC);
        return $bestC;
    }
}
  • Could you please add an example of how it should look before and after the sort? Commented Apr 3, 2014 at 7:35
  • @Mysterion Not really an example array, but added a base for the implementation. Helps? Commented Apr 3, 2014 at 12:26

2 Answers


Your definition is in very general terms, not mathematical ones, so I doubt you will find a solution that matches exactly what you want. I can suggest this simple approach:

Sort each type separately. Then merge the lists by repeatedly taking the top value from the list with the highest priority, where a list's priority is the product of its top value and a "starvation" factor for its type. The starvation factor combines the number of steps since that type was last selected with the diversity factor. The exact shape of this function depends on your application.
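
As a rough illustration, here is a minimal Python sketch of this merge. It assumes the starvation factor proposed in the comments below, (1 + diversify_factor) ** starvation_steps, that a higher weight is better, and that all weights are positive (the product behaves oddly otherwise). The name `diversified_merge`, the `limit` parameter, and the (enum_type, weight) tuple layout are made up for the example.

```python
def diversified_merge(items, diversify_factor, limit=None):
    """Merge (enum_type, weight) pairs, boosting types that were skipped recently."""
    # One list per type, best (highest weight) first.
    queues = {}
    for enum_type, weight in items:
        queues.setdefault(enum_type, []).append(weight)
    for queue in queues.values():
        queue.sort(reverse=True)

    # Number of turns since each type was last selected.
    starvation = {enum_type: 0 for enum_type in queues}
    result = []
    while queues and (limit is None or len(result) < limit):
        # Priority = best remaining weight * (1 + factor) ** starvation steps.
        best_type = max(
            queues,
            key=lambda t: queues[t][0] * (1 + diversify_factor) ** starvation[t],
        )
        result.append((best_type, queues[best_type].pop(0)))

        # Every type starves one more turn, except the one just selected.
        for enum_type in starvation:
            starvation[enum_type] += 1
        starvation[best_type] = 0

        # Drop a type once it has no items left.
        if not queues[best_type]:
            del queues[best_type]
            del starvation[best_type]
    return result


items = [('a', 10), ('a', 9), ('a', 8), ('b', 5), ('b', 4), ('c', 3)]
print(diversified_merge(items, 0))   # factor 0: plain descending sort by weight
print(diversified_merge(items, 10))  # large factor: types alternate while they last
```

With a factor of 0 every starvation factor is 1, so the merge reduces to an ordinary descending sort; larger factors let a skipped type overtake a heavier one sooner.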


8 Comments

Thanks. Updated the question with code that represents your suggestion. Can you suggest good mathematical conditions for the "?" placeholders, which should take the starvationFactor and typeSelected variables into account?
There are several mistakes in your implementation -
I was just trying to put your words into pseudocode to make sure we understand each other; if they're just 'typos', disregard them. Did I misunderstand the approach you suggested?
Sorry, the last comment was sent incomplete. Problems in the pseudocode: you don't remove the selected item, and you don't count the number of starvation steps for each type, i.e. how many turns have passed since this type was last selected. To your question: SelectBest() will choose the one with the highest val*starvation_fact, where starvation_fact is, for instance, (1+diversifyFactor)^number_of_starvation_steps.
No. You can keep an array starvation_count indexed by type; when you select type t, you increase all the counts by one and then set starvation_count[t]=0.

Here's an idea:

class item(object):
    def __init__(self, enum_type, weight):
        self.enum_type = enum_type
        self.weight = weight
        self.dyn_weight = weight

    def __repr__(self):
        return repr((self.enum_type, self.weight, self.dyn_weight))


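def sort_diverse(lst, factor):
    # reset dynamic weights so repeated calls on the same list do not accumulate
    for obj in lst:
        obj.dyn_weight = obj.weight
    # first sort: group by type, ascending weight within each type
    by_type = sorted(lst, key=lambda obj: (obj.enum_type, obj.weight))
    cnt = 1
    for i in range(1, len(by_type)):
        current = by_type[i]
        previous = by_type[i-1]
        if current.enum_type == previous.enum_type:
            # each further item of the same type gets a larger penalty
            current.dyn_weight += factor * cnt
            cnt += 1
        else:
            cnt = 1
    # second sort: by dynamic weight, ties broken by type
    return sorted(by_type, key=lambda obj: (obj.dyn_weight, obj.enum_type))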

Try this example:

lst = [item('a', 0) for x in range(10)] + [item('b', 1) for x in range(10)] + [item('c', 2) for x in range(10)]
print(sort_diverse(lst, 0)) # regular sort
print(sort_diverse(lst, 1)) # partially diversified
print(sort_diverse(lst, 100)) # completely diversified

Depending on your needs, you might want to use a more sophisticated weight update function.

This algorithm has O(n log n) time complexity and O(n) space complexity, as it requires two sorts and two copies of the list.
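
One way to experiment with other weight updates is to pass the penalty in as a function. The sketch below is a hypothetical variant rather than the code above: `sort_diverse_with` and `penalty` are illustrative names, items are plain (enum_type, weight) tuples, lower weights sort first, and the penalty for an item depends only on its rank within its own type.

```python
from collections import defaultdict


def sort_diverse_with(items, penalty):
    """Sort (enum_type, weight) pairs by weight + penalty(rank within type)."""
    seen = defaultdict(int)  # how many items of each type were already ranked
    keyed = []
    # Visit items in ascending weight order so rank 0 is the best of its type.
    for enum_type, weight in sorted(items, key=lambda it: (it[1], it[0])):
        rank = seen[enum_type]
        seen[enum_type] += 1
        keyed.append((weight + penalty(rank), enum_type, weight))
    keyed.sort()
    return [(enum_type, weight) for _, enum_type, weight in keyed]


items = [('a', 0)] * 3 + [('b', 1)] * 3
print(sort_diverse_with(items, lambda rank: 0))           # no penalty: regular sort
print(sort_diverse_with(items, lambda rank: 100 * rank))  # linear penalty
print(sort_diverse_with(items, lambda rank: rank ** 2))   # quadratic penalty
```

Passing `lambda rank: factor * rank` gives the same linear penalty as the loop above; a steeper function penalizes long runs of one type more heavily than short ones.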
