
I tried searching for this, but without much success.

Can someone please explain how to use optimize.minimize on a pandas DataFrame so that it minimizes the error between the category values in the DataFrame and the result column?

Consider this example:

import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

For each of cat1, cat2, cat3, dog1, dog2 and dog3, I want to find the value that minimizes this expression:

import numpy as np

# cat_val and dog_val are placeholders for the values to be found for each
# row's cat label and dog label
np.average(np.abs(df['result'] - cat_val * dog_val)) / np.average(df['result'])
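Written out as a formula, the score is the following, where r_i is the result in row i and c_cat(i), d_dog(i) are the unknown values for that row's cat and dog label (this notation is introduced here only for clarity; n = 6 rows in the example):

```latex
\text{score} = \frac{\frac{1}{n}\sum_{i=1}^{n} \left| r_i - c_{\mathrm{cat}(i)}\, d_{\mathrm{dog}(i)} \right|}{\frac{1}{n}\sum_{i=1}^{n} r_i}
```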

I can already do this in Excel using Solver:

prod    cat     dog     result  cat*dog  |result - cat*dog|
prod1   cat1    dog1    20      17.38    2.61
prod2   cat1    dog2    10      27.34    17.35
prod3   cat2    dog1    30      26.91    3.09
prod4   cat2    dog2    50      42.32    7.67
prod5   cat3    dog2    45      45.00    0.00
prod6   cat1    dog3    120     20.64    99.36

So the final score I am trying to minimize is:

average abs error of 21.68 (≈ 22) / average result of 45.83 ≈ 0.47

These are the values Solver returned for each animal:

cat1    3.59194254
cat2    5.559980313
cat3    5.91078751
dog1    4.840109868
dog2    7.613201994
dog3    5.746396256
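Before optimizing anything, the score can be evaluated directly in pandas at the values Solver returned, which is a quick way to confirm the Python expression matches the spreadsheet. This is a small sketch using Series.map to look up each row's value; solver_vals is just the numbers above put into a dict.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

# Values reported by Excel Solver, copied from the list above
solver_vals = {'cat1': 3.59194254, 'cat2': 5.559980313, 'cat3': 5.91078751,
               'dog1': 4.840109868, 'dog2': 7.613201994, 'dog3': 5.746396256}

# Look up each row's cat and dog value, multiply, and compare with 'result'
pred = df['cat'].map(solver_vals) * df['dog'].map(solver_vals)
abs_err = np.abs(df['result'] - pred)
score = abs_err.mean() / df['result'].mean()
print(round(score, 2))  # 0.47
```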

How do I replicate this in Python?

1 Answer


You need to define an objective function for optimize.minimize to call, so that it knows what it's trying to minimize.

import pandas as pd
import numpy as np
from scipy import optimize

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

So let's define the animal_error function as you've described. The first argument is a 1-D array of values (as optimize.minimize requires), the second is the array of labels those values correspond to, and the third is your DataFrame. Most of the code just turns the label strings in your DataFrame into numbers that can be calculated with.

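def animal_error(val, animal, df):
    """Cost for a candidate array `val`, holding one value per label in `animal`."""
    assert len(val) == len(animal)
    # Map each label (e.g. 'cat1', 'dog2') to its current candidate value
    lookup = dict(zip(animal, val))
    cat_val = df['cat'].map(lookup)
    dog_val = df['dog'].map(lookup)
    # Mean absolute error of cat*dog against result, scaled by the mean result
    error = np.abs(df['result'] - cat_val * dog_val)
    return np.mean(error) / np.mean(df['result'])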
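Nelder-Mead calls the objective many times, and animal_error rebuilds the lookup and maps the DataFrame on every call, which can get slow on larger frames. A sketch of a vectorized alternative (animal_error_fast is a hypothetical name) encodes the labels once with pd.factorize and then uses plain NumPy indexing. Note that its parameter vector is ordered cats first, then dogs, unlike the animals array used in this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

# Encode labels as integer codes once, outside the objective
cat_codes, cats = pd.factorize(df['cat'])   # cats in order of appearance: cat1, cat2, cat3
dog_codes, dogs = pd.factorize(df['dog'])   # dogs in order of appearance: dog1, dog2, dog3
result = df['result'].to_numpy(dtype=float)
n_cat = len(cats)

def animal_error_fast(x):
    """Same cost as animal_error, with x laid out as [cat values..., dog values...]."""
    cat_vals = x[:n_cat][cat_codes]   # value for each row's cat
    dog_vals = x[n_cat:][dog_codes]   # value for each row's dog
    return np.mean(np.abs(result - cat_vals * dog_vals)) / np.mean(result)
```

It can be passed to optimize.minimize exactly like animal_error, just without the args tuple.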

Next, collect the unique labels into a single array:

animals = np.concatenate([df['dog'].unique(), df['cat'].unique()])

Set a reasonable initial value for the solver:

initial = np.repeat(np.sqrt(df['result'].mean()), animals.size)
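Starting every value at sqrt(mean(result)) is one reasonable choice. One alternative heuristic, sketched here under the assumption that every result is positive, is to take logs: if result ≈ cat*dog, then log(result) ≈ log(cat) + log(dog), which is linear in the unknowns and can be solved with ordinary least squares. The names position and initial_ls are hypothetical; initial_ls could be passed as x0 in place of initial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})
animals = np.concatenate([df['dog'].unique(), df['cat'].unique()])
position = {a: i for i, a in enumerate(animals)}  # label -> index in animals

# Design matrix: row i has a 1 in the column of its cat and of its dog,
# so (A @ log_x)[i] = log(cat value) + log(dog value) for that row
A = np.zeros((len(df), animals.size))
for i, (c, d) in enumerate(zip(df['cat'], df['dog'])):
    A[i, position[c]] = 1.0
    A[i, position[d]] = 1.0

# Fit log(result) by least squares (assumes all results > 0), then map back.
# The system is rank-deficient (scaling all cats by k and all dogs by 1/k
# changes nothing), so lstsq returns the minimum-norm solution.
log_coef, *_ = np.linalg.lstsq(A, np.log(df['result'].to_numpy(dtype=float)), rcond=None)
initial_ls = np.exp(log_coef)
```

Labels that appear in only one row (cat3 and dog3 here) get fitted exactly by this start, while the remaining rows share the log-space error.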

And run the minimizer:

res = optimize.minimize(animal_error, x0=initial, args=(animals, df),
                        method='Nelder-Mead', options={'maxiter': 10000})
res_df = pd.DataFrame({'animal': animals, 'min_val': res.x})

Here is the end result:

>>> res.fun
0.08676411624175694

>>> res_df
  animal    min_val
0   dog1   3.754194
1   dog2   5.296533
2   dog3  22.526566
3   cat1   5.327044
4   cat2   9.307979
5   cat3   8.496109
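Nelder-Mead is a local method, so the solution it returns can depend on x0. A simple multistart sketch runs the minimizer from several random starting points and keeps the lowest res.fun. The seed, the 20 restarts and the 0.5 to 1.5 scaling range are arbitrary assumptions, and cost is a hypothetical vectorized stand-in for animal_error with the parameters ordered cats first, then dogs.

```python
import numpy as np
import pandas as pd
from scipy import optimize

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

# Encode labels once; the parameter vector is [cat values..., dog values...]
cat_codes, cats = pd.factorize(df['cat'])
dog_codes, dogs = pd.factorize(df['dog'])
result = df['result'].to_numpy(dtype=float)
n_cat = len(cats)
n_params = n_cat + len(dogs)

def cost(x):
    """Mean |result - cat*dog| divided by mean result (same cost as animal_error)."""
    pred = x[:n_cat][cat_codes] * x[n_cat:][dog_codes]
    return np.mean(np.abs(result - pred)) / np.mean(result)

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
best = None
for _ in range(20):
    # Random positive start scattered around sqrt(mean(result))
    x0 = np.sqrt(result.mean()) * rng.uniform(0.5, 1.5, size=n_params)
    res = optimize.minimize(cost, x0=x0, method='Nelder-Mead',
                            options={'maxiter': 10000})
    if best is None or res.fun < best.fun:
        best = res  # keep the lowest cost found so far

best_df = pd.DataFrame({'animal': list(cats) + list(dogs), 'min_val': best.x})
print(best.fun)
print(best_df)
```

Because the cost only depends on the products cat*dog, many different value sets give the same score (all cats scaled by k and all dogs by 1/k), so the individual values can differ between runs even when the score does not.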

I think that your cost-function description might be a bit off, so you may have to adjust it.


3 Comments

How does one determine a 'reasonable initial value'? BTW, thank you! :)
@Jurek Since your cost function is roughly (r - xy)/r and its ideal value is 0, I figured that solving (r - x0*x0)/r = 0, i.e. x0 = sqrt(r), would give a good place to start (the code uses the mean result for r). Honestly, ask 10 people and you'll get 10 different answers to this question.
I have a very similar problem. Would you be able to help me use the optimiser to maximise a function instead (i.e. by minimising -fun)?
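scipy.optimize has no maximize counterpart to minimize; the usual approach is to minimize the negative of the function you want to maximise and then flip the sign of res.fun back. A minimal sketch with a made-up function f (hypothetical, chosen so its maximum is known to be 5 at x = [1, -2]):

```python
import numpy as np
from scipy import optimize

def f(x):
    # Hypothetical function to maximise: peak value 5 at x = [1, -2]
    return 5.0 - (x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2

# minimize only minimises, so hand it the negated function
res = optimize.minimize(lambda x: -f(x), x0=np.zeros(2), method='Nelder-Mead')

x_best = res.x      # location of the maximum
f_best = -res.fun   # undo the sign flip to recover the maximum value
print(x_best, f_best)
```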
