scipy.optimize on pandas dataframe

Question

I searched for this but found little.

Can someone please explain how to run optimize.minimize on a pandas DataFrame so that it minimizes the error between the values assigned to the categories in the DataFrame and the result column?

Consider this example:

import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

For each of cat1, cat2, cat3, dog1, dog2 and dog3 I want to find the value that minimizes this expression:

import numpy as np

# cat_value / dog_value are placeholders: the fitted value for each row's cat and dog label
np.average(np.abs(df['result'] - cat_value * dog_value)) / np.average(df['result'])

I am able to replicate this in Excel using Solver

prod   cat   dog   result  cat*dog  abs
prod1  cat1  dog1  20      17.38    2.61
prod2  cat1  dog2  10      27.34    17.35
prod3  cat2  dog1  30      26.91    3.09
prod4  cat2  dog2  50      42.32    7.67
prod5  cat3  dog2  45      45.00    0.00
prod6  cat1  dog3  120     20.64    99.36

so the end score that I am trying to find is:

average abs of 22 / average result of 45.83 = 0.47

These are the values Solver returned for animals:

cat1    3.59194254
cat2    5.559980313
cat3    5.91078751
dog1    4.840109868
dog2    7.613201994
dog3    5.746396256
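
As a sanity check on the Solver numbers above, here is a minimal sketch that plugs them back into the score. It assumes a plain dictionary lookup through Series.map; the names solver_values, predicted and score are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

# Values copied from the Solver output listed above
solver_values = {'cat1': 3.59194254, 'cat2': 5.559980313, 'cat3': 5.91078751,
                 'dog1': 4.840109868, 'dog2': 7.613201994, 'dog3': 5.746396256}

# Row-wise prediction: value of the row's cat label times value of its dog label
predicted = df['cat'].map(solver_values) * df['dog'].map(solver_values)

# Mean absolute error divided by the mean of the result column
score = (df['result'] - predicted).abs().mean() / df['result'].mean()
print(round(score, 2))  # 0.47
```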

How do I replicate this in Python?

CJR · Accepted Answer · 2018-10-16 15:40:42Z


You need to define a function that optimize.minimize can run (so that it knows what it's trying to minimize).

import pandas as pd
import numpy as np
from scipy import optimize

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

So let's define the animal_error function as you've described - the first argument is a 1d array with some number of values (as required by optimize). The second argument is the corresponding strings for those array values, and the third argument is your dataframe. Most of this code is just turning your dataframe strings into values that can be calculated.

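def animal_error(val, animal, df):
    # val: 1d array of trial values; animal: the label belonging to each value
    assert len(val) == len(animal)
    lookup = dict(zip(animal, val))
    # Map each row's cat and dog label to its current trial value
    predicted = df['cat'].map(lookup) * df['dog'].map(lookup)
    error = np.abs(df['result'] - predicted)
    # Mean absolute error, normalised by the mean of the result column
    return np.mean(error) / np.mean(df['result'])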

Now, you can make the strings into an array:

animals = np.concatenate([df['dog'].unique(), df['cat'].unique()])

Set a reasonable initial value for the solver:

initial = np.repeat(np.sqrt(df['result'].mean()), animals.size)

And run the minimizer:

res = optimize.minimize(animal_error, x0=initial, args=(animals, df),
                        method='Nelder-Mead', options={'maxiter': 10000})
res_df = pd.DataFrame({'animal': animals, 'min_val': res.x})

End result follows:

>>> res.fun
0.08676411624175694

  animal    min_val
0   dog1   3.754194
1   dog2   5.296533
2   dog3  22.526566
3   cat1   5.327044
4   cat2   9.307979
5   cat3   8.496109

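This score (about 0.087) is lower than the 0.47 you got from Solver. I think that your cost-function description might be a bit off, or Solver may have stopped at a local minimum, so you may have to adjust it.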
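
If the per-call string lookup becomes a bottleneck, one possible variation (a sketch, not part of the code above) is to convert the labels to integer codes once with pd.factorize and index straight into the parameter vector. The name animal_error_fast and the dogs-first parameter layout are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
from scipy import optimize

df = pd.DataFrame({'prod': ['prod1', 'prod2', 'prod3', 'prod4', 'prod5', 'prod6'],
                   'cat': ['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'],
                   'dog': ['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'],
                   'result': [20, 10, 30, 50, 45, 120]})

# Integer code per row plus the unique labels, in order of first appearance
dog_idx, dog_labels = pd.factorize(df['dog'])
cat_idx, cat_labels = pd.factorize(df['cat'])
n_dog = len(dog_labels)
result = df['result'].to_numpy(dtype=float)

def animal_error_fast(val):
    # Assumed layout: val[:n_dog] holds the dog values, val[n_dog:] the cat values
    predicted = val[:n_dog][dog_idx] * val[n_dog:][cat_idx]
    return np.mean(np.abs(result - predicted)) / np.mean(result)

animals = np.array(list(dog_labels) + list(cat_labels))
initial = np.full(animals.size, np.sqrt(result.mean()))

res = optimize.minimize(animal_error_fast, x0=initial,
                        method='Nelder-Mead', options={'maxiter': 10000})
res_df = pd.DataFrame({'animal': animals, 'min_val': res.x})
print(res.success, res.fun)
```

The cost function is the same as before; only the lookup moves out of the function that the optimizer calls repeatedly.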


3 Comments

Jerzy Over a year ago

How does one determine a 'reasonable initial value'? BTW, thank you! :)

CJR Over a year ago

@Jurek Since your cost function is (r - xy)/r, and the ideal for the cost function is 0, I figured (r - x0*x0)/r = 0 (x0 = sqrt(r)) would be a good place to start. Honestly, ask 10 people, get 10 different answers to this question.
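
Since the choice of starting point is a judgment call, one hedged way to reduce its influence is to run the minimizer from several starting points and keep the best run. This is a minimal sketch using the question's data; the ±50% jitter around sqrt(mean result), the seed and the number of starts are arbitrary illustration choices, and score is an illustrative name.

```python
import numpy as np
from scipy import optimize

result = np.array([20, 10, 30, 50, 45, 120], dtype=float)
cat = np.array(['cat1', 'cat1', 'cat2', 'cat2', 'cat3', 'cat1'])
dog = np.array(['dog1', 'dog2', 'dog1', 'dog2', 'dog2', 'dog3'])

# Sorted unique labels and, for every row, the index of its label
cat_labels, cat_idx = np.unique(cat, return_inverse=True)
dog_labels, dog_idx = np.unique(dog, return_inverse=True)
n_cat = len(cat_labels)
n_params = n_cat + len(dog_labels)

def score(x):
    # x[:n_cat] holds the cat values, x[n_cat:] the dog values
    predicted = x[:n_cat][cat_idx] * x[n_cat:][dog_idx]
    return np.mean(np.abs(result - predicted)) / np.mean(result)

base = np.sqrt(result.mean())
rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
starts = [np.full(n_params, base)]
starts += [base * rng.uniform(0.5, 1.5, n_params) for _ in range(9)]

runs = [optimize.minimize(score, x0, method='Nelder-Mead',
                          options={'maxiter': 10000}) for x0 in starts]
best = min(runs, key=lambda r: r.fun)

for r in runs:
    print(round(r.fun, 4))
print('best:', round(best.fun, 4))
```

If the runs end at noticeably different scores, the cost surface has several local minima and no single starting point can be called the right one.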

user11476329 Over a year ago

I have a very similar problem. Would you be able to help me use the optimiser to maximise a function instead (that is, minimise -fun)?
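
A common way to maximise with optimize.minimize is to minimise the negated function and flip the sign of the result. A minimal sketch follows; gain is a made-up one-variable function with its peak at x = 2, used only for illustration.

```python
import numpy as np
from scipy import optimize

def gain(x):
    # Hypothetical function to maximise: peak value 5 at x = 2
    return 5.0 - (x[0] - 2.0) ** 2

def negative_gain(x):
    # Minimising -gain is equivalent to maximising gain
    return -gain(x)

res = optimize.minimize(negative_gain, x0=np.array([0.0]), method='Nelder-Mead')

best_x = res.x[0]
best_gain = -res.fun  # undo the sign flip to recover the maximum
print(best_x, best_gain)
```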
