
I have a DataFrame with thresholds and profits:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101',periods=6) 
df = pd.DataFrame(np.random.randn(6,3),index=dates,columns=['Value1', 'Value2', 'Profit']) 
df['Profit'] = df['Profit']*100 


print(df.to_string())

total_profit = df['Profit'].loc[(df.Value1 > 0) & (df.Value2 >= 0)].sum()

print(total_profit)

Is there a pandas way to optimize total_profit by finding the best-fitting threshold values for filtering Value1 and Value2?

I mean, I could loop over the DataFrame and increase/decrease the filter values until I find the best-fitting values... but I guess someone has already done this. Maybe SciPy?

So I basically need a function that returns the best thresholds for Value1 and Value2, so I can filter my DataFrame and maximize total_profit. The assumption is that there is a correlation between Value1, Value2 and Profit.
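The function described above can be sketched as a brute-force search. This is a minimal sketch, assuming the filter from the question (Value1 > t1, Value2 >= t2) and assuming the candidate thresholds are the observed values; `best_thresholds` is a hypothetical name. Because Value1 uses a strict `>`, an observed-value threshold can never include the row with the smallest Value1, so `-inf` ("no Value1 filter") is added as an extra candidate.

```python
import itertools

import numpy as np
import pandas as pd


def best_thresholds(df):
    """Return (t1, t2, profit) maximizing
    df.Profit[(df.Value1 > t1) & (df.Value2 >= t2)].sum().

    Hypothetical helper: tries every pair of observed values, plus -inf,
    which stands for "do not filter on this column".
    """
    cands1 = np.append(-np.inf, np.unique(df['Value1']))
    cands2 = np.append(-np.inf, np.unique(df['Value2']))
    best = None
    for t1, t2 in itertools.product(cands1, cands2):
        mask = (df['Value1'] > t1) & (df['Value2'] >= t2)
        profit = df.loc[mask, 'Profit'].sum()
        # Keep the first pair that reaches a new maximum
        if best is None or profit > best[2]:
            best = (t1, t2, profit)
    return best


# Small illustrative data set (made up for the example)
toy = pd.DataFrame({'Value1': [0.5, -1.0, 2.0, 1.5],
                    'Value2': [1.0, 0.2, -0.5, 0.8],
                    'Profit': [30.0, -20.0, -50.0, 40.0]})
t1, t2, profit = best_thresholds(toy)
print(t1, t2, profit)  # -inf 0.8 70.0
```

Here t1 = -inf means no Value1 filter is needed: Value2 >= 0.8 alone keeps rows 0 and 3, for a profit of 70. The cost is one pandas filter per pair, i.e. roughly n² filters for n rows.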

Thanks and best wishes, e.

  • Could you give an example input and the output you would like to get? I don't understand very well what you are trying to achieve... Do you want a couple of thresholds for Value1 and Value2 such that, when you filter your data by those thresholds, the sum of Profit is maximized? Commented Aug 1, 2017 at 8:30
  • @jdehesa So the example above is basically the input and output. df is my input DataFrame, which has n values and a profit column. The problem here is that my example uses random data; my real data have a correlation between the two values and the profit. So I'm looking for a way to find the best possible thresholds for filtering Value1 and Value2 to maximize total_profit, as you said. Thanks!! Commented Aug 1, 2017 at 11:24
  • Given that your 2 margins and your profit are completely random, I don't think there is an easier/smarter/faster way, as this is not in any way a smooth mathematical function. If your example is, in that sense, incorrect, and your real data does have a relationship with the two margins, we'd have to know that relation. Commented Aug 1, 2017 at 11:54
  • If you're just thinking of using a built-in Pandas method to brute-force find the solution to your problem, instead of manually coding a (double) for loop: I don't know of one, but DataFrame.apply() may be a first step to look at. Commented Aug 1, 2017 at 11:55
  • I know that it's hard to tell because of the random values, but nevertheless it should be possible to find the best fit (I guess :)). I recall that I did something like that with Excel's linear optimization years back, so I thought there might be something similar in Python. I would have expected to get at least two values (Value1 and Value2) that subselect the df in one line where I have a positive profit. Commented Aug 1, 2017 at 12:17
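The comment that this is not a smooth function can be checked directly. A minimal sketch, assuming the question's filter and a hypothetical helper `profit_at`: with a strict `>`, every t1 in the gap [v_k, v_(k+1)) between two neighbouring observed Value1 values selects the same rows, so the profit is flat there. A gradient-based optimizer would see a zero slope almost everywhere, which is why searching over the observed values is the natural approach.

```python
import numpy as np
import pandas as pd


def profit_at(df, t1, t2):
    """Total profit for the question's filter with thresholds t1, t2 (hypothetical helper)."""
    mask = (df['Value1'] > t1) & (df['Value2'] >= t2)
    return df.loc[mask, 'Profit'].sum()


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)),
                  columns=['Value1', 'Value2', 'Profit'])
df['Profit'] *= 100

v1 = np.sort(df['Value1'].to_numpy())
t2 = df['Value2'].min()  # keeps every row on the Value2 side

# Probe several thresholds inside each gap between neighbouring observed
# Value1 values: the selected rows, and hence the profit, never change.
for lo, hi in zip(v1[:-1], v1[1:]):
    probes = np.linspace(lo, hi, 5, endpoint=False)
    profits = [profit_at(df, t, t2) for t in probes]
    assert np.allclose(profits, profits[0])
```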

1 Answer


Assuming that you only want to use observed values for df.Value1 and df.Value2, the following will work.

import numpy as np 
import pandas as pd

dates = pd.date_range('20130101',periods=6) 
df = pd.DataFrame(np.random.randn(6,3),index=dates,columns=['Value1', 'Value2', 'Profit']) 
df['Profit'] = df['Profit']*100 

print(df.to_string())

# create list of all possible value pairs
vals = [[i,j] for i in df.Value1 for j in df.Value2]

# create list of profits from all possible value pairs
total_profit = [df['Profit'].loc[(df.Value1 > i) & (df.Value2 >= j)].sum() for i, j in vals]

# get index of maximum profit
max_index = total_profit.index(max(total_profit))

# get values that correspond to max profit
vals[max_index]

Out[9]: [-0.51914224014959032, -0.73918945103973344]
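The list comprehension above runs one pandas filter per candidate pair. A minimal NumPy sketch of the same search, assuming the candidates are again the observed values, builds both sets of boolean masks with broadcasting and gets every pair's profit from a single matrix product; `best_thresholds_vectorized` is a hypothetical name. It should pick the same pair as `vals[max_index]`, apart from possible differences in how exact floating-point ties are broken.

```python
import numpy as np
import pandas as pd


def best_thresholds_vectorized(df):
    # Candidate thresholds: the observed values, as in the answer (assumption)
    v1 = df['Value1'].to_numpy()
    v2 = df['Value2'].to_numpy()
    p = df['Profit'].to_numpy()

    # m1[a, r] is True when row r passes Value1 > v1[a]  -> shape (n, n)
    m1 = v1[None, :] > v1[:, None]
    # m2[b, r] is True when row r passes Value2 >= v2[b] -> shape (n, n)
    m2 = v2[None, :] >= v2[:, None]

    # profits[a, b] = sum over rows r of m1[a, r] * p[r] * m2[b, r]
    profits = (m1 * p) @ m2.astype(float).T

    # argmax scans row-major (a outer, b inner), the same order as `vals`
    a, b = np.unravel_index(np.argmax(profits), profits.shape)
    return v1[a], v2[b], profits[a, b]


dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 3), index=dates,
                  columns=['Value1', 'Value2', 'Profit'])
df['Profit'] = df['Profit'] * 100
print(best_thresholds_vectorized(df))
```

The masks and the profit matrix are n × n, so memory grows quadratically with the number of rows; this suits a few thousand rows rather than millions.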
