I am writing a function. One of its inputs will be a pandas DataFrame, and one of its tasks is to perform an operation on two columns of that DataFrame. The two columns are not fixed: I want to be free to choose them through parameters of the function fun.

For example, suppose that at some point the columns I want to use are 'var1' and 'var2' (at another time I may want to use two other columns). Suppose these columns take the values 1, 2, 3, 4 and I want to filter df down to the rows where var1 == 1 and var2 == 1. My function looks like this:

def fun(df , var = ['input_var1', 'input_var2'] , val):
    df = df.rename(columns={  var[1] : 'aux_var1 ', var[2]:'aux_var2'})

    # Other operations
    df  = df.loc[(df.aux_var1 == val ) & (df.aux_var2 == val )] 
    # end of operations

    # recover 
    df = df.rename(columns={ 'aux_var1': var[1] ,'aux_var2': var[2]})
    return df 

When I call fun, I get the following error:

fun(df, var = ['var1','var2'], val = 1)
IndexError: list index out of range

In practice I want to perform more complex operations, which I have left out to keep the question short. The simple example above may have a solution that does not need to rename the columns, but that solution might not work with the operations I actually need. So, first of all, I would like to fix the error in the renaming step. If you want to suggest a more elegant solution that avoids renaming, I would appreciate that too, but I would be very grateful if you also included the fix for the renaming approach.

3 Answers

Python lists are zero-indexed, i.e. the index of the first element is 0. Just change the lines:

df = df.rename(columns={  var[1] : 'aux_var1 ', var[2]:'aux_var2'})

df = df.rename(columns={ 'aux_var1': var[1] ,'aux_var2': var[2]})

to

df = df.rename(columns={  var[0] : 'aux_var1', var[1]:'aux_var2'})

df = df.rename(columns={ 'aux_var1': var[0] ,'aux_var2': var[1]})

respectively.
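
For reference, here is a minimal runnable sketch of the whole function with the indices corrected. Two further details of the question's code are adjusted here: a parameter without a default (val) cannot follow one with a default, so val is moved ahead of var, and the helper name 'aux_var1 ' loses its trailing space so that df.aux_var1 resolves. The sample DataFrame is hypothetical and only serves to illustrate the call.

```python
import pandas as pd


def fun(df, val, var=("input_var1", "input_var2")):
    # Temporarily rename the two chosen columns to fixed helper names.
    df = df.rename(columns={var[0]: "aux_var1", var[1]: "aux_var2"})

    # Keep only the rows where both columns equal val.
    df = df.loc[(df.aux_var1 == val) & (df.aux_var2 == val)]

    # Restore the original column names before returning.
    return df.rename(columns={"aux_var1": var[0], "aux_var2": var[1]})


# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"var1": [1, 1, 2, 3], "var2": [1, 2, 2, 1], "other": list("abcd")}
)
result = fun(df, var=["var1", "var2"], val=1)
print(result)
```

Because the call passes var and val as keywords, the original call fun(df, var = ['var1','var2'], val = 1) keeps working despite the changed parameter order.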


In this case you are accessing var[2] but a 2-element list in Python has elements 0 and 1. Element 2 does not exist and therefore accessing it is out of range.
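
A quick way to see this in isolation, using the same two-element list as in the question:

```python
var = ["var1", "var2"]

print(var[0])  # first element
print(var[1])  # second (and last) element

try:
    var[2]  # there is no third element
except IndexError as exc:
    message = str(exc)
    print(message)
```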

1 Comment

When trying to solve a complicated problem, it can be easy to overlook the simple ones that you encounter along the way. Glad to help.

As mentioned in the other answers, the error you are getting is due to the zero-based indexing of Python lists: to access the first element of the list var, you use index 0 rather than index 1, i.e. var[0].

As for the renaming, you can filter a pandas DataFrame without renaming any columns. I can see that you are accessing the columns as attributes of the DataFrame, but you can achieve the same thing through the __getitem__ method, which is more commonly written with square brackets, e.g. df[var[0]].
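
As a minimal sketch of that idea for exactly two columns: the function name filter_equal and the sample data are hypothetical, chosen only for illustration.

```python
import pandas as pd


def filter_equal(df, var, val):
    # Look the columns up by name with square brackets; no renaming needed.
    mask = (df[var[0]] == val) & (df[var[1]] == val)
    return df.loc[mask]


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"var1": [1, 1, 2, 3], "var2": [1, 2, 2, 1]})
filtered = filter_equal(df, ["var1", "var2"], 1)
print(filtered)
```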

If you wish to have more generality over your function without any renaming happening, I can suggest this:

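from functools import reduce

import pandas as pd

def fun(df, var, val):
    # AND together one equality test per column, starting from an all-True
    # mask that shares df's index so the final boolean selection aligns.
    _sub = reduce(
        lambda x, y: x & (df[y] == val),
        var,
        pd.Series(True, index=df.index),
    )
    return df[_sub]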

This will work with any number of input columns. I hope this serves as inspiration for the more complicated operations you intend to do.
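
To illustrate the "any number of columns" point, here is a self-contained sketch that calls a reduce-based filter with two and then three columns. The sample data is hypothetical and deliberately uses a non-default index, so the starting mask is built on df.index to stay aligned with the DataFrame. The last line is not part of the approach above: it shows, as an alternative, the same filter written as a column-wise comparison with all(axis=1).

```python
from functools import reduce

import pandas as pd


def fun(df, var, val):
    # Start from an all-True mask that shares the DataFrame's index,
    # then AND in one equality test per requested column.
    start = pd.Series(True, index=df.index)
    mask = reduce(lambda acc, col: acc & (df[col] == val), var, start)
    return df[mask]


# Hypothetical data with a non-default index, for illustration only.
df = pd.DataFrame(
    {"a": [1, 1, 2, 1], "b": [1, 2, 1, 1], "c": [1, 1, 1, 3]},
    index=[10, 20, 30, 40],
)

two_cols = fun(df, ["a", "b"], 1)
three_cols = fun(df, ["a", "b", "c"], 1)

# Alternative without reduce: compare all chosen columns at once.
alternative = df[(df[["a", "b", "c"]] == 1).all(axis=1)]
```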
