1

Thank you in advance for your assistance.

# Create df.
import pandas as pd

d = {'dep_var': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
     'one': pd.Series([9, 23, 37, 41], index=['a', 'b', 'c', 'd']),
     'two': pd.Series([1, 6, 5, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print(df)

   dep_var  one  two
a       10    9    1
b       20   23    6
c       30   37    5
d       40   41    4


#Define function.

def df_two(dep_var, ind_var_1, ind_var_2):

    global two

    data = {
        dep_var: df[dep_var],
        ind_var_1: df[ind_var_1],
        ind_var_2: df[ind_var_2]
    
    }


    two = pd.DataFrame(data)
    return two

# Execute function.

df_two("dep_var", "one", "two")


dep_var one two
a   10  9   1
b   20  23  6
c   30  37  5
d   40  41  4

This works perfectly. I'm fairly new at this, and I'd like to be able to use a single function whether I pass, say, three or four parameters. With the code above, of course, I get an error message as soon as I pass an extra parameter.

So, as a rookie move, I defined another function that takes three independent variables.

def df_three(dep_var, ind_var_1, ind_var_2, ind_var_3):

    global three

    data = {
        dep_var: df[dep_var],
        ind_var_1: df[ind_var_1],
        ind_var_2: df[ind_var_2],
        ind_var_3: df[ind_var_3]
    
    }


    three = pd.DataFrame(data)
    return three

I've tried *args, **kwargs, mapping, and a host of other things with no luck. My sense is that I'm close, but I need a way to tell the function that there might be one, two, or three parameters, and then map those parameters to the created dataframe.

1
  • how are you calling df_three? Commented Nov 16, 2020 at 19:08

2 Answers

1

Use *args to accept a variable number of positional arguments:

def foo(dep_var, *args):
    global df

    data = {dep_var: df[dep_var]}
    for a in args:
        data[a] = df[a]
    
    return pd.DataFrame(data)

And then you can call

foo('dep_var', 'one')

foo('dep_var', 'one', 'two')

To eliminate the need for the global variable, I'd pass df to the function as well:

def foo(df, dep_var, *args):
    data = {dep_var: df[dep_var]}
    for a in args:
        data[a] = df[a]
    
    return pd.DataFrame(data)

More information on *args.
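
As a quick check of the second version, here is a minimal self-contained sketch that rebuilds the sample frame from the question and calls foo with zero, one, and two extra column names. The data is copied from the question; the variable names only_dep, with_one, and with_two are illustrative.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame(
    {"dep_var": [10, 20, 30, 40], "one": [9, 23, 37, 41], "two": [1, 6, 5, 4]},
    index=["a", "b", "c", "d"],
)

def foo(df, dep_var, *args):
    # args is a tuple holding every extra positional argument (possibly empty).
    data = {dep_var: df[dep_var]}
    for a in args:
        data[a] = df[a]
    return pd.DataFrame(data)

only_dep = foo(df, "dep_var")
with_one = foo(df, "dep_var", "one")
with_two = foo(df, "dep_var", "one", "two")

print(list(with_two.columns))
```

The same function handles any number of column names, so separate df_two and df_three definitions are no longer needed.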


0

It sounds like you want to select only some columns from a data frame, in a certain order. You can just pass a list of the column names for that:

two[["dep_var", "one", "two"]]

If you want to, you can wrap that in a function and use *columns to collect a variable number of arguments into a tuple.

def select(df, *columns):
    return df[list(columns)]

This should directly work with your use cases:

select(df, "dep_var", "one")
select(df, "dep_var", "one", "two")

Note that I also passed the data frame variable, so you don't need to rely on a global variable.

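The call to list is needed because *columns collects the arguments into a tuple, and indexing a data frame with a tuple behaves differently from indexing with a list: a list selects several columns, while a tuple is treated as a single column label (as used with MultiIndex columns).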
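
To make the list-versus-tuple difference concrete, here is a small sketch on a cut-down version of the question's data. It assumes a reasonably recent pandas version, where a tuple key on a frame with plain (non-MultiIndex) columns is looked up as one label and raises KeyError; older releases may behave differently.

```python
import pandas as pd

df = pd.DataFrame({"dep_var": [10, 20], "one": [9, 23], "two": [1, 6]})

columns = ("dep_var", "one")  # what *columns looks like inside the function

# A list selects several columns and returns a DataFrame.
subset = df[list(columns)]

# A tuple is treated as ONE column label, so on this frame the lookup fails.
try:
    df[columns]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(list(subset.columns), tuple_lookup_failed)
```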

You might want to append a .copy() to the return line, depending on how you use the return value of this.
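
A sketch of that variant, under the assumption that the caller wants to modify the selection afterwards. The name select_copy is hypothetical. The explicit copy mainly documents that the result is independent of the original frame, and it avoids the SettingWithCopyWarning that some pandas versions emit when a selection is modified.

```python
import pandas as pd

def select_copy(df, *columns):
    # Hypothetical variant of select(): .copy() makes the result
    # an independent frame rather than a selection tied to df.
    return df[list(columns)].copy()

df = pd.DataFrame({"dep_var": [10, 20], "one": [9, 23], "two": [1, 6]})

subset = select_copy(df, "dep_var", "one")
subset["one"] = 0  # changes the selection only; df keeps its values
```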

A variable number of arguments also includes zero, so you might want to add a check for that.
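
One possible form of that check is sketched below. The name select_checked and the choice of ValueError are assumptions for illustration; without the check, an empty column list simply returns a frame with the original index and no columns.

```python
import pandas as pd

def select_checked(df, *columns):
    # Hypothetical variant of select() that rejects an empty column list.
    if not columns:
        raise ValueError("select_checked() needs at least one column name")
    return df[list(columns)]

df = pd.DataFrame({"dep_var": [10, 20], "one": [9, 23], "two": [1, 6]})

result = select_checked(df, "dep_var", "two")

try:
    select_checked(df)
    empty_call_rejected = False
except ValueError:
    empty_call_rejected = True
```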
