I was creating this function:

    def stand_col_names(*df_to_stand):
        '''function that allow you to lowercase dataframes columns'''
        df_to_stand.columns = df_to_stand.columns.str.lower()
        return df_to_stand

As you can see, my goal is to pass several DataFrames at once and lowercase their column names. Something like this:

    df1, df2, df3, df4 = stand_col_names(df1, df2, df3, df4)

I don't want a function that takes only one argument, because then I would have to write four lines, one for each DataFrame.

When I run it I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-c4c8e2ccc0f3> in <module>
----> 1 df_target_pm,df_target_sp=stand_col_names(df_target_pm,df_target_sp)

<ipython-input-18-65eb087bc145> in stand_col_names(*df_to_stand)
  1 def stand_col_names(*df_to_stand):
  2     '''function that allow you to lowercase dataframes columns'''
----> 3     df_to_stand.columns = df_to_stand.columns.str.lower()
  4     return df_to_stand

AttributeError: 'tuple' object has no attribute 'columns'

Could you help me please?

    I don't see the need to use this kind of organization. I think * is better suited to an operation whose result depends on a variable number of things. For instance, imagine you wanted to sum together a variable number of arrays. Here, although you want to act on a variable number of DataFrames, the result only ever depends on each individual DataFrame. IMO, the function should accept and return a single DataFrame and the loop should exist outside, i.e. for df in dfs: df = stand_col_names(df). Commented Nov 8, 2019 at 16:51

2 Answers

1

Actually, since you are modifying each DataFrame's columns attribute in place, you don't need to return anything at all:

def stand_col_names(*df_to_stand):
    '''Lowercase the column names of every DataFrame passed in.'''
    for df in df_to_stand:
        df.columns = df.columns.str.lower()

# to call, just do:
stand_col_names(df1, df2, df3, df4)
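
To check that the in-place version behaves as described, here is a minimal sketch. The sample frames `df_a` and `df_b` and their column names are made up for illustration and are not from the question:

```python
import pandas as pd

def stand_col_names(*df_to_stand):
    """Lowercase the column names of every DataFrame passed in, in place."""
    for df in df_to_stand:
        df.columns = df.columns.str.lower()

# Hypothetical sample data, only for illustration.
df_a = pd.DataFrame({"Name": ["x"], "AGE": [1]})
df_b = pd.DataFrame({"City": ["y"]})

# Nothing is returned; both frames are modified through their references.
stand_col_names(df_a, df_b)

print(list(df_a.columns))  # ['name', 'age']
print(list(df_b.columns))  # ['city']
```

Because the function returns None, do not assign its result back to the DataFrame names.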

But in general, I agree with @ALollz's comment. This function should take a single DataFrame, and the loop should exist outside:

def stand_col_names(df):
    df.columns = df.columns.str.lower()

for df in (df1, df2, df3, df4):
    stand_col_names(df)

0
def stand_col_names(*dataframes):
    for df in dataframes:
        df.columns = df.columns.str.lower()

    return dataframes

Some explanation. In a function signature, the * collects all the positional arguments into a tuple (in Python this is usually called argument packing, or just *args; other languages call the similar feature a "spread" or "rest" operator). The for loop iterates through the tuple and modifies each DataFrame's columns in place. The function then returns the tuple.
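
A quick way to see the packing, and why the version in the question raised AttributeError, is the small sketch below. The function name `show_args` is made up for illustration:

```python
def show_args(*args):
    """Return the type of the packed positional arguments and how many there are."""
    return type(args), len(args)

packed_type, count = show_args("a", "b", "c")
print(packed_type, count)  # <class 'tuple'> 3

# A tuple has no `columns` attribute, which is what the traceback reported.
print(hasattr(("a", "b"), "columns"))  # False
```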

As a word of warning, this will mutate the original DataFrames in place, which you might not want. If you want to keep the originals with their existing column names, you'll need instead to iterate through the collection, put a modified copy of each DataFrame into a second tuple, and return the second tuple.
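
One possible way to write that non-mutating variant is sketched below. It swaps the column assignment for `DataFrame.rename`, which returns a new DataFrame by default. The name `stand_col_names_copy` and the sample data are assumptions for illustration, and the sketch assumes all column names are strings:

```python
import pandas as pd

def stand_col_names_copy(*dataframes):
    """Return new DataFrames with lowercased column names; the inputs are left untouched."""
    # rename() returns a new DataFrame unless inplace=True is passed,
    # so the originals keep their column names.
    return tuple(df.rename(columns=str.lower) for df in dataframes)

# Hypothetical sample data, only for illustration.
df_a = pd.DataFrame({"Name": ["x"], "AGE": [1]})
df_b = pd.DataFrame({"City": ["y"]})

lower_a, lower_b = stand_col_names_copy(df_a, df_b)

print(list(lower_a.columns))  # ['name', 'age']
print(list(df_a.columns))     # ['Name', 'AGE']
```

With this version the tuple unpacking from the question works as intended, since the function returns one new DataFrame per argument.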

4 Comments

This would actually return the first dataframes without changing the rest.
Can you explain what you mean by this? Do you mean it will return the unmodified dataframes? It won't, because the tuple contains a reference to the original dataframes, which it modifies in place and then returns (which I agree is an unusual thing to do). Or do you mean something else?
Your return dataframes is inside the loop.
Ah yes, so it is.
