Correct use of map for mapping a function onto a df, python pandas

Question

I've been searching for a while now and can't find anything concrete on this. I'm looking for a best-practice answer. My code works, but I'm not sure if I'm introducing problems.

# df['Action'] = list(map(my_function, df.param1))  # Works but older, I think?
df['Action'] = df['param1'].map(my_function)

Both of these produce the same VISIBLE result. I'm not entirely sure how the first, commented out line works, but it is an example I found on the internets that I applied here and it worked. Most other uses of map I've found are like the 2nd line, where it is called from the Series object.

So first question, which of these is better practice and what exactly is the first one doing?

2nd and final question. This is the more important of the two. Map, apply, applymap - not sure which to use here. The first commented out line of code does NOT work, while the second gives me exactly what I want.

def my_function(param1, param2, param3):
    return param1 * param2 * param3 # example

# Can't get this df.map function to work?
# Error map is not attribute of dataframe
# df['New_Col'] = df.map(my_function, df.param1, df.param1.shift(1), 
#    df.param2.shift(1))

# TypeError: my_function takes 3 positional args, but 4 were given
# df['New_Col'] = df.apply(my_function, args=(df.param1, df.param1.shift(1), 
#    df.param2.shift(1)))

# This works, not sure why
df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1), 
     df.param2.shift(1)))

I'm trying to compute a result that is based off of two columns of the df, from the current and previous rows. I've tried variations on map and apply when called from the df directly (df.map, df.apply) and haven't had success. But if I use the list(map(...)) notation it works great.

Is list(map(...)) acceptable? Which is best practice? Is there a correct way to use apply or map directly from the df object?

Thanks guys, appreciated.

EDIT: MaxU's response below works also. As it is, both of these work:

df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1), 
        df.param2.shift(1)))
df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

# This does NOT work
df['New_Col'] = df.apply(my_function, axis=1, args=(df.param1, 
        df.param1.shift(1), df.param2.shift(1)))
# Also does not work
# AttributeError: ("'float' object has no attribute 'shift'",
#     'occurred at index 2000-01-04 00:00:00')
# Will work if I remove the shift(), but not what I need.
df['New_Col'] = df.apply(lambda x: my_function(x.param1, x.param1.shift(1),
    x.param2.shift(1)))

I'm still unclear as to the proper syntax for using apply here, and whether any of these 3 methods is superior to the others (I'm guessing list(map(...)) is the "worst" of the 3 since it iterates and isn't vectorized).

list(map(function, iterable)) returns a newly allocated list whose contents are the results of applying function to each element of iterable.
– ForceBru, Commented Aug 18, 2017 at 15:16
MaxU's answer below gives a way to use my function without using map or apply at all. I haven't seen this usage anywhere else in my search, and it works great. I would close this, but I would still like to know the proper syntax for using apply in this case ... and how it differs from just passing the function the individual Series of the DF. Thanks guys.
– RaceFrog, Commented Aug 21, 2017 at 18:02

MaxU - stand with Ukraine · Accepted Answer · 2017-08-18 15:26:33Z

8

So first question, which of these is better practice and what exactly is the first one doing?

df['Action'] = df['param1'].map(my_function)

is much more idiomatic, faster (vectorized) and more reliable.
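To make the comparison concrete, here is a minimal sketch of the two forms side by side. The DataFrame contents and the body of my_function are made up for illustration. One caveat worth hedging: with a plain Python function, Series.map still calls it once per element, so the main practical gain is that the result is a Series aligned to the original index rather than a bare list.

```python
import pandas as pd


def my_function(value):
    # Hypothetical stand-in for the asker's single-argument function.
    return "buy" if value > 2 else "hold"


df = pd.DataFrame({"param1": [1.0, 2.5, 4.0]}, index=["a", "b", "c"])

# Series.map returns a Series that keeps the index of df["param1"].
via_series_map = df["param1"].map(my_function)

# The built-in map iterates over the Series values and yields a plain list,
# which pandas then assigns by position.
via_builtin_map = list(map(my_function, df["param1"]))

df["Action"] = via_series_map
```

Both forms produce the same values here; they differ in the type of the intermediate result (Series versus list).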

2nd and final question. This is the more important of the two. Map, apply, applymap - not sure which to use here. The first commented out line of code does NOT work, while the second gives me exactly what I want.

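Pandas did NOT have DataFrame.map() when this was written (2017), only Series.map(), so if you need to access multiple columns in your mapping function you can use DataFrame.apply(). (pandas 2.1 later added DataFrame.map() as the elementwise replacement for applymap(); it still operates on one value at a time, not across columns.)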

Demo:

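# Caveat (see comments): with axis=1 each x is a single row, so x.param1
# is a scalar and .shift(1) raises AttributeError.
df['New_Col'] = df.apply(lambda x: my_function(x.param1,
                                               x.param1.shift(1),
                                               x.param2.shift(1)),
                         axis=1)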

or just:

df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))
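The direct call works because the `*` operator is defined elementwise on Series, so my_function receives three whole columns and returns a whole column. The sketch below, on made-up data, checks that this matches the list(map(...)) version from the question; the variable names prev1 and prev2 are illustrative only.

```python
import numpy as np
import pandas as pd


def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example from the question


df = pd.DataFrame({"param1": [1.0, 2.0, 3.0, 4.0],
                   "param2": [10.0, 20.0, 30.0, 40.0]})

prev1 = df.param1.shift(1)  # previous row's param1 (NaN in the first row)
prev2 = df.param2.shift(1)  # previous row's param2 (NaN in the first row)

# One call on whole columns: the multiplication runs elementwise.
vectorized = my_function(df.param1, prev1, prev2)

# One call per row: the built-in map zips the three Series together.
elementwise = pd.Series(list(map(my_function, df.param1, prev1, prev2)),
                        index=df.index)

same = np.allclose(vectorized, elementwise, equal_nan=True)
```

This shortcut only applies when the function body consists of operations that accept a whole Series (arithmetic, NumPy ufuncs, and similar). A function that branches with a plain `if` on its arguments would raise a ValueError when handed a Series, and would need a per-element approach or a rewrite with something like numpy.where.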

edited Aug 18, 2017 at 15:26

answered Aug 18, 2017 at 15:19


5 Comments

RaceFrog Over a year ago

Thanks for the answer. I tried apply above, but I must not be using it correctly because I get the TypeError.

MaxU - stand with Ukraine Over a year ago

@RaceFrog, very often there are better solutions than .apply(). Could you post a sample data set, a short description of what you are trying to do, and your desired data set?

RaceFrog Over a year ago

I tried your lambda notation and get: ValueError: Wrong number of items passed 3, placement implies 1

RaceFrog Over a year ago

Can't get the lambda one to work, but the last line works perfectly. I didn't know you could avoid map and apply altogether! Really cool.

RaceFrog Over a year ago

The lambda function isn't working when I use the call to shift. AttributeError: ("'float' object has no attribute 'shift'", 'occurred at index 2000-01-04 00:00:00') .... If I remove the .shift(1) from the last 2 arguments, it compiles and works. But they are not the inputs I want to use.
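For reference, the two errors in this thread have a common cause: DataFrame.apply hands the function one row (or one column) at a time. With args=(a, b, c) the function is called as my_function(row_or_column, a, b, c), which gives four positional arguments, and inside a row-wise lambda x.param1 is a single float, which has no .shift. One possible way to keep apply, sketched here on made-up data, is to shift on the whole columns first; the column names param1_prev and param2_prev are hypothetical helpers, not part of the original post.

```python
import pandas as pd


def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example from the question


df = pd.DataFrame({"param1": [1.0, 2.0, 3.0],
                   "param2": [10.0, 20.0, 30.0]},
                  index=pd.date_range("2000-01-03", periods=3))

# Shift at the column level, where .shift exists, and store the results
# as extra columns so that each row carries its own "previous" values.
with_prev = df.assign(param1_prev=df.param1.shift(1),
                      param2_prev=df.param2.shift(1))

# Row-wise apply: each `row` is a Series of scalars for one index label.
df["New_Col"] = with_prev.apply(
    lambda row: my_function(row.param1, row.param1_prev, row.param2_prev),
    axis=1,
)
```

This still calls my_function once per row, so for simple arithmetic the direct call on whole columns is likely the better choice; the apply form is mainly useful when the function cannot operate on a whole Series.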
