Applying lambda function to pandas dataframe - returns index but not values?

Question

I'm running a process to clean up some UK telephone numbers and have decided to apply a lambda function to a pandas DataFrame, using a regex substitution to remove the characters I don't want (anything non-numeric, but allowing a +).

The code is as follows (phone_test is just a DataFrame of test examples with two columns: an index and the values):

def clean_phone_number(tel_no):
    for row in test_data:
        row = re.sub('[^?0-9+]+', '', row)
        return(row)

phone_test_result = phone_test['TEL_NUMBER'].apply(lambda x: clean_phone_number(x))

The problem is that the outcome (phone_test_result) just returns the index of the phone_test DataFrame and not the newly formatted telephone number. I've been racking my brain for a couple of hours, but I'm sure it's a simple problem.

At first I thought it was just the positioning of the return line (it should be under the for, right?), but when I do that I just get a single phone number, repeated for the length of the loop (one that isn't even in the phone_test DataFrame!).

PLS HALP SO. thank you.

After the responses, this is what I've ended up with:

Clean the phone number using regex and only take the first 13 characters:
- substituting a leading zero with +44
- deleting everything with a length of less than 13 characters.
It's not perfect:
- there are some phone numbers that legitimately have fewer digits
- it means I trim out all of the extension numbers

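import re

def clean_phone_number(tel_no):
    # Strip unwanted characters, then keep only the first 13 characters.
    clean_tel = re.sub('[^?0-9+]+', '', tel_no)[:13]
    # Swap a leading zero for the UK country code.
    if clean_tel[:1] == '0':
        clean_tel = '+44' + clean_tel[1:]
    # Discard anything shorter than 13 characters (checked for every number,
    # as described above, not only for those that started with a zero).
    if len(clean_tel) < 13:
        clean_tel = ''
    return clean_tel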
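
As a rough check of the rules listed above, here is a minimal sketch that runs the function over a few made-up numbers. The sample values are assumptions for illustration only, and the length check is applied to every number, as the description states.

```python
import re
import pandas as pd

def clean_phone_number(tel_no):
    # Strip unwanted characters and keep the first 13 characters.
    clean_tel = re.sub('[^?0-9+]+', '', tel_no)[:13]
    # Replace a leading zero with the UK country code.
    if clean_tel[:1] == '0':
        clean_tel = '+44' + clean_tel[1:]
    # Blank out anything shorter than 13 characters.
    if len(clean_tel) < 13:
        clean_tel = ''
    return clean_tel

# Hypothetical test numbers, not taken from the real data.
samples = pd.Series(['07721 051 851', '0208 413', '+44 7721 051851 ext 12'])
cleaned = samples.apply(clean_phone_number)
print(cleaned.tolist())
# ['+447721051851', '', '+447721051851']
```

The second number is blanked because it is too short, and the third loses its extension to the 13-character cut, which matches the two caveats listed above.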

jpp · Accepted Answer · 2019-01-17 22:45:36Z

3

pd.Series.apply applies a function to each value in a series, so the function receives one phone number at a time and needs no loop of its own. Note that the lambda is unnecessary.

import re
import pandas as pd

phone_test = pd.DataFrame({'TEL_NUMBER': ['+44-020841396', '+44-07721-051-851']})

def clean_phone_number(tel_no):
    # tel_no is a single value from the column, not the whole column
    return re.sub('[^?0-9+]+', '', tel_no)

phone_test_result = phone_test['TEL_NUMBER'].apply(clean_phone_number)

# 0      +44020841396
# 1    +4407721051851
# Name: TEL_NUMBER, dtype: object

pd.DataFrame.apply, in contrast, applies a function to each row of a dataframe when called with axis=1 (the default, axis=0, passes each column instead):

def clean_phone_number(row):
    # row is a Series holding one row; pick the column by label
    return re.sub('[^?0-9+]+', '', row['TEL_NUMBER'])

phone_test_result = phone_test.apply(clean_phone_number, axis=1)

# 0      +44020841396
# 1    +4407721051851
# dtype: object
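
To make the difference between the two calls concrete, here is a small sketch that records what type of object the function is handed in each case. It reuses the phone_test example from above; the variable names series_arg_types and row_arg_types are only for illustration.

```python
import pandas as pd

phone_test = pd.DataFrame({'TEL_NUMBER': ['+44-020841396', '+44-07721-051-851']})

# Series.apply: the function receives each cell value (here a str).
series_arg_types = phone_test['TEL_NUMBER'].apply(lambda value: type(value).__name__)

# DataFrame.apply with axis=1: the function receives each row as a Series.
row_arg_types = phone_test.apply(lambda row: type(row).__name__, axis=1)

print(series_arg_types.tolist())  # ['str', 'str']
print(row_arg_types.tolist())     # ['Series', 'Series']
```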


abcdaire · Accepted Answer · 2019-01-17 22:44:03Z

2

You don't have to loop; the function will be executed for each element:

def clean_phone_number(tel_no):
    return re.sub('[^?0-9+]+', '', tel_no)

Or directly, with a lambda:

phone_test_result = phone_test['TEL_NUMBER'].apply(lambda x: re.sub('[^?0-9+]+', '', x))
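
One detail about the pattern itself: inside a character class, ? is a literal question mark, so [^?0-9+] keeps any ? characters along with digits and +. If the goal is only "non-numeric, allowing a +", a pattern such as [^0-9+] may be closer to the intent. That reading of the goal is an assumption, and the sample string below is made up.

```python
import re

sample = '+44 (0)20 8413-96?'  # hypothetical input containing a question mark

# Pattern used in the question: '?' is in the "keep" set.
kept_question_mark = re.sub('[^?0-9+]+', '', sample)

# Variant that keeps only digits and '+'.
digits_and_plus = re.sub('[^0-9+]+', '', sample)

print(kept_question_mark)  # +44020841396?
print(digits_and_plus)     # +44020841396
```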


3 Comments

jpp Over a year ago

"You don't have to loop": to clarify, apply is just a thinly veiled loop.

abcdaire Over a year ago

Yes, I meant it in the sense of "you don't need to write your own for loop", but it's true that it's better to have your clarification here in case the OP is unaware of that :)

roastbeeef Over a year ago

Thank you very much for this clarification - I've been writing Python full time for a month now and have been including loops in functions that I intend to use with apply... continually!!
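
Following on from the point that apply is a loop under the hood, the same substitution can probably also be written with the pandas string accessor, Series.str.replace with regex=True. This is a sketch of an alternative, not something proposed in the answers. It still iterates internally, so it should not be assumed to be faster, but it passes missing values through instead of raising the TypeError that re.sub gives for a non-string. The with_missing series is a made-up example.

```python
import pandas as pd

phone_test = pd.DataFrame({'TEL_NUMBER': ['+44-020841396', '+44-07721-051-851']})

# Same pattern as in the answers, applied through the .str accessor.
vectorised = phone_test['TEL_NUMBER'].str.replace(r'[^?0-9+]+', '', regex=True)
print(vectorised.tolist())  # ['+44020841396', '+4407721051851']

# Hypothetical column with a missing entry: the gap stays missing.
with_missing = pd.Series(['+44-1', None])
cleaned_missing = with_missing.str.replace(r'[^?0-9+]+', '', regex=True)
print(cleaned_missing.isna().tolist())  # [False, True]
```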
