Python Pandas Index error: List Index out of range

Question

My code worked on a previous dataset and now stopped working. I looked through other answers for this error message, but none seems applicable to mine.

I have one column in my dataframe df for Email_Address and I would like to split the domain out into a new column.

My dataframe is a subset of a previous df.

#create new df, for only email addresses I need to review
df = df_raw.loc[df_raw['Review'] == 'Y'].copy()

#I reset the index to fix the problem, but it didn't help
df = df.reset_index(drop=True)

#ensure Email Address is a string
df['Email_Address'] = df.Email_Address.apply(str)

#make Email Address lower case
df['email_lowercase'] = df['Email_Address'].str.lower()

#Split out domain into a new column 
df['domain'] = df['email_lowercase'].apply(lambda x: x.split('@')[1])

IndexError: list index out of range

This might mean that the @ symbol doesn't exist in one of your cells, so you can't access the part of the email that comes after the @. Sometimes users type "at" instead of @ so that they can't be traced by bots. Have you checked for that?
– ysearka, Commented Aug 29, 2017 at 14:26
I'm not sure, but try changing df['Email_Address']= df.Email_Address.apply(str) to df['Email_Address']= df.Email_Address.astype(str). It's also possible you have non-clean data where there is no data on some rows after the @, which would cause it to fail. Check that too.
– Stefano Potter, Commented Aug 29, 2017 at 14:26
Without a representative df, it's impossible to reproduce your error. Please provide an MVCE.
– C8H10N4O2, Commented Aug 29, 2017 at 14:28

Jan Zeiseweis · Accepted Answer · 2017-08-29 14:40:51Z

3

You most likely have invalid emails in your dataframe. You can identify these by using

df[~df.Email_Address.astype(str).str.contains('@')]
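As a minimal sketch of why this check helps: the sample frame below is hypothetical (not the asker's data) and assumes one missing value and one address typed with "at". It shows that apply(str) turns a missing value into the plain string 'nan', which has no '@', so split('@')[1] has nothing to return.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value and one address typed with "at".
df = pd.DataFrame(
    {"Email_Address": ["Alice@Example.com", np.nan, "bob at example.org"]},
    dtype=object,
)

# apply(str) converts the missing value into the literal string 'nan'.
as_text = df["Email_Address"].apply(str)

# Rows without an '@' are the ones where x.split('@')[1] raises IndexError.
invalid = df[~as_text.str.contains("@")]
print(invalid)
```

Any row that shows up in invalid is a candidate for cleaning or dropping before the split.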

You could use this approach to extract the domain

def extract_domain(email):
    # Return the part after the first '@', or None if there is no '@'
    email_domain = email.split('@')
    if len(email_domain) > 1:
        return email_domain[1]
    return None

df['domain'] = df['email_lowercase'].apply(extract_domain)

or even shorter:

df['domain'] = df['email_lowercase'].str.split('@').apply(lambda li: li[1] if len(li) > 1 else None)
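Another option, not part of the answer above, is to stay inside the .str accessor: indexing with .str[1] gives NaN instead of raising when a row has no second element. A small sketch on hypothetical data (the column name email_lowercase is taken from the question, the values are made up):

```python
import pandas as pd

# Hypothetical sample data, including two entries without an '@'.
df = pd.DataFrame(
    {"email_lowercase": ["alice@example.com", "nan", "bob at example.org"]},
    dtype=object,
)

# .str[1] picks the second piece of each split list; rows whose list
# has only one element get NaN rather than an IndexError.
df["domain"] = df["email_lowercase"].str.split("@").str[1]
print(df)
```

Note that this yields NaN for the invalid rows, whereas the apply-based versions yield None; isna() treats both as missing.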

edited Aug 29, 2017 at 14:40

answered Aug 29, 2017 at 14:31


3 Comments

jeangelj Over a year ago

Thank you, I tried this and get AttributeError: 'Series' object has no attribute 'contains'

Jan Zeiseweis Over a year ago

@jeangelj I fixed this. (forgot str. before contains)

jeangelj Over a year ago

Thank you, it seems there are some NaN values, surprisingly. I replaced them with 0s.
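If the NaN values should stay recognisable as missing rather than become 0s, one possible approach is to skip the apply(str) step and filter on the missing values directly. This is only a sketch on hypothetical data; it assumes the non-missing entries in Email_Address are already strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing email address.
df = pd.DataFrame(
    {"Email_Address": ["Alice@Example.com", np.nan, "Carol@Example.org"]},
    dtype=object,
)

# .str.lower() leaves NaN untouched instead of producing the string 'nan'.
df["email_lowercase"] = df["Email_Address"].str.lower()

# Split the frame into rows with and without an email address.
missing = df[df["email_lowercase"].isna()]
present = df[df["email_lowercase"].notna()]
print(len(missing), len(present))
```

The rows in missing can then be reviewed or dropped separately, and the domain only needs to be extracted for the rows in present.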
