
My code worked on a previous dataset but has now stopped working. I looked through other answers for this error message, but none seem applicable to my case.

I have one column in my dataframe df, Email_Address, and I would like to split the domain out into a new column.

My dataframe is a subset of a previous df.

#create new df, for only email addresses I need to review
df = df_raw.loc[df_raw['Review'] == 'Y'].copy()

#I reset the index to try to fix the problem, but it didn't help
df = df.reset_index(drop=True)

#ensure Email Address is a string
df['Email_Address']= df.Email_Address.apply(str)

#make Email Address lower case
df['email_lowercase'] = df['Email_Address'].str.lower()

#Split out domain into a new column 
df['domain'] = df['email_lowercase'].apply(lambda x: x.split('@')[1])

IndexError: list index out of range
  • This might mean that the symbol @ doesn't exist in one of your cells, so you can't access the part of the email that comes after the @. Sometimes users type "at" instead of @ so they can't be traced by bots. Have you checked for that? Commented Aug 29, 2017 at 14:26
  • I'm not sure, but try changing df['Email_Address']= df.Email_Address.apply(str) to df['Email_Address']= df.Email_Address.astype(str). It's also possible you have unclean data where some rows have nothing after the @, which would cause it to fail. Check that too. Commented Aug 29, 2017 at 14:26
  • Without a representative df, it's impossible to reproduce your error. Please provide an MCVE. Commented Aug 29, 2017 at 14:28

1 Answer

You most likely have invalid emails in your dataframe. You can identify these by using

df[~df.Email_Address.astype(str).str.contains('@')]

You could use this approach to extract the domain

def extract_domain(email):
    email_domain = email.split('@')
    if len(email_domain) > 1:
        return email_domain[1]

df['domain'] = df['email_lowercase'].apply(extract_domain)

or even shorter:

df['domain'] = df['email_lowercase'].str.split('@').apply(lambda li: li[1] if len(li) > 1 else None)
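A fully vectorized variant is also possible. The following is only a sketch on made-up sample data (the three addresses are hypothetical, not from the question): indexing with .str[1] after .str.split('@') yields NaN for any row whose split result has fewer than two parts, so no explicit length check is needed.

```python
import pandas as pd

# Hypothetical sample data; the second row has no '@' on purpose.
df = pd.DataFrame({
    'email_lowercase': [
        'alice@example.com',
        'bob at example.com',
        'carol@test.org',
    ]
})

# .str.split('@') produces a list per row; .str[1] picks the second
# element and returns NaN where that element does not exist.
df['domain'] = df['email_lowercase'].str.split('@').str[1]

print(df)
```

Rows without an @ end up with a missing domain instead of raising IndexError, which makes them easy to filter afterwards with df['domain'].isna().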

3 Comments

Thank you, I tried this and get AttributeError: 'Series' object has no attribute 'contains'
@jeangelj I fixed this. (forgot str. before contains)
Thank you, it seems there were some NaN values, surprisingly. I replaced them with 0s.
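For context on why NaN values trigger this error: applying str to a missing value produces the literal text 'nan', which contains no @, so split('@')[1] fails on it. The sketch below uses a hypothetical three-element Series (not the asker's data) to show that behaviour, and one way to flag such rows without converting them to strings first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
emails = pd.Series(['alice@example.com', np.nan, 'bob@test.org'])

# apply(str) turns NaN into the string 'nan', hiding the missing value.
as_text = emails.apply(str)
print(as_text[1])

# Flag rows that are missing or lack an '@'; na=False treats NaN as "no match".
invalid = ~emails.str.contains('@', na=False)
print(emails[invalid])

# Working on the original Series keeps NaN as NaN in the result.
domain = emails.str.lower().str.split('@').str[1]
print(domain)
```

Keeping the missing values as NaN, rather than replacing them with 0 or 'nan', may make it easier to tell "no email given" apart from "malformed email" later on.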
