
I have a pandas DataFrame (Python 2.7) with a column that looks something like this:

       email
['[email protected]']
['[email protected]']
['[email protected]']
['[email protected]']

I want to extract the email from each value, without the square brackets and single quotes. The output should look like this:

     email
[email protected]
[email protected]
[email protected]
[email protected]

I have tried the suggestions from this answer: Replace all occurrences of a string in a pandas dataframe (Python), but it's not working. Any help will be appreciated.

Edit: What if the arrays hold more than one element? Something like:

          email
  ['[email protected]']
  ['[email protected]']
  ['[email protected]']
  ['[email protected]','[email protected]']
  ['[email protected]','[email protected]', '[email protected]']

Is it possible to get the output in three different columns, without the square brackets and single quotes?


2 Answers

4

You can use str.strip if the values are strings:

print type(df.at[0,'email'])
<type 'str'>

df['email'] = df.email.str.strip("[]'")
print df
              email
0    [email protected]
1  [email protected]
2    [email protected]
3   [email protected]

If the values are lists, apply pd.Series:

print type(df.at[0,'email'])
<type 'list'>

df['email'] = df.email.apply(pd.Series)
print df
              email
0    [email protected]
1  [email protected]
2    [email protected]
3   [email protected]
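
If every list holds exactly one address, taking the first element of each list is another option. This is a minimal sketch, assuming list-valued cells; the example.com addresses are hypothetical stand-ins for the real data:

```python
import pandas as pd

# Hypothetical sample: each cell is a one-element list.
df = pd.DataFrame({"email": [["a@example.com"], ["b@example.com"]]})

# The .str accessor also indexes into lists, so .str[0] picks the first item of each cell.
df["email"] = df["email"].str[0]
print(df)
```

Unlike apply(pd.Series), this keeps a single column and silently ignores any extra addresses in a cell.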

EDIT: If the arrays hold multiple values, you can use:

df1 = df['email'].apply(pd.Series).fillna('')
print df1
                  0                  1                 2
0    [email protected]                                     
1  [email protected]                                     
2    [email protected]                                     
3   [email protected]  [email protected]                  
4   [email protected]   [email protected]  [email protected]
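
If the cells are numpy.ndarray objects rather than lists (possibly with more than one dimension), flattening each cell before building the wide frame may help. This is only a sketch under that assumption: split_email_cells and the example.com addresses are hypothetical, and it has not been run against the real data.

```python
import numpy as np
import pandas as pd


def split_email_cells(series):
    """Spread list-like cells (list or ndarray of any dimension) over columns."""
    # np.ravel flattens every cell to 1-D, so 1-D and 2-D arrays are treated alike.
    rows = [np.ravel(cell).tolist() for cell in series]
    wide = pd.DataFrame(rows, index=series.index)
    # Shorter rows are padded with missing values; show them as empty strings.
    return wide.fillna("")


# Hypothetical sample, filled cell by cell so each cell holds its own array.
cells = np.empty(3, dtype=object)
cells[0] = np.array(["a@example.com"])
cells[1] = np.array(["b@example.com", "c@example.com"])
cells[2] = np.array([["d@example.com", "e@example.com", "f@example.com"]])  # 2-D cell
df = pd.DataFrame({"email": cells})

df1 = split_email_cells(df["email"])
print(df1)
```

The result has one column per position (0, 1, 2), with empty strings where a row has fewer addresses.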

6 Comments

Thank you for your quick response. The type of my column is <type 'numpy.ndarray'>. I tried converting it to a list and then applying the Series method, but it gives an error: "ValueError: could not broadcast input array from shape (2,6) into shape (6)". Any suggestions would be greatly appreciated.
One question: how many columns does your DataFrame have with the real data?
I have 17 columns in total, but only three are of array type. The email column, which is an array, can hold 1-D, 2-D or any-dimensional arrays. The column looks like: email ['[email protected]'] ['[email protected]'] ['[email protected]'] ['[email protected]'] ['[email protected]','[email protected]']. So I also have multiple emails in the same column.
Can you update your question and add your desired output? If there are multiple emails, should the output be multiple columns?
yes sure.. I currently have 11 reps .. 4 more to go and then I'll accept your answer :)
0

Try this one:

from re import findall
s = "['[email protected]']"                     
m = findall(r"\[([A-Za-z0-9@'._]+)\]", s) 
print(m[0].replace("'",''))
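
The same regex idea can be applied to a whole column when the cells are strings. This is a minimal sketch, assuming string cells in the bracketed, single-quoted form shown in the question; the example.com addresses are hypothetical:

```python
import pandas as pd

# Hypothetical sample: each cell is the *string* form of a list of addresses.
df = pd.DataFrame({
    "email": [
        "['a@example.com']",
        "['b@example.com','c@example.com', 'd@example.com']",
    ]
})

# Every run of non-quote characters between single quotes is taken as one address.
found = df["email"].str.findall(r"'([^']+)'")

# One column per address; rows with fewer addresses get empty strings.
df1 = pd.DataFrame(found.tolist(), index=df.index).fillna("")
print(df1)
```

Matching on the quotes rather than on a fixed character set avoids missing addresses that contain characters such as "-" or "+".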
