How to extract content from regex output that has square brackets in Python

Question

I have a pandas DataFrame (Python 2.7) with a column that looks something like this:

       email
['[email protected]']
['[email protected]']
['[email protected]']
['[email protected]']

I want to extract the email from it without the square brackets and single quotes. The output should look like this:

     email
[email protected]
[email protected]
[email protected]
[email protected]

I have tried the suggestions from this answer: Replace all occurrences of a string in a pandas dataframe (Python), but it's not working. Any help will be appreciated.

Edit: What if the arrays hold more than one element? Something like:

          email
  ['[email protected]']
  ['[email protected]']
  ['[email protected]']
  ['[email protected]','[email protected]']
  ['[email protected]','[email protected]', '[email protected]']

Is it possible to get the output in three different columns, without the square brackets and single quotes?

jezrael · Accepted Answer · 2016-05-08 12:05:42Z

4

You can use str.strip if the values are strings:

print type(df.at[0,'email'])
<type 'str'>

df['email'] = df.email.str.strip("[]'")
print df
              email
0    [email protected]
1  [email protected]
2    [email protected]
3   [email protected]
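
As a self-contained sketch of the same idea in Python 3 syntax (the addresses below are made-up placeholders, not data from the question), assuming each cell really is the string "['...']" rather than a list:

```python
import pandas as pd

# Hypothetical sample data: every cell is a *string* that merely looks like a list.
df = pd.DataFrame({"email": ["['alice@example.com']", "['bob@example.org']"]})

# str.strip removes any of the characters [, ] and ' from both ends of each string.
df["email"] = df["email"].str.strip("[]'")
print(df)
```

Note that str.strip only trims the ends, so it is not enough when a cell holds several quoted addresses separated by commas.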

If the values are lists, apply pd.Series:

print type(df.at[0,'email'])
<type 'list'>

df['email'] = df.email.apply(pd.Series)
print df
              email
0    [email protected]
1  [email protected]
2    [email protected]
3   [email protected]
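
If every list holds exactly one element, a possible alternative (my own suggestion, not part of the original answer) is to index into each list with the .str accessor, which also works on list cells. A minimal sketch with made-up addresses:

```python
import pandas as pd

# Hypothetical sample data: every cell is a one-element Python list.
df = pd.DataFrame({"email": [["alice@example.com"], ["bob@example.org"]]})

# .str[0] picks the first element of each list (it is not limited to strings).
df["email"] = df["email"].str[0]
print(df)
```

This keeps the result as a single column and silently ignores any additional elements, so it only fits the one-email-per-row case.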

EDIT: If the arrays contain multiple values, you can use:

df1 = df['email'].apply(pd.Series).fillna('')
print df1
                  0                  1                 2
0    [email protected]                                     
1  [email protected]                                     
2    [email protected]                                     
3   [email protected]  [email protected]                  
4   [email protected]   [email protected]  [email protected]
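
A similar result can usually be obtained by passing the lists straight to the DataFrame constructor, which tends to be faster than apply(pd.Series) on large frames. This is only a sketch with made-up addresses; the email_ column prefix is an illustrative choice, not something from the question:

```python
import pandas as pd

# Hypothetical sample data: lists of differing lengths in one column.
df = pd.DataFrame({
    "email": [
        ["alice@example.com"],
        ["bob@example.org", "carol@example.net"],
        ["dave@example.com", "erin@example.org", "frank@example.net"],
    ]
})

# One output row per list; shorter lists are padded with missing values,
# which fillna('') then turns into empty strings.
df1 = pd.DataFrame(df["email"].tolist(), index=df.index).fillna("")

# Give the new columns readable names: email_0, email_1, email_2.
df1 = df1.add_prefix("email_")
print(df1)
```

Passing index=df.index keeps the new frame aligned with the original, so it can be joined back with df.join(df1) if the other columns are needed.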

edited May 8, 2016 at 12:05

answered May 8, 2016 at 7:33


6 Comments

user4349490 Over a year ago

Thank you for your quick response. The type of my column is <type 'numpy.ndarray'>. I tried converting it to a list and then applying the Series method, but it gives an error: "ValueError: could not broadcast input array from shape (2,6) into shape (6)". Any suggestions would be greatly appreciated.

jezrael Over a year ago

One question: how many columns does your dataframe with the real data have?

user4349490 Over a year ago

I have 17 columns in total, but only three are of array type. The email column, which is an array, can hold a 1-D, 2-D or any-dimensional array. The column looks like: email ['[email protected]'] ['[email protected]'] ['[email protected]'] ['[email protected]'] ['[email protected]','[email protected]']. So I also have multiple emails in the same column.

jezrael Over a year ago

Can you update your question and add your desired output? If there are multiple emails, should the output be multiple columns?

user4349490 Over a year ago

yes sure.. I currently have 11 reps .. 4 more to go and then I'll accept your answer :)
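
For the numpy.ndarray case raised in these comments, one approach that might help is to flatten each cell to a plain Python list before building the columns. This is an untested guess at the commenter's data layout, with made-up addresses; cells stands in for iterating over df['email']:

```python
import numpy as np
import pandas as pd

# Hypothetical cells: arrays of differing shapes, including a 2-D one.
cells = [
    np.array(["alice@example.com"]),
    np.array(["bob@example.org", "carol@example.net"]),
    np.array([["dave@example.com"], ["erin@example.org"]]),
]

# np.ravel flattens any shape to 1-D; tolist() yields plain Python strings.
rows = [np.ravel(cell).tolist() for cell in cells]

# Ragged rows are padded with missing values, replaced here by ''.
df1 = pd.DataFrame(rows).fillna("")
print(df1)
```

Whether this avoids the ValueError quoted above depends on how the real column was built, which the thread does not show.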


user1279432 · Answer · 2016-05-08 08:08:05Z

0

Try this one:

from re import findall
s = "['[email protected]']"                     
m = findall(r"\[([A-Za-z0-9@'._]+)\]", s) 
print(m[0].replace("'",''))
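
If the cells are strings, the regex idea can also be applied to a whole column at once. The sketch below is a variation of my own, not the answerer's code: it captures whatever sits between single quotes instead of whitelisting characters, and the addresses are made-up placeholders:

```python
import re

import pandas as pd

# Capture everything between a pair of single quotes.
pattern = r"'([^']+)'"

# Plain-string usage with re.findall.
s = "['alice@example.com','bob@example.org']"
found = re.findall(pattern, s)

# Column-wise usage: Series.str.findall returns one list of matches per row.
df = pd.DataFrame({"email": ["['alice@example.com']", s]})
matches = df["email"].str.findall(pattern)
print(matches)
```

Because findall returns every match, this also handles cells with several addresses, which the m[0] lookup above does not.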

answered May 8, 2016 at 8:08
