python pandas - Convert a column of tuples to string column

Question

This should be a relatively simple question.

Below is a sample of my df column:

             title2
1      (, 2 ct, , )
2      (, 1 ct, , )
3      (, 2 ct, , )
4               NaN
5      (, 2 ct, , )
6     (, 5 ct, , )
7  (, 7 ounce, , )
8    (, 1 gal, , )
9              NaN
10             NaN

I would like to convert the whole column to a proper string column - i.e. my desired output would be:

    title2
1      2ct
2      1ct
3      2ct
4      NaN
5      2ct
6      5ct
7  7 ounce
8     1gal
9      NaN
10     NaN

I have tried the following commands, but none seem to work:

title['title3'] = title['title2'].agg(' '.join)
title['title3'] = title['title2'].apply(lambda x: ''.join(x))
title['title3'] = title['title2'].astype(str)
title['title3'] = title['title2'].values.astype(str)

The answer given in the post "Convert a pandas column containing tuples to string" unfortunately does not help me either.

Can someone shed some light on this? Thank you all.
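
A possible reproduction of the attempts above, as a minimal sketch. It assumes the column holds real Python tuples mixed with float NaN, which the question does not confirm; the sample values are reconstructed from the printed column for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample: tuples of strings plus float NaN.
title = pd.DataFrame(
    {
        "title2": [
            ("", "2 ct", "", ""),
            ("", "1 ct", "", ""),
            np.nan,
            ("", "7 ounce", "", ""),
        ]
    }
)

# ''.join works on a tuple of strings, but a float NaN is not iterable.
try:
    title["title2"].apply(lambda x: "".join(x))
    join_failed = False
except TypeError:
    join_failed = True

# astype(str) keeps the tuple's repr, parentheses and quotes included.
as_text = title["title2"].astype(str)

print(join_failed)
print(as_text.iloc[0])
```

Under these assumptions, the join-based attempts fail on the NaN rows, while the `astype(str)` attempts succeed but leave the tuple punctuation in place.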

@shubhamSharma yours worked! I had a feeling this would be much simpler than I expected.
– sophocles, Commented Dec 22, 2020 at 13:02
What's wrong with a simple regex? df['title2'].replace(r'[(,\s+,)]','',regex=True)
– Umar.H, Commented Dec 22, 2020 at 13:08
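
A minimal sketch of the regex idea from the comment above, assuming the values are stored as plain strings rather than tuple objects (`Series.replace` with `regex=True` only rewrites string values). The character class is a simplified form of the one in the comment, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: the "tuples" are stored as plain strings.
df = pd.DataFrame({"title2": ["(, 2 ct, , )", np.nan, "(, 1 gal, , )"]})

# Remove parentheses, commas and whitespace; NaN is left untouched.
cleaned = df["title2"].replace(r"[(),\s]", "", regex=True)

print(cleaned.tolist())
```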

Serge de Gosson de Varennes · Accepted Answer · 2020-12-22 13:06:55Z

1

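This will do the trick:

# Convert each value to its string form, then trim the tuple punctuation from both ends
demo_data['title2'] = demo_data['title2'].astype(str).map(lambda x: x.lstrip(",'[ (").rstrip(" ,'])"))
# Collapse any remaining "', '" separators between tuple elements
demo_data['title2'] = demo_data['title2'].str.replace("', '", ",", regex=False)
# Trim again in case the previous step exposed new leading or trailing punctuation
demo_data['title2'] = demo_data['title2'].map(lambda x: x.lstrip(",'[ (").rstrip(" ,'])"))
# Remove the remaining spaces, e.g. "2 ct" -> "2ct"
demo_data['title2'] = demo_data['title2'].str.replace(" ", "", regex=False)

which gives: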

   ID  title2
0   1     2ct
1   2     1ct
2   3     2ct
3   4     nan
4   5     2ct
5   6     5ct
6   7  7ounce
7   8    1gal
8   9     nan
9  10     nan
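
The output above turns missing values into the string "nan". If real NaN values should survive, one alternative is to join the tuple elements directly. This is only a sketch: `join_tuple` is a hypothetical helper, and it assumes the column holds real tuples of strings, which may not match the original data.

```python
import numpy as np
import pandas as pd

def join_tuple(value):
    """Join the non-empty parts of a tuple; return anything else (e.g. NaN) unchanged."""
    if isinstance(value, tuple):
        # Assumption: every tuple element is a string.
        return "".join(part.replace(" ", "") for part in value if part)
    return value

# Made-up sample in the shape of the question's data.
demo_data = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "title2": [("", "2 ct", "", ""), np.nan, ("", "7 ounce", "", "")],
    }
)
demo_data["title2"] = demo_data["title2"].map(join_tuple)

print(demo_data)
```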


ExistingAsMike · Accepted Answer · 2020-12-22 13:07:55Z

1

Using regex:

import re

df['title3'] = df['title2'].apply(lambda x: re.sub('[^A-Za-z0-9]', '', str(x)))
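
Because `str(x)` is applied to every value, the line above also turns NaN into the string "nan". A hedged variant that passes non-tuple, non-string values through unchanged might look like this; `strip_non_alnum` is a hypothetical helper and the sample data is made up.

```python
import re

import numpy as np
import pandas as pd

def strip_non_alnum(value):
    # Clean tuples and strings; return anything else (e.g. NaN) unchanged.
    if isinstance(value, (tuple, str)):
        return re.sub(r"[^A-Za-z0-9]", "", str(value))
    return value

df = pd.DataFrame(
    {"title2": [("", "2 ct", "", ""), np.nan, ("", "7 ounce", "", "")]}
)
df["title3"] = df["title2"].apply(strip_non_alnum)

print(df["title3"].tolist())
```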


IoaTzimas · Accepted Answer · 2020-12-22 13:23:57Z

1

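Try the following. I assume that the tuples and NaNs are stored as strings in your column; if not, let me know and I will adjust the solution:

def clear(x):
    # Leave missing values untouched, whether float NaN or the string 'NaN'
    if str(x).strip().lower() == 'nan':
        return x
    # Split on commas and trim whitespace, parentheses and quotes from each part
    parts = [i.strip(" ()'") for i in str(x).split(',')]
    # Keep the first part that contains a digit; None if no part has one
    return next((p for p in parts if any(c.isdigit() for c in p)), None)

df['title2'] = df['title2'].apply(clear)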

edited Dec 22, 2020 at 13:23

answered Dec 22, 2020 at 13:03


sophocles Over a year ago

Megale, unfortunately your code above gives me something like ("", ("",("".....

sophocles Over a year ago

Now it gives me the error "list index out of range". I don't think it's your fault, though; I think it's because I only posted a sample of 10 rows from a column of ~7k rows, so there is probably other stuff to take into consideration. Nonetheless, the simple code in the comments above gave me the answer I was looking for, so don't worry about this. Thanks again.
