This should be a relatively simple question.

Below is a sample of my df column:

             title2
1      (, 2 ct, , )
2      (, 1 ct, , )
3      (, 2 ct, , )
4               NaN
5      (, 2 ct, , )
6     (, 5 ct, , )
7  (, 7 ounce, , )
8    (, 1 gal, , )
9              NaN
10             NaN

I would like to convert the whole column to a proper string column - i.e. my desired output would be:

    title2
1      2ct
2      1ct
3      2ct
4      NaN
5      2ct
6      5ct
7  7 ounce
8     1gal
9      NaN
10     NaN

I have tried the following commands, but none seem to work:

title['title3'] = title['title2'].agg(' '.join)
title['title3'] = title['title2'].apply(lambda x: ''.join(x))
title['title3'] = title['title2'].astype(str)
title['title3'] = title['title2'].values.astype(str)

The answer given in this post: Convert a pandas column containing tuples to string, also does not help me unfortunately.

Can someone shed some light on this? Thank you all.

  • df['title2'].str.join(' ').str.strip() ? Commented Dec 22, 2020 at 12:59
  • Are these 'tuples' saved as strings in your column cells? Commented Dec 22, 2020 at 12:59
  • @shubhamSharma yours worked! I had a feeling this would be much simpler than I expected. Commented Dec 22, 2020 at 13:02
  • In any case, thank you both for assisting. Commented Dec 22, 2020 at 13:02
  • What's wrong with a simple regex? df['title2'].replace('[(,\s+,)]','',regex=True) Commented Dec 22, 2020 at 13:08
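
For anyone landing here later, the `str.join` suggestion from the comments can be tried on a small frame like the one below. This is only a sketch: the sample values are made up, and it assumes each cell holds a real tuple of strings (or NaN), not the text of a tuple.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: assumes each cell is a real tuple of strings, or NaN.
df = pd.DataFrame({
    'title2': [('', '2 ct', '', ''), ('', '1 gal', '', ''), np.nan],
})

# Join each tuple's elements with a space, then trim the leftover outer whitespace.
# Missing values are passed through as NaN by the .str accessor.
df['title3'] = df['title2'].str.join(' ').str.strip()
print(df['title3'].tolist())
```

Note that this keeps the space inside each value ('2 ct', not '2ct'). If the spaces should go as well, chaining `.str.replace(' ', '', regex=False)` is one option.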

3 Answers

1

This will do the trick:

# Characters that wrap the tuple's string form: brackets, quotes, commas, spaces.
wrap_chars = "()[]', "
# Cast to str and strip the wrapping punctuation from both ends.
demo_data['title2'] = demo_data['title2'].astype(str).str.strip(wrap_chars)
# Collapse any "', '" separators left between non-empty elements into a comma.
demo_data['title2'] = demo_data['title2'].str.replace("', '", ",", regex=False)
# Drop the remaining spaces, e.g. '7 ounce' -> '7ounce'.
demo_data['title2'] = demo_data['title2'].str.replace(" ", "", regex=False)

which gives:

   ID  title2
0   1     2ct
1   2     1ct
2   3     2ct
3   4     nan
4   5     2ct
5   6     5ct
6   7  7ounce
7   8    1gal
8   9     nan
9  10     nan

1

Using regex:

import re

df['title3'] = df['title2'].apply(lambda x: re.sub('[^A-Za-z0-9]', '', str(x)))
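
One thing to be aware of: because of the `str(x)` call, a missing value comes out as the string 'nan', and the space inside values such as '7 ounce' is removed too. If missing values should stay missing, a variant along these lines might help. It is a rough sketch; `strip_non_alnum` is a name I made up for illustration, and the sample values are invented.

```python
import math
import re

def strip_non_alnum(value):
    """Remove every non-alphanumeric character, leaving missing values alone."""
    # Hypothetical helper: a float NaN is returned as-is instead of becoming 'nan'.
    if isinstance(value, float) and math.isnan(value):
        return value
    return re.sub(r'[^A-Za-z0-9]', '', str(value))

# Made-up sample values shaped like the column in the question.
samples = [('', '2 ct', '', ''), ('', '7 ounce', '', ''), float('nan')]
cleaned = [strip_non_alnum(v) for v in samples]
print(cleaned)
```

The same function can be passed to `df['title2'].apply(strip_non_alnum)` in place of the lambda.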

1

Try the following. I assume that the tuples and NaNs are saved as strings in your column; if not, let me know and I will adjust the solution:

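def clear(x):
    # Pass missing values through unchanged (a float NaN or the literal string 'Nan').
    if x == 'Nan' or (isinstance(x, float) and x != x):
        return x
    # Split the cell's text on commas and strip the surrounding punctuation.
    parts = [p.strip(" ()'") for p in str(x).split(',')]
    # Return the first piece that contains a digit, or NaN when none does.
    return next((p for p in parts if any(c.isdigit() for c in p)), float('nan'))

df['title2'] = df['title2'].apply(clear)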

2 Comments

Megale, unfortunately your code above gives me something like ("", ("",("".....
Now it gives me the error "list index out of range". I don't think it's your fault though; I think it's because I only posted a sample of 10 rows from a column of ~7k rows, so there is probably other stuff to be taken into consideration. Nonetheless, the simple code in the comments above gave me the answer I was looking for, so don't worry about this. Thanks again.
