
I'm looking for a way to remove empty values from a column whose values are lists. The column actually holds string representations of lists, and we are already replacing some strings in it beforehand.

In df.color we are just replacing every name matching \w+_Blue with an empty string:

Example DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Bird': ["parrot", "Eagle", "Seagull"],
    'color': [
        "['Light_Blue','Green','Dark_Blue']",
        "['Sky_Blue','Black','White', 'Yellow','Gray']",
        "['White','Jet_Blue','Pink', 'Tan','Brown', 'Purple']",
    ],
})

>>> df
      Bird                                              color
0   parrot                 ['Light_Blue','Green','Dark_Blue']
1    Eagle      ['Sky_Blue','Black','White', 'Yellow','Gray']
2  Seagull  ['White','Jet_Blue','Pink', 'Tan','Brown', 'Pu...

Result of the replacement on the above DataFrame:

>>> df['color'].str.replace(r'\w+_Blue\b', '')
0                                 ['','Green','']
1           ['','Black','White', 'Yellow','Gray']
2    ['White','','Pink', 'Tan','Brown', 'Purple']
Name: color, dtype: object

In plain Python this is easily done as follows:

>>> lst = ['','Green','']
>>> [x for x in lst if x]
['Green']

I wonder whether something like the following can be done:

df.color.mask(df == ' ')
  • For dataframes that contain lists or other hard to paste objects, you should use to_dict to create a minimal reproducible example, so that it is easy to re-create. Commented Aug 15, 2019 at 15:30
  • @user3483203, sorry about that. I just updated the info in the post; hope that is helpful. Commented Aug 15, 2019 at 15:34
  • So your column isn't a column of lists, it's a column of string representations of lists? Commented Aug 15, 2019 at 15:34
  • That's true, @user3483203. I added the same to the post. Commented Aug 15, 2019 at 15:36

3 Answers

3

You can use explode (new in pandas 0.25.0) and then group the pieces back into lists:

 import ast
 # parse the strings into lists before exploding
 (df['color'].str.replace(r'\w+_Blue\b', '', regex=True).apply(ast.literal_eval)
      .explode().loc[lambda x: x != ''].groupby(level=0).apply(list))

5 Comments

Thanks @Wen, but np.nan doesn't work in version '0.21.0'; I'm looking for a generic solution that works with almost all versions.
@pygo check the update
It fails with AttributeError: 'Series' object has no attribute 'explode'
@pygo explode is new in pandas 0.25.0, please update your pandas
:-) Hmm, okay, in that case we need some other way around. Thanks.
2

You don't have a column of lists; you have a column that contains string representations of lists. You can do this all in a single step using ast.literal_eval and str.endswith. I would use a list comprehension here, which should be faster than apply.


import ast

fixed = [
    [el for el in lst if not el.endswith("Blue")]
    for lst in df['color'].apply(ast.literal_eval)
]

df.assign(color=fixed)

      Bird                              color
0   parrot                            [Green]
1    Eagle       [Black, White, Yellow, Gray]
2  Seagull  [White, Pink, Tan, Brown, Purple]
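
If the filter should follow the question's \w+_Blue pattern rather than a plain endswith check, the same list-comprehension idea can be sketched with a compiled regex. This is only a sketch: the names BLUE, drop_blue and cleaned are hypothetical, and it assumes every cell is a valid string representation of a list.

```python
import ast
import re

import pandas as pd

df = pd.DataFrame({
    "Bird": ["parrot", "Eagle", "Seagull"],
    "color": [
        "['Light_Blue','Green','Dark_Blue']",
        "['Sky_Blue','Black','White', 'Yellow','Gray']",
        "['White','Jet_Blue','Pink', 'Tan','Brown', 'Purple']",
    ],
})

# Same pattern as in the question; fullmatch below anchors it to the whole entry.
BLUE = re.compile(r"\w+_Blue")


def drop_blue(text):
    """Parse one string-encoded list and drop entries matching BLUE (hypothetical helper)."""
    return [el for el in ast.literal_eval(text) if not BLUE.fullmatch(el)]


# Build a new frame whose color column holds real Python lists.
cleaned = df.assign(color=[drop_blue(s) for s in df["color"]])
```

Because the parsing and filtering happen in plain Python, this does not depend on explode or on any particular pandas release.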

1 Comment

Thnx mile @user3483203 .
1

Another way using filter and apply:

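import ast
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
(df['color'].str.replace(r'\w+_Blue\b', '', regex=True)
     .apply(lambda x: list(filter(bool, ast.literal_eval(x)))))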

0                              [Green]
1         [Black, White, Yellow, Gray]
2    [White, Pink, Tan, Brown, Purple]
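
For readers who want the result to be independent of how a given pandas release treats the pattern in str.replace, the replacement can be done with Python's own re module instead. The following is a minimal sketch under that assumption; strip_and_filter, colors and result are hypothetical names, and only the first two rows of the example are used.

```python
import ast
import re

import pandas as pd

colors = pd.Series([
    "['Light_Blue','Green','Dark_Blue']",
    "['Sky_Blue','Black','White', 'Yellow','Gray']",
])


def strip_and_filter(text):
    # Blank out the *_Blue names, parse the string into a list,
    # then keep only the non-empty entries.
    blanked = re.sub(r"\w+_Blue\b", "", text)
    return list(filter(None, ast.literal_eval(blanked)))


result = colors.map(strip_and_filter)
```

Here filter(None, ...) plays the same role as filter(bool, ...) in the answer above: both drop the empty strings left behind by the replacement.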

1 Comment

thnx @anky_91 :-)
