0

Hey guys, I'm trying to filter a nested list so that it only includes strings containing the word "yellow". My aim is to store each color in its own separate column in my dataframe.

I tried labels.str.split('yellow'), but it just tells me that 'list' object has no attribute 'str'.

[['Example1 (purple)',   
  ' Example2 (blue)',
  ' Example3 (orange)',
  ' Example4 (yellow)',
  ' Example5 (red)',
  ' Example6 (pink)',
  ' Example7 (sky)'],
 ['Example8 (purple)',
  ' Example9 (blue)',
  ' Example10 (orange)',
  ' Example11 (sky)',
  ' Example12 (green)',
  ' Example13 (green)',
  ' Example14 (yellow)',
  ' Example15 (red)',
  ' Example16 (pink)',
  ' Example17 (pink)',
  ' Example18 (green)',
  ' Example19 (sky)']]
6
  • Do you want [[yellow], [yellower]] or [yellow, yellower]? Commented Apr 10, 2019 at 20:52
  • You don't have a DataFrame or any Pandas object. This is just a regular Python list. You can't use Pandas methods on lists. Commented Apr 10, 2019 at 20:55
  • I want each example to correspond to its color. So [['Example4'], ['Example14']] Commented Apr 10, 2019 at 20:55
  • Did you mean to have a missing space in Example3? are there any without spaces between them? Commented Apr 10, 2019 at 20:57
  • This is just an object with some data I stored from my original dataframe Commented Apr 10, 2019 at 20:58

2 Answers

1

If you don't want to preserve the inner lists you can do it with a double list comprehension:

[item for inner in my_list for item in inner if 'yellow' in item]

yields:

[' Example4 (yellow)', ' Example14 (yellow)']

If you want to preserve the inner lists you can do it like this:

[ [item for item in inner if 'yellow' in item] for inner in my_list ]

yields:

[[' Example4 (yellow)'], [' Example14 (yellow)']]
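The comments on the question ask for just the example names, i.e. [['Example4'], ['Example14']]. A minimal sketch of that, assuming every item has the form "<name> (<color>)"; the helper name names_with_color is hypothetical and the data is a shortened copy of the list from the question:

```python
def names_with_color(nested, color):
    """Return, per inner list, the names whose label is tagged with `color`."""
    result = []
    for inner in nested:
        # ' Example4 (yellow)' -> 'Example4': strip whitespace, keep the text before ' ('
        names = [item.strip().split(' (')[0]
                 for item in inner
                 if f'({color})' in item]
        result.append(names)
    return result


# Shortened sample of the nested list from the question.
my_list = [['Example1 (purple)', ' Example4 (yellow)', ' Example5 (red)'],
           ['Example8 (purple)', ' Example14 (yellow)', ' Example19 (sky)']]

print(names_with_color(my_list, 'yellow'))
```

Matching on '(yellow)' rather than 'yellow' avoids picking up a name that merely contains the word; if your real labels are formatted differently, the split and the membership test would need adjusting.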


1

Import the necessary packages and initialize the data:

import pandas as pd
import re

my_list = [['Example1 (purple)',
  ' Example2 (blue)',
  ' Example3 (orange)',
  ' Example4 (yellow)',
  ' Example5 (red)',
  ' Example6 (pink)',
  ' Example7 (sky)'],
 ['Example8 (purple)',
  ' Example9 (blue)',
  ' Example10 (orange)',
  ' Example11 (sky)',
  ' Example12 (green)',
  ' Example13 (green)',
  ' Example14 (yellow)',
  ' Example15 (red)',
  ' Example16 (pink)',
  ' Example17 (pink)',
  ' Example18 (green)',
  ' Example19 (sky)']]

Flatten the list so it is no longer lists nested in lists. (The nesting is also why string methods fail here: [x.split() for x in my_list] raises an error because the elements of my_list are themselves lists, not strings.)

Define a flat_list function and flatten the list:
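To make the error from the question concrete, here is a small sketch. It assumes a shortened, already flat sample list named labels; the point is only that .str is an accessor on a pandas Series, not on a plain Python list:

```python
import pandas as pd

# Shortened, flat sample; the real data in the question is nested one level deeper.
labels = ['Example1 (purple)', ' Example4 (yellow)', ' Example5 (red)']

# A plain list has no .str attribute, which is what the error message says.
has_str = hasattr(labels, 'str')

# Wrapping the list in a Series makes the vectorized string methods available.
series = pd.Series(labels)
yellow = series[series.str.contains('yellow', regex=False)].tolist()

print(has_str)
print(yellow)
```

With the nested list, each element of the Series would be a list rather than a string, so the flattening step is still needed before using .str.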

flat_list = lambda l: [item for sublist in l for item in sublist]
flat = flat_list(my_list)

Create an empty dataframe:

df = pd.DataFrame({})

Extract the pieces from each element of the flat list. For each string, strip the surrounding whitespace, split it on the space, take element 0 for the name (e.g. "Example1"), and strip that again to remove any leftover whitespace. Do the same again but take element 1 for the color. Wrapping the two in () separated by a comma returns them as a tuple.

splitout = [(x.strip().split(' ')[0].strip(), x.strip().split(' ')[1]) for x in pd.Series(flat)]

Set the two dataframe columns. The first just takes the first element of each splitout tuple, which is always the Example name; the second uses re.sub to remove the parentheses from the color.

df['Example'] = [x[0] for x in splitout]
df['Color'] = [re.sub(r'[()]', '', x[1]) for x in splitout]

      Example   Color
0    Example1  purple
1    Example2    blue
2    Example3  orange
3    Example4  yellow
4    Example5     red
5    Example6    pink
6    Example7     sky
7    Example8  purple
8    Example9    blue
9   Example10  orange
10  Example11     sky
11  Example12   green
12  Example13   green
13  Example14  yellow
14  Example15     red
15  Example16    pink
16  Example17    pink
17  Example18   green
18  Example19     sky
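As an alternative to the split-and-re.sub steps, the same two columns can be pulled out in one pass with Series.str.extract. This is a sketch under the assumption that every string looks like "<name> (<color>)"; the regular expression and the shortened sample list are mine, not part of the original answer:

```python
import pandas as pd

# Shortened sample of the flattened list.
flat = ['Example1 (purple)', ' Example2 (blue)', ' Example4 (yellow)']

# Two named capture groups: the name before the parentheses and the color inside them.
pattern = r'^\s*(?P<Example>\S+)\s*\((?P<Color>[^)]+)\)\s*$'

# With named groups, str.extract returns a DataFrame whose columns use the group names.
df = pd.Series(flat).str.extract(pattern)
print(df)
```

Strings that do not match the pattern come back as NaN in both columns, which makes malformed labels easy to spot.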

Then you can pivot into a larger dataframe with colors for columns:

pd.pivot_table(df.assign(v=1), index='Example', columns='Color', values='v')

Color      blue  green  orange  pink  purple  red  sky  yellow
Example                                                       
Example1    NaN    NaN     NaN   NaN     1.0  NaN  NaN     NaN
Example10   NaN    NaN     1.0   NaN     NaN  NaN  NaN     NaN
Example11   NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example12   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example13   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example14   NaN    NaN     NaN   NaN     NaN  NaN  NaN     1.0
Example15   NaN    NaN     NaN   NaN     NaN  1.0  NaN     NaN
Example16   NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example17   NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example18   NaN    1.0     NaN   NaN     NaN  NaN  NaN     NaN
Example19   NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example2    1.0    NaN     NaN   NaN     NaN  NaN  NaN     NaN
Example3    NaN    NaN     1.0   NaN     NaN  NaN  NaN     NaN
Example4    NaN    NaN     NaN   NaN     NaN  NaN  NaN     1.0
Example5    NaN    NaN     NaN   NaN     NaN  1.0  NaN     NaN
Example6    NaN    NaN     NaN   1.0     NaN  NaN  NaN     NaN
Example7    NaN    NaN     NaN   NaN     NaN  NaN  1.0     NaN
Example8    NaN    NaN     NaN   NaN     1.0  NaN  NaN     NaN
Example9    1.0    NaN     NaN   NaN     NaN  NaN  NaN     NaN
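If the goal is instead one column per color that lists the matching example names, a groupby can do it. This is only a sketch, assuming a df with the Example and Color columns built above (a shortened sample here); the variable names by_color and wide are hypothetical:

```python
import pandas as pd

# Shortened sample of the Example/Color dataframe.
df = pd.DataFrame({
    'Example': ['Example1', 'Example4', 'Example8', 'Example14', 'Example12'],
    'Color': ['purple', 'yellow', 'purple', 'yellow', 'green'],
})

# Collect the example names for each color.
by_color = df.groupby('Color')['Example'].apply(list).to_dict()

# Build one column per color; shorter columns are padded with NaN.
wide = pd.DataFrame({color: pd.Series(names) for color, names in by_color.items()})
print(wide)
```

Selecting wide['yellow'] then gives just the yellow examples, which is closer to what the question asks for than the 1.0/NaN indicator table.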

Whole code (with my_list defined as above):

import pandas as pd
import re

flat_list = lambda l: [item for sublist in l for item in sublist]
flat = flat_list(my_list)

splitout = [(x.strip().split(' ')[0].strip(), x.strip().split(' ')[1]) for x in pd.Series(flat)]

df = pd.DataFrame({})
df['Example'] = [x[0] for x in splitout]
df['Color'] = [re.sub(r'[()]', '', x[1]) for x in splitout]

pivot = pd.pivot_table(df.assign(v=1), index='Example', columns='Color', values='v')
