Remove element from every list in a column in pandas DataFrame

Question

I have a pretty simple question, but I'm having trouble achieving what I want. I have a DataFrame that looks like this:

base
[a,b,c]
[c,d,e]
[a,b,h]

I want to remove the second element of every list, so I would get this:

base
[a,c]
[c,e]
[a,h]

I suppose there's an easy way to do this, but working with lists in DataFrames isn't that common, so I'm not finding anything.

Thanks in advance.

Edit: The DataFrame is just one column, which is comprised of lists, all of the same length. I need to remove one element, so the length of the list is the same as the number of columns of the DataFrame it will become.

jpp · Accepted Answer · 2018-10-02 15:20:52Z

7

Don't use `list` in series

Pandas series are not designed to hold lists. You lose vectorised functionality and performance because of 2 layers of pointers: one for the object dtype array, another for each list within the series.

Since each list has the same number of elements, separate into columns instead:

import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

res = pd.DataFrame(df['base'].values.tolist()).iloc[:, [0, 2]]

print(res)

   0  2
0  a  c
1  c  e
2  a  h
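As a rough illustration of the points above, the sketch below checks that a column of lists is stored with the generic object dtype, then expands it and drops the unwanted position by label. The names `wide` and the final column renumbering are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# A column of lists is stored with the generic object dtype.
print(df['base'].dtype)

# Expand to one column per list position, keeping the original index.
wide = pd.DataFrame(df['base'].tolist(), index=df.index)

# Drop the column holding the second element of every list.
res = wide.drop(columns=1)

# Optional: renumber the remaining columns 0..n-1.
res.columns = range(res.shape[1])
print(res)
```

Dropping by label rather than listing the positions to keep may be more convenient when the lists are long and only one element has to go.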

edited Oct 2, 2018 at 15:20

answered Oct 2, 2018 at 15:11

jpp


2 Comments

Juan C Over a year ago

I want to split them into columns afterwards. The thing is, data came in a bad format and one useless column split into two, so I need to remove that element of the list, so I can then turn it into a DataFrame. Anyways, I think I can work with this code and achieve what I want. Thanks !

jpp Over a year ago

"data came in a bad format": it may be worth trying to fix this upstream rather than expensively in pandas. This sounds like an XY problem.

BENY · Accepted Answer · 2018-10-02 15:14:11Z

6

IIUC

df.base=pd.DataFrame(df.base.values.tolist()).drop(columns=1).values.tolist()
df
Out[635]: 
     base
0  [a, c]
1  [c, e]
2  [a, h]

answered Oct 2, 2018 at 15:14

BENY


1 Comment

Juan C Over a year ago

You understood perfectly. I know it's kind of a weird question, but it's because my data was weird. Thanks a lot!

sacuL · Accepted Answer · 2018-10-02 15:17:18Z

1

You could work on the underlying np.array:

df['base'] = np.stack(df.base.values)[:,[0,2]].tolist()

>>> df
     base
0  [a, c]
1  [c, e]
2  [a, h]
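A possible variation on the same idea, assuming every list has the same length: `np.delete` removes a position along an axis, so the positions to keep do not have to be listed. The name `arr` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# Stack the equal-length lists into a regular 2-D array.
arr = np.stack(df['base'].values)

# Remove position 1 along the column axis, then store plain lists again.
df['base'] = np.delete(arr, 1, axis=1).tolist()
print(df)
```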

answered Oct 2, 2018 at 15:17

sacuL


Acccumulation · Accepted Answer · 2018-10-02 15:52:32Z

You can use df['base'].apply(lambda x: x.pop(1)). Note that pop acts in place, so you don't need to assign the result to base (in fact, if you do so, you'll get the removed element instead of the remaining list).
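A minimal sketch of that behaviour, using the sample data from the question. The names `removed` and `df2` are assumptions for illustration; the second half shows a non-mutating alternative that builds new lists instead of changing the existing ones.

```python
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# pop(1) mutates each list in place and returns the removed element,
# so the result of apply holds the removed items, not the shortened lists.
removed = df['base'].apply(lambda x: x.pop(1))
print(removed.tolist())     # ['b', 'd', 'b']
print(df['base'].tolist())  # [['a', 'c'], ['c', 'e'], ['a', 'h']]

# Non-mutating alternative: build new lists that skip position 1.
df2 = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})
df2['base'] = [x[:1] + x[2:] for x in df2['base']]
print(df2['base'].tolist())
```

The in-place version also changes any other variable that still refers to the same list objects, which is worth keeping in mind if the original data is reused elsewhere.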

However, as @jpp says, you should consider using some other data structure, such as a dataframe with multi-index or a three-dimensional numpy array.

And considering your edit, it's probably easier to convert the data to a dataframe with multiple columns, and then delete the extra column, rather than trying to manipulate a column of lists and then turn it into your final dataframe. It may seem simpler to have "only one column", but you're just putting the extra complexity into a separate layer, rather than getting rid of it. Pandas was built around two-dimensional data being represented as columns and rows, not a single column of lists, so you're going out of your way to not use the tools that pandas was built to provide.

Presumably, you had something like this:

data=[['a','b','c'],
['c','d','e'],
['a','b','h']]

And you did something like this:

df = pd.DataFrame({'base':data})

You should instead do

df = pd.DataFrame(data)
df = df[[0,2]]
