I have a pretty simple question, but I'm having trouble achieving what I want. I have a DataFrame that looks like this:

base
[a,b,c]
[c,d,e]
[a,b,h]

I want to remove the second element of every list, so I would get this:

base
[a,c]
[c,e]
[a,h]

I suppose there's an easy way to do this, but working with lists inside DataFrames isn't very common, so I'm not finding anything.

Thanks in advance.

Edit: The DataFrame has just one column, which consists of lists that all have the same length. I need to remove one element so that the length of each list matches the number of columns of the DataFrame it will become.

4 Answers

Don't use lists in a Series

Pandas Series are not designed to hold lists. You lose vectorised functionality and performance because of 2 layers of pointers: one for the object dtype array, and another for each list within the Series.

Since each list has the same number of elements, separate into columns instead:

import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

res = pd.DataFrame(df['base'].values.tolist()).iloc[:, [0, 2]]

print(res)

   0  2
0  a  c
1  c  e
2  a  h
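
As a minimal sketch of the point above, assuming the same three-row sample data, the following shows that a column of lists is stored with object dtype and that splitting it into columns lets an ordinary column drop do the work. The names `wide` and `trimmed` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# A column of lists is stored with object dtype: pandas only sees opaque Python objects.
print(df['base'].dtype)  # object

# Expand each list into its own column, then drop the unwanted position (label 1).
wide = pd.DataFrame(df['base'].tolist(), index=df.index)
trimmed = wide.drop(columns=1)

print(trimmed.shape)  # (3, 2)
```

Passing `index=df.index` keeps the original row labels, which matters if the source frame does not use the default 0..n-1 index.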
2 Comments

I want to split them into columns afterwards. The thing is, data came in a bad format and one useless column split into two, so I need to remove that element of the list, so I can then turn it into a DataFrame. Anyways, I think I can work with this code and achieve what I want. Thanks !
"data came in a bad format." It may be worth trying to fix this upstream rather than expensively in Pandas. Sounds like an XY problem.
IIUC

df['base'] = pd.DataFrame(df['base'].values.tolist()).drop(columns=1).values.tolist()
df
Out[635]: 
     base
0  [a, c]
1  [c, e]
2  [a, h]

1 Comment

You understood perfectly. I know it's kind of a weird question, but it's because my data was weird. Thanks a lot!
You could work on the underlying np.array:

df['base'] = np.stack(df.base.values)[:,[0,2]].tolist()

>>> df
     base
0  [a, c]
1  [c, e]
2  [a, h]
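
A variant of the same idea, sketched under the assumption that every list has the same length: `np.delete` removes a position by index, so the columns to keep do not have to be listed. The sample data matches the question; the name `arr` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# Stack the equal-length lists into a 2-D array of shape (3, 3).
arr = np.stack(df['base'].to_numpy())

# Remove column index 1 (the second element of every list) and convert back to lists.
df['base'] = np.delete(arr, 1, axis=1).tolist()

print(df['base'].tolist())  # [['a', 'c'], ['c', 'e'], ['a', 'h']]
```

Note that `np.stack` raises an error if the lists differ in length, so this only applies to the equal-length case described in the question.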

You can use df['base'].apply(lambda x: x.pop(1)). Note that pop acts in place, so you don't need to assign the result to base (in fact, if you do so, you'll get the removed element instead of the remaining list).
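
To make the in-place behaviour concrete, here is a small sketch using the sample data from the question. The names `removed` and `df2` are illustrative, and the slicing variant at the end is an alternative I'm adding for comparison, not part of the original suggestion.

```python
import pandas as pd

df = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})

# apply returns the popped (removed) elements, not the shortened lists.
removed = df['base'].apply(lambda x: x.pop(1))

# The lists stored in the column were mutated in place.
print(removed.tolist())     # ['b', 'd', 'b']
print(df['base'].tolist())  # [['a', 'c'], ['c', 'e'], ['a', 'h']]

# A non-mutating alternative: build new lists by slicing around index 1.
df2 = pd.DataFrame({'base': [list('abc'), list('cde'), list('abh')]})
df2['base'] = df2['base'].apply(lambda x: x[:1] + x[2:])
```

The slicing version leaves the original list objects untouched, which may be safer if anything else still holds references to them.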

However, as @jpp says, you should consider using some other data structure, such as a dataframe with multi-index or a three-dimensional numpy array.

And considering your edit, it's probably easier to convert the data to a dataframe with multiple columns, and then delete the extra column, rather than trying to manipulate a column of lists and then turn it into your final dataframe. It may seem simpler to have "only one column", but you're just putting the extra complexity into a separate layer, rather than getting rid of it. Pandas was built around two-dimensional data being represented as columns and rows, not a single column of lists, so you're going out of your way to not use the tools that pandas was built to provide.

Presumably, you had something like this:

data = [['a', 'b', 'c'],
        ['c', 'd', 'e'],
        ['a', 'b', 'h']]

And you did something like this:

df = pd.DataFrame({'base':data})

You should instead do

df = pd.DataFrame(data)
df = df[[0,2]]
