I am trying to select the first element of each cell, regardless of the number of columns or rows (these may change based on user-defined criteria), and build a new pandas DataFrame from the result. My actual data structure is similar to the one below.

       0       1       2
0   [1, 2]  [2, 3]  [3, 6]
1   [4, 2]  [1, 4]  [4, 6]
2   [1, 2]  [2, 3]  [3, 6]
3   [4, 2]  [1, 4]  [4, 6]

I want the new dataframe to look like:

    0   1   2
0   1   2   3
1   4   1   4
2   1   2   3
3   4   1   4

The code below generates a data set similar to mine. It shows my unsuccessful attempt to do this for all columns (d), alongside an approach from a similar question that works, but only for a single column (c). The similar, but different, question is: Python Pandas: selecting element in array column

import pandas as pd

zz = pd.DataFrame([[[1,2],[2,3],[3,6]],[[4,2],[1,4],[4,6]],
               [[1,2],[2,3],[3,6]],[[4,2],[1,4],[4,6]]])
print(zz)

x = zz.dtypes
print(x)

a = pd.DataFrame(zz.columns.values)
b = pd.DataFrame.transpose(a)
c = zz[0].str[0]  # this gives the 1st value of each cell in column 0
d = zz[[b[0]].values].str[0]  # attempt to get the 1st value of each cell in all columns (fails)

3 Answers

You can use apply, and select the first value of each list by indexing with str:

print(zz.apply(lambda x: x.str[0]))
   0  1  2
0  1  2  3
1  4  1  4
2  1  2  3
3  4  1  4
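
As a minimal sketch of why this works (the two-row frame and the `ragged` Series are made up for illustration): `.str[0]` is not limited to strings. On an object column it indexes into each list, and it returns NaN rather than raising when a cell is too short.

```python
import pandas as pd

# Illustrative frame: every cell holds a two-element list.
zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]],
                   [[4, 2], [1, 4], [4, 6]]])

# apply passes each column (a Series) to the lambda;
# .str[0] then takes element 0 of every list in that column.
first = zz.apply(lambda col: col.str[0])
print(first)

# Hypothetical ragged column: the empty list has no element 0,
# so .str[0] yields NaN for that row instead of raising.
ragged = pd.Series([[1, 2], []])
print(ragged.str[0])
```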

Another solution with stack and unstack:

print(zz.stack().str[0].unstack())
   0  1  2
0  1  2  3
1  4  1  4
2  1  2  3
3  4  1  4
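
To see what the intermediate steps look like, here is a small sketch on an illustrative two-row frame. `stack` turns the frame into a single Series indexed by (row, column), so `.str[0]` only has to be applied once, and `unstack` pivots the result back into the original shape.

```python
import pandas as pd

# Illustrative frame: every cell holds a two-element list.
zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]],
                   [[4, 2], [1, 4], [4, 6]]])

# One long Series with a two-level (row, column) index.
stacked = zz.stack()
print(stacked)

# Take element 0 of every list, then move the inner index level
# back out into columns.
result = stacked.str[0].unstack()
print(result)
```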

1 Comment

Thanks!!! I'm going to try both your methods and see what runs fastest.

I would use applymap, which applies the same function to each individual cell of your DataFrame:

zz.applymap(lambda x: x[0])

   0  1  2
0  1  2  3
1  4  1  4
2  1  2  3
3  4  1  4
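
One caveat: as far as I know, `applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. Here is a sketch that should work on either side of that change, assuming every cell is a non-empty list (the `elementwise` name is just for illustration):

```python
import pandas as pd

# Illustrative frame: every cell holds a two-element list.
zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]],
                   [[4, 2], [1, 4], [4, 6]]])

# Prefer DataFrame.map where it exists (pandas >= 2.1),
# otherwise fall back to the older applymap.
elementwise = zz.map if hasattr(pd.DataFrame, "map") else zz.applymap

# The lambda receives one cell at a time, so plain indexing is enough.
first = elementwise(lambda cell: cell[0])
print(first)
```

Unlike `.str[0]`, plain `cell[0]` raises an IndexError on an empty list, so this variant is stricter about the input.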

I'm a big fan of stack + unstack.
However, @jezrael already posted that answer, so +1 from me.

That said, here is a quicker way: slicing a NumPy array.

import numpy as np

pd.DataFrame(
    np.array(zz.values.tolist())[:, :, 0],
    zz.index, zz.columns
)

   0  1  2
0  1  2  3
1  4  1  4
2  1  2  3
3  4  1  4
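
A sketch of what the slicing relies on, using an illustrative two-row frame: `np.array` turns the nested lists into a 3-D array only when every cell has the same length, so this approach assumes uniform lists. With ragged cells NumPy cannot build a regular array, and the `[:, :, 0]` slice will not work.

```python
import numpy as np
import pandas as pd

# Illustrative frame: every cell holds a two-element list.
zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]],
                   [[4, 2], [1, 4], [4, 6]]])

# Nested Python lists -> regular array of shape (rows, columns, list length).
arr = np.array(zz.values.tolist())
print(arr.shape)

# Index 0 along the last axis is the first element of every cell;
# reuse the original index and columns for the new frame.
first = pd.DataFrame(arr[:, :, 0], index=zz.index, columns=zz.columns)
print(first)
```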

timing

[Figure: timing results]

1 Comment

Thanks for the timing info and new method..+1
