1

I have a numpy array with a shape (47365L, 14L), I would like to convert it into make 2 column dataFrame where the first column is the id, which is the number of the column in my original array and the second one is the data which is the content of the column like in the image below.

enter image description here

I tried this code buut without success.

    S.shape
    >>(47365L, 14L)
    IC_Label = ['sub_'+str(subNum) +'_IC_'+str(i) for i in range(0,S.shape[1])]
    ICA_df=pd.DataFrame(data=S, columns= IC_Label)

output:

        sub_1_IC_0  sub_1_IC_1  sub_1_IC_2  sub_1_IC_3  sub_1_IC_4 ............sub_1_IC_13
    0   -0.002277   -0.003315   -0.001300   0.000283    0.000473
    1   -0.004442   0.002921    0.000517    -0.000033   0.000349
    2   -0.003370   0.006067    0.002504    -0.000359   0.001467
    3   0.000075    0.004708    0.000087    -0.000176   0.002561
    .
    .
    47364

My data looks like this:

    ('MyData', array([[ -2.27721244e-03,  -3.31485020e-03,  -1.30029316e-03, ...,
     -1.33952356e-03,   2.93513841e-03,   1.22486268e-03],
   [ -4.44161427e-03,   2.92134270e-03,   5.17394574e-04, ...,
     -1.42587472e-03,   4.74996003e-03,   2.86699268e-03],
   [ -3.36958390e-03,   6.06671900e-03,   2.50417650e-03, ...,
      3.80042801e-03,   4.77244983e-03,   2.82142774e-03],
   ..., 
   [  3.57952033e-04,  -3.55516707e-04,   1.32481754e-03, ...,
      7.55565993e-03,   1.52018736e-02,  -4.67047470e-03],
   [ -2.07206006e-03,   6.60686763e-04,   2.04637699e-03, ...,
      7.94521155e-03,   1.50024315e-02,  -4.88461803e-03],
   [  1.43106607e-03,  -1.77342544e-03,   4.57835186e-05, ...,
      5.55311541e-03,   1.61148688e-02,  -4.63583610e-03]]))

Is there any way to do it?

Thank you

2
  • Can you add data sample of your input? Commented Nov 9, 2017 at 9:36
  • @jezrael please check the edit Commented Nov 9, 2017 at 9:38

1 Answer 1

1

I think you need select second values of tuple by [1], DataFrame constructor and last add_prefix:

df = pd.DataFrame(a[1]).add_prefix('sub_1_IC_')
print (df)
   sub_1_IC_0  sub_1_IC_1  sub_1_IC_2  sub_1_IC_3  sub_1_IC_4  sub_1_IC_5
0   -0.002277   -0.003315   -0.001300   -0.001340    0.002935    0.001225
1   -0.004442    0.002921    0.000517   -0.001426    0.004750    0.002867
2   -0.003370    0.006067    0.002504    0.003800    0.004772    0.002821
3    0.000358   -0.000356    0.001325    0.007556    0.015202   -0.004670
4   -0.002072    0.000661    0.002046    0.007945    0.015002   -0.004885
5    0.001431   -0.001773    0.000046    0.005553    0.016115   -0.004636

And then use melt:

df = pd.DataFrame(a[1]).add_prefix('CL_').melt(var_name='id', value_name='data')
print (df.head(10))
     id      data
0  CL_0 -0.002277
1  CL_0 -0.004442
2  CL_0 -0.003370
3  CL_0  0.000358
4  CL_0 -0.002072
5  CL_0  0.001431
6  CL_1 -0.003315
7  CL_1  0.002921
8  CL_1  0.006067
9  CL_1 -0.000356

Another alternative with numpy.ravel for flattening with numpy.repeat:

df = pd.DataFrame({'data':a[1].T.ravel(), 
                   'id': np.repeat(np.arange(6), 6)}, columns=['id','data'])
df['id'] = 'CL' + df['id'].astype(str)
print (df.head(10))
     id      data
0  CL_0 -0.002277
1  CL_0 -0.004442
2  CL_0 -0.003370
3  CL_0  0.000358
4  CL_0 -0.002072
5  CL_0  0.001431
6  CL_1 -0.003315
7  CL_1  0.002921
8  CL_1  0.006067
9  CL_1 -0.000356
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, but I would like to have all value of a column stacked to each other and having the column's name as index

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.