How to add a column to a NumPy array numbered from 0 to the length of the array?

Question

I have an array into which I want to insert a column at position 0. The column should hold the values 0, 1, 2, ... up to the number of rows minus one.

import io
import numpy as np

data =io.StringIO("""
ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9
""")
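# skip_header=1 skips only the leading blank line, so the "ID,1,2" header is read as a row of [nan, 1., 2.]
data = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=np.float64)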

Expected:

IDX,ID,1,2
0,5362,0.9,-0.4
1,485,-0.6,0.5
2,582,0.0,0.9
3,99,0.7,0.5
4,75,-0.4,0.5
5,474,0.3,0.8
6,594,-0.2,0.0
7,597,0.9,-0.3
8,124,0.7,0.6
9,635,0.8,0.9

Consider converting it to a pandas DataFrame, inserting your column, and then converting back to a NumPy array with to_numpy (pandas.pydata.org/pandas-docs/stable/reference/api/…) – exan, Commented Nov 21, 2019 at 4:53
Actually, I don't want to use pandas, because I want to run this on a GPU. Can this be done with NumPy only? – axay, Commented Nov 21, 2019 at 5:15
Yes, there are many ways to do that. Have a look here: stackoverflow.com/questions/8486294/… – exan, Commented Nov 21, 2019 at 5:22
You can easily concatenate with an np.arange(10)[:,None] array, but the result will be all floats. For fast numeric calculations, a NumPy array has to have the same dtype throughout. There are ways of mixing dtypes, but they slow down the calculation. Do those first 2 columns have to be in the same array as the float columns? – hpaulj, Commented Nov 21, 2019 at 5:59
@hpaulj - My 1st column has repeated values. To get unique values, I want to add a column so that I can access the exact row number for further processing. – axay, Commented Nov 21, 2019 at 6:13
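On axay's point about repeated first-column values: a row's position is already its index. One option is to look row numbers up when you need them instead of storing them. The sketch below assumes a hypothetical `ids` column with repeats. The question's sample data has no repeats, so these values are made up.

```python
import numpy as np

# Hypothetical ID column with repeated values (made-up data for illustration)
ids = np.array([5362, 485, 5362, 99, 485])

# Row numbers (0-based) of every row whose ID is 5362
rows_5362 = np.flatnonzero(ids == 5362)
print(rows_5362)                 # [0 2]

# Each row number addresses exactly one row, even though the IDs repeat
print(ids[rows_5362])            # [5362 5362]

# The same values an added index column would hold, built on demand
row_numbers = np.arange(ids.shape[0])
print(row_numbers[ids == 485])   # [1 4]
```

Nothing here stays in sync with a separate column, so filtering or sorting can't make stored indices go stale.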

hpaulj · Accepted Answer · 2019-11-21 06:45:50Z

In [110]: txt = """ 
     ...: ID,1,2 
     ...: 5362,0.9,-0.4 
     ...: 485,-0.6,0.5 
     ...: 582,0.0,0.9 
     ...: 99,0.7,0.5 
     ...: 75,-0.4,0.5 
     ...: 474,0.3,0.8 
     ...: 594,-0.2,0.0 
     ...: 597,0.9,-0.3 
     ...: 124,0.7,0.6 
     ...: 635,0.8,0.9 
     ...: """  

In [113]: data = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2)
In [114]: data
Out[114]: 
array([[ 5.362e+03,  9.000e-01, -4.000e-01],
       [ 4.850e+02, -6.000e-01,  5.000e-01],
       [ 5.820e+02,  0.000e+00,  9.000e-01],
       ...
       [ 6.350e+02,  8.000e-01,  9.000e-01]])


In [118]: data1 = np.concatenate([np.arange(data.shape[0])[:,None],data], axis=1)
In [119]: data1
Out[119]: 
array([[ 0.000e+00,  5.362e+03,  9.000e-01, -4.000e-01],
       [ 1.000e+00,  4.850e+02, -6.000e-01,  5.000e-01],
       [ 2.000e+00,  5.820e+02,  0.000e+00,  9.000e-01],
       [ 3.000e+00,  9.900e+01,  7.000e-01,  5.000e-01],
         ...
       [ 9.000e+00,  6.350e+02,  8.000e-01,  9.000e-01]])
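The same `np.concatenate` step as a self-contained script, for running outside IPython. The name `row_index` and the cast back to int are additions for illustration, not part of hpaulj's answer. The cast assumes the index stays well below 2**53, where float64 holds whole numbers exactly.

```python
import numpy as np

# The question's CSV text; splitlines() yields a leading '' line, then the header
txt = """
ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9
"""

# skip_header=2 skips the blank line and the header row
data = np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=2)

# arange gives shape (n,); [:, None] turns it into an (n, 1) column
idx = np.arange(data.shape[0])[:, None]
data1 = np.concatenate([idx, data], axis=1)
print(data1.shape)   # (10, 4)
print(data1.dtype)   # float64: the integer index was upcast to match the data

# Small whole numbers are exact in float64, so the index casts back cleanly
row_index = data1[:, 0].astype(np.int64)
print(row_index)     # [0 1 2 3 4 5 6 7 8 9]
```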

Alternatively, create 2 arrays: one of integer IDs and the other of float values:

In [124]: ID = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[0],dtype=int)
In [126]: ID
Out[126]: array([5362,  485,  582,   99,   75,  474,  594,  597,  124,  635])
In [127]: np.column_stack([np.arange(ID.shape[0]),ID])
Out[127]: 
array([[   0, 5362],
       [   1,  485],
       [   2,  582],
        ...
       [   9,  635]])
In [128]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[1,2])
In [129]: data2
Out[129]: 
array([[ 0.9, -0.4],
       [-0.6,  0.5],
       [ 0. ,  0.9],
        ...
       [ 0.8,  0.9]])
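Keeping the integer and float parts in separate arrays works as long as every row selection is applied to both. The sketch below shows this with one boolean mask. The condition on the first float column is an arbitrary example, and the names `idx_id`, `values`, and `mask` are illustrative. Unlike hpaulj's `txt`, this text has no leading blank line, so `skip_header=1` is enough.

```python
import numpy as np

# The question's CSV text, starting directly with the header row
lines = """ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9""".splitlines()

# Integer IDs and float values read into separate arrays
ID = np.genfromtxt(lines, delimiter=',', skip_header=1, usecols=[0], dtype=int)
values = np.genfromtxt(lines, delimiter=',', skip_header=1, usecols=[1, 2])

# Row number and ID side by side; both are ints, so the dtype stays integer
idx_id = np.column_stack([np.arange(ID.shape[0]), ID])

# One boolean mask (an example condition) applied to both arrays keeps rows aligned
mask = values[:, 0] > 0.5
print(idx_id[mask])
print(values[mask])
```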

Or as a structured array:

In [120]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=1,names=True, dtype=None)
In [121]: data2
Out[121]: 
array([(5362,  0.9, -0.4), ( 485, -0.6,  0.5), ( 582,  0. ,  0.9),
       (  99,  0.7,  0.5), (  75, -0.4,  0.5), ( 474,  0.3,  0.8),
       ( 594, -0.2,  0. ), ( 597,  0.9, -0.3), ( 124,  0.7,  0.6),
       ( 635,  0.8,  0.9)],
      dtype=[('ID', '<i8'), ('1', '<f8'), ('2', '<f8')])

I could add another id column, and consolidate the float columns, but that can wait.
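hpaulj leaves the extra index column for later. The sketch below shows one way to add it as the first field of the structured array: build a new structured dtype, then copy the existing fields across by name. The field name `IDX` comes from the question's expected output. The rest is an illustration, not hpaulj's code.

```python
import numpy as np

# The question's CSV text, starting directly with the header row
lines = """ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9""".splitlines()

# names=True reads field names from the header; dtype=None infers int for ID, float for the rest
data2 = np.genfromtxt(lines, delimiter=',', names=True, dtype=None)

# New structured dtype: an integer 'IDX' field first, then the original fields unchanged
fields = [('IDX', np.int64)] + [(name, data2.dtype[name]) for name in data2.dtype.names]
out = np.empty(data2.shape, dtype=fields)

# Fill the index, then copy each original field across by name
out['IDX'] = np.arange(data2.shape[0])
for name in data2.dtype.names:
    out[name] = data2[name]

print(out.dtype.names)
print(out[:3])
```

Building the dtype by hand controls the field order. `numpy.lib.recfunctions.append_fields` can also add a field, but it places new fields after the existing ones.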

Matt Hall · Accepted Answer · 2019-11-21 05:23:56Z

This is probably a job for pandas. NumPy is really intended for situations where the numbers in an array are all measurements of the same thing. And I'd also add that you might not really need these indices in NumPy, since you can already ask for the n-th row with NumPy's indexing. But you can have more or less what you want if you're prepared to compromise a bit:

data = data[1:]
idx = np.arange(data.shape[0]).reshape(-1, 1)
np.hstack([idx, data])

In the first line, I've sliced off the header, because NumPy arrays don't have column headings like this. That's a pandas thing.

In the second line I've made a 'column' of monotonically increasing indices. This is a bunch of ints for now, but not for long.

In the third line I've concatenated everything, and now everything is floats. You can't have one column of ints and 3 columns of floats in a regular (homogeneous) NumPy array... pandas again.
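Here are the three steps as a runnable script, starting from the question's own loading code, with checks on what each step does. The names `buf`, `raw`, and `result` are additions so the intermediate arrays can be inspected.

```python
import io
import numpy as np

# The question's data and loading call (with the np. prefix genfromtxt needs)
buf = io.StringIO("""
ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9
""")
raw = np.genfromtxt(buf, delimiter=',', skip_header=1, dtype=np.float64)

# skip_header=1 only skips the leading blank line, so row 0 is the header: [nan, 1., 2.]
print(raw.shape)              # (11, 3)
print(np.isnan(raw[0, 0]))    # True

data = raw[1:]                                   # step 1: slice off the header row
idx = np.arange(data.shape[0]).reshape(-1, 1)    # step 2: (10, 1) column of ints
result = np.hstack([idx, data])                  # step 3: ints are upcast, all float64

print(idx.dtype.kind, result.dtype)              # i float64
print(result[:2])
```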
