
I have an array into which I want to insert a column at position 0, filled with consecutive integers starting at 0 and running up to the number of rows in the array.

import io
import numpy as np

data =io.StringIO("""
ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9
""")
data = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=np.float64)

Expected:

IDX,ID,1,2
0,5362,0.9,-0.4
1,485,-0.6,0.5
2,582,0.0,0.9
3,99,0.7,0.5
4,75,-0.4,0.5
5,474,0.3,0.8
6,594,-0.2,0.0
7,597,0.9,-0.3
8,124,0.7,0.6
9,635,0.8,0.9
  • Consider converting it to a pandas DataFrame, then insert your column, and finally convert back to a NumPy array using to_numpy (pandas.pydata.org/pandas-docs/stable/reference/api/…) Commented Nov 21, 2019 at 4:53
  • Actually, I do not want to use pandas because I want to run this on a GPU. Could this be done with NumPy only? Commented Nov 21, 2019 at 5:15
  • Yes, there are many ways to do that. Have a look here: stackoverflow.com/questions/8486294/… Commented Nov 21, 2019 at 5:22
  • You can easily concatenate on a np.arange(10)[:,None] array, but the result will be all floats. For fast numeric calculations, NumPy arrays have to have the same dtype throughout. There are ways of mixing dtypes, but that slows down the calculation. Do those first 2 columns have to be in the same array as the float columns? Commented Nov 21, 2019 at 5:59
  • @hpaulj - The values in my 1st column repeat. To get unique values, I want to add a column so that I have access to the exact row number for further processing. Commented Nov 21, 2019 at 6:13

2 Answers

In [110]: txt = """ 
     ...: ID,1,2 
     ...: 5362,0.9,-0.4 
     ...: 485,-0.6,0.5 
     ...: 582,0.0,0.9 
     ...: 99,0.7,0.5 
     ...: 75,-0.4,0.5 
     ...: 474,0.3,0.8 
     ...: 594,-0.2,0.0 
     ...: 597,0.9,-0.3 
     ...: 124,0.7,0.6 
     ...: 635,0.8,0.9 
     ...: """  

In [113]: data = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2)   
In [114]: data                                                                  
Out[114]: 
array([[ 5.362e+03,  9.000e-01, -4.000e-01],
       [ 4.850e+02, -6.000e-01,  5.000e-01],
       [ 5.820e+02,  0.000e+00,  9.000e-01],
       ...
       [ 6.350e+02,  8.000e-01,  9.000e-01]])


In [118]: data1 = np.concatenate([np.arange(data.shape[0])[:,None],data], axis=1)                                                                     
In [119]: data1                                                                 
Out[119]: 
array([[ 0.000e+00,  5.362e+03,  9.000e-01, -4.000e-01],
       [ 1.000e+00,  4.850e+02, -6.000e-01,  5.000e-01],
       [ 2.000e+00,  5.820e+02,  0.000e+00,  9.000e-01],
       [ 3.000e+00,  9.900e+01,  7.000e-01,  5.000e-01],
         ...
       [ 9.000e+00,  6.350e+02,  8.000e-01,  9.000e-01]])
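As a hedged alternative to the concatenate call above, np.insert can place the same index column at position 0. This is only a sketch on a shortened copy of the data (the array literal is typed in by hand here rather than read with genfromtxt), and the result is still all floats:

```python
import numpy as np

# Shortened stand-in for the float array produced by genfromtxt above.
data = np.array([
    [5362.0, 0.9, -0.4],
    [485.0, -0.6, 0.5],
    [582.0, 0.0, 0.9],
])

# Insert one new column before column 0, holding the row numbers.
# The integer values are cast to the array's float dtype.
data1 = np.insert(data, 0, np.arange(data.shape[0]), axis=1)

# Same result as concatenating a column vector of row numbers.
same = np.concatenate([np.arange(data.shape[0])[:, None], data], axis=1)
print(data1)
print(np.array_equal(data1, same))
```

Either form copies the data into a new array; neither modifies data in place.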

Alternatively, create 2 arrays: one holding the integer IDs, the other holding the float values:

In [124]: ID = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[0],dtype=int)                                                     
In [126]: ID                                                                    
Out[126]: array([5362,  485,  582,   99,   75,  474,  594,  597,  124,  635])
In [127]: np.column_stack([np.arange(ID.shape[0]),ID])                          
Out[127]: 
array([[   0, 5362],
       [   1,  485],
       [   2,  582],
        ...
       [   9,  635]])
In [128]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[1,2])                                                          
In [129]: data2                                                                 
Out[129]: 
array([[ 0.9, -0.4],
       [-0.6,  0.5],
       [ 0. ,  0.9],
        ...
       [ 0.8,  0.9]])

Or as a structured array:

In [120]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=1,names=True, dtype=None)
In [121]: data2                                                                 
Out[121]: 
array([(5362,  0.9, -0.4), ( 485, -0.6,  0.5), ( 582,  0. ,  0.9),
       (  99,  0.7,  0.5), (  75, -0.4,  0.5), ( 474,  0.3,  0.8),
       ( 594, -0.2,  0. ), ( 597,  0.9, -0.3), ( 124,  0.7,  0.6),
       ( 635,  0.8,  0.9)],
      dtype=[('ID', '<i8'), ('1', '<f8'), ('2', '<f8')])

I could add another id column, and consolidate the float columns, but that can wait.
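For what it's worth, here is a minimal sketch of how that extra id column might be added to a structured array while keeping it an integer. The field name IDX comes from the expected output in the question; the shortened array literal and the variable names new_dtype and out are just assumptions for illustration:

```python
import numpy as np

# Shortened stand-in for the structured array returned by genfromtxt above.
data2 = np.array(
    [(5362, 0.9, -0.4), (485, -0.6, 0.5), (582, 0.0, 0.9)],
    dtype=[('ID', '<i8'), ('1', '<f8'), ('2', '<f8')],
)

# New dtype: an integer IDX field followed by the existing fields.
new_dtype = np.dtype([('IDX', '<i8')] + data2.dtype.descr)

# Allocate the result and copy the data over field by field.
out = np.empty(data2.shape, dtype=new_dtype)
out['IDX'] = np.arange(data2.shape[0])
for name in data2.dtype.names:
    out[name] = data2[name]

print(out)
print(out.dtype)
```

Each field keeps its own dtype this way, at the cost of the slower mixed-dtype access mentioned in the comments.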


1 Comment

The output is exactly as desired.

This is probably a job for pandas. NumPy is really intended for situations where the numbers in an array are all measurements of the same thing. And I'd also add that you might not really need these indices in NumPy, since you can already ask for the n-th row with NumPy's indexing. But you can have more or less what you want if you're prepared to compromise a bit:

data = data[1:]
idx = np.arange(data.shape[0]).reshape(-1, 1)
np.hstack([idx, data])

In the first line, I've sliced off the header, because NumPy arrays don't have column headings like this. That's a pandas thing.

In the second line I've made a 'column' of monotonically increasing indices. This is a bunch of ints for now, but not for long.

In the third line I've concatenated everything. Everything is floats now. You can't have one column of ints and 3 columns of floats... pandas again.
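To illustrate both points, here is a small self-contained sketch. The array below is made-up data with a deliberately repeated ID (5362), not the array from the question. It shows that the stacked result is all floats, and that row numbers for a repeated ID can be recovered from positions alone, without storing an index column:

```python
import numpy as np

# Made-up data: the ID 5362 appears twice on purpose.
data = np.array([
    [5362, 0.9, -0.4],
    [485, -0.6, 0.5],
    [5362, 0.0, 0.9],
])

# Column of row numbers; integers until they are stacked with floats.
idx = np.arange(data.shape[0]).reshape(-1, 1)
stacked = np.hstack([idx, data])
print(stacked.dtype)  # float64

# Row numbers are implicit in the array: find the rows with a given ID.
rows = np.flatnonzero(data[:, 0] == 5362)
print(rows)  # [0 2]

# Fetch those rows directly by position.
print(data[rows])
```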
