My question is how to efficiently expand an array by copying it many times. I am trying to expand my survey sample to the full-size dataset by copying every sample N times, where N is the influence factor assigned to that sample. I wrote two loops to do this task (script pasted below). It works, but it is slow: my sample size is 20,000, and I am trying to expand it to 3 million rows. Is there any function I can try? Thank you for your help!

----My script----

import numpy as np

lines = np.asarray(person.read().split('\n'))
df_array = np.asarray(lines[0].split(' '))
for j in range(1, len(lines) - 1):             # skip the empty trailing line
    subarray = np.asarray(lines[j].split(' '))
    factor = int(round(float(subarray[-1])))   # N copies for this sample
    for i in range(factor):                    # stack the row N times -- slow!
        df_array = np.vstack((df_array, subarray))
print(len(df_array))

3 Answers

First, you can try to load the data all at once with numpy.loadtxt.

Then, to repeat each row according to its last column, use numpy.repeat:

>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6]])
>>> np.repeat(data, data[:,-1], axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6]])

Finally, if you need to round data[:,-1], replace it with np.round(data[:,-1]).astype(int).
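Putting the two steps together, a minimal sketch (assuming a whitespace-delimited file, here called samples.txt, whose last column holds the expansion factor):

import numpy as np

data = np.loadtxt('samples.txt')              # hypothetical file name
factors = np.round(data[:, -1]).astype(int)   # integer repeat counts
expanded = np.repeat(data, factors, axis=0)   # each row repeated factor times
print(len(expanded))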

Stacking NumPy arrays over and over is not very efficient, because they are not optimized for dynamic growth like that. Every time you call np.vstack, it allocates a whole new block of memory and copies everything accumulated so far, so the loop does quadratic work overall.

Use a list, then build your array once at the end, for example with a generator like this:

def upsample(stream):
    # Yield each input record `factor` times.
    for line in stream:
        rec = line.strip().split()
        factor = int(round(float(rec[-1])))
        for i in range(factor):
            yield rec

df_array = np.array(list(upsample(person)))

The concept you are looking for is called broadcasting. It allows you to fill an n-dimensional array with an (n-1)-dimensional array's contents.

Looking at your code example, you are calling np.vstack() in a loop. Broadcasting will eliminate the loop.

For example, if you have a 1D array of n elements,

>>> n = 5
>>> df_array = np.arange(n)
>>> df_array
array([0, 1, 2, 3, 4])

you can then create a new 10 x n array:

>>> bigger_array = np.empty([10,n])
>>> bigger_array[:] = df_array
>>> bigger_array
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

So with a single line of code, you can fill it with the contents of the smaller array.

bigger_array[:] = df_array

NB. Avoid using Python lists here; they are far, far slower than the NumPy ndarray.

2 Comments

Thank you. If my understanding is right, you are saying to apply bigger_array[:] to expand each small sample. After expanding them one by one, I still need to combine all of them into one big dataset. At that stage it is not expanding, it is combining. Is there a more efficient way than np.vstack()?
The most efficient way is likely to use np.empty() to allocate the space/memory for your final dataset up front, and then load the data and broadcast into it using slice indexing, as in the sketch below. This is inherently faster than growing the array in a Python loop.
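A minimal sketch of that idea (the names rows and factors are hypothetical; rows holds the samples and factors the integer repeat count for each one):

import numpy as np

rows = np.array([[1., 2., 3.],
                 [4., 5., 6.]])    # hypothetical sample data
factors = np.array([2, 3])         # repeat count per row

# Allocate the full-size array once.
out = np.empty((factors.sum(), rows.shape[1]))

# Broadcast each row into its slice of the preallocated array.
start = 0
for row, n in zip(rows, factors):
    out[start:start + n] = row
    start += n

print(out)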
