0

Is it possible to quickly create a (large) 2D NumPy array which

  1. contains a value n times per row (randomly placed). e.g., for n = 3

    1 0 1 0 1
    0 0 1 1 1
    1 1 1 0 0
    ...
    
  2. same as 1., but with a contiguous group of size n placed randomly in each row, e.g.

    1 1 1 0 0
    0 0 1 1 1
    1 1 1 0 0
    ...
    

Of course, I could loop over all rows, but I am wondering whether there's a way to create the array using np.fromfunction or some other, faster way.

3
  • 1
    Do you want a specific probability distribution for a row to have 1, 2 or 3 ones? Commented Feb 7, 2014 at 20:24
  • 1
    This question appears to be off-topic because it shows no attempt to solve the problem. Commented Feb 8, 2014 at 5:03
  • @EOL: within a row, there is no requirement for a probability distribution. Commented Feb 8, 2014 at 9:15

3 Answers

1

The answer to your first question has a simple one-line solution, which I imagine is pretty efficient. Functions like np.random.shuffle or np.random.permutation must be doing something similar under the hood, but they require a Python loop over the rows, which might become a problem if you have very many short rows.

The second question also has a pure NumPy solution which should be quite efficient, although it is a little less elegant.

import numpy as np

rows = 20
cols = 10
n = 3

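# fixed number of ones per row in random places
# (argsort of random floats is a random permutation per row; exactly n entries are < n)
print((np.argsort(np.random.rand(rows, cols)) < n).view(np.uint8))

# fixed number of ones per row in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
# row index repeated n times each; // keeps the result an integer on Python 3
I = np.arange(rows*n)//n
# random start column per row, plus offsets 0..n-1
J = (np.random.randint(0, cols-n+1, (rows, 1)) + np.arange(n)).flatten()
data[I, J] = 1
print(data)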
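
For the first case, a possible alternative on newer NumPy versions (1.20 or later, which is an assumption about your environment) is to build a template with n ones per row and shuffle every row independently with Generator.permuted. This is a different technique from the argsort trick above, shown only as a sketch; the seed and the variable names are illustrative.

```python
import numpy as np

rows, cols, n = 20, 10, 3
rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# template: every row starts as n ones followed by zeros
template = np.zeros((rows, cols), dtype=np.uint8)
template[:, :n] = 1

# permuted shuffles each row independently along axis=1
random_ones = rng.permuted(template, axis=1)
print(random_ones)
```

Unlike shuffle, permuted treats each slice along the given axis separately, so no Python loop over the rows is needed.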

Edit: here is a slightly longer, but more elegant and more performant solution to your second question:

import numpy as np

rows = 20
cols = 10
n = 3

def running_view(arr, window, axis=-1):
    """
    return a running view of length 'window' over 'axis'
    the returned array has an extra last dimension, which spans the window
    """
    shape = list(arr.shape)
    shape[axis] -= (window-1)
    assert(shape[axis]>0)
    return np.lib.stride_tricks.as_strided(
        arr,
        shape + [window],
        arr.strides + (arr.strides[axis],))


#fixed number of ones per row in random contiguous place
data = np.zeros((rows, cols), np.uint8)

I = np.arange(rows)
J = np.random.randint(0,cols-n+1, rows)

running_view(data, n)[I,J,:] = 1
print(data)
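
If you would rather avoid strided views, the contiguous case can also be sketched with a plain broadcast comparison: pick one start column per row and mark every column that falls inside the window. This is an alternative I am suggesting, not the approach above; it assumes the newer Generator API (np.random.default_rng), and the names start, col and block are illustrative.

```python
import numpy as np

rows, cols, n = 20, 10, 3
rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# one random start column per row, shaped (rows, 1) so it broadcasts against the columns
start = rng.integers(0, cols - n + 1, size=(rows, 1))
col = np.arange(cols)

# True where the column index lies in [start, start + n)
block = ((col >= start) & (col < start + n)).astype(np.uint8)
print(block)
```

This compares every cell against its row's window, so it does more work than the strided assignment, but it never writes through an overlapping view.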

2 Comments

For the fixed-number-of-ones-per-row solution, I also get rows like [1, 2, 1, 0, 0]. Could the trick then be to use >0 masking?
Good point; the only reason I used division was to obtain the desired int result in a single pass over the array, but indeed that solution only works if n < cols/2. We can get the same result in a single pass with a comparison and a view; I'll edit the code.
0

First, import a few functions from NumPy:

from numpy.random import rand, randint
from numpy import array, argsort

Case 1:

a = rand(10,5)
b=[]
for i in range(len(a)):
    n=3 #number of 1's
    b.append((argsort(a[i])>=(len(a[i])-n))*1)
b=array(b)

Result:

print(repr(b))
array([[ 1,  0,  0,  1,  1],
       [ 1,  0,  0,  1,  1],
       [ 0,  1,  0,  1,  1],
       [ 1,  0,  1,  0,  1],
       [ 1,  0,  0,  1,  1],
       [ 1,  1,  0,  0,  1],
       [ 0,  1,  1,  1,  0],
       [ 0,  1,  1,  0,  1],
       [ 1,  0,  1,  0,  1],
       [ 0,  1,  1,  1,  0]])

Case 2:

a = rand(10,5)
b=[]
for i in range(len(a)):
    n=3 #max number of 1's
    n=randint(0,(n+1)) 
    b.append((argsort(a[i])>=(len(a[i])-n))*1)
b=array(b)

Result:

print(repr(b))
array([[ 0,  0,  1,  0,  0],
       [ 0,  1,  0,  1,  0],
       [ 1,  0,  1,  0,  1],
       [ 0,  1,  1,  0,  0],
       [ 1,  0,  1,  0,  0],
       [ 1,  0,  0,  1,  1],
       [ 0,  1,  1,  0,  1],
       [ 1,  0,  1,  0,  0],
       [ 1,  1,  0,  1,  0],
       [ 1,  0,  1,  1,  0]])

I think that should work. I generate rows of random floats; argsort turns each row into a random permutation of the column indices, and comparing that permutation against len(row) - n marks exactly n positions. Multiplying by 1 then converts the booleans to ints (boolean*1 -> int).
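
As a rough sketch, the loop in Case 1 can be dropped by sorting along axis=1. The second half shows a variant that really marks the positions of the n largest values in each row, which needs argsort applied twice to turn values into ranks. The names cols, ranks and largest are illustrative and not part of the code above.

```python
import numpy as np

n = 3
a = np.random.rand(10, 5)
cols = a.shape[1]

# loop-free version of Case 1: argsort along axis=1 permutes each row's indices
b = (np.argsort(a, axis=1) >= cols - n) * 1

# variant that marks the positions of the n largest values in each row:
# applying argsort twice turns values into ranks (0 = smallest)
ranks = np.argsort(np.argsort(a, axis=1), axis=1)
largest = (ranks >= cols - n) * 1
```

Both arrays have exactly n ones per row; they differ only in which positions are marked.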


0

Just for the fun of it, I tried to find a solution to your first question, even though I'm quite new to Python. Here is what I have so far:

np.vstack([np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
   np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
   np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
   np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
   np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
   np.hstack(np.random.permutation([np.random.randint(0,2),
 np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0]))])
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1]])

It is not the final answer, but maybe it can help you find an alternate solution using random numbers and permutation.
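
One way to push this idea further, sketched here under the assumption that every row must contain exactly n ones: start from a fixed template with n ones and let np.random.permutation shuffle a fresh copy for each row. The names template and result are illustrative.

```python
import numpy as np

rows, cols, n = 6, 6, 3

# template row: n ones followed by zeros
template = np.array([1] * n + [0] * (cols - n))

# permutation returns a shuffled copy, so each row is shuffled independently
result = np.vstack([np.random.permutation(template) for _ in range(rows)])
print(result)
```

This still loops over the rows in Python, so it is mainly useful for small arrays.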
