
I wish to generate 10,000 random binary matrices which have the same number of 1s per row and per column as a given binary matrix.

The matrix is ~500 x ~10,000. There are about 2,000,000 1s. There are no zero rows or columns.

My current method converts the binary matrix into a bipartite adjacency structure and performs 1,000,000 random edge switches to randomize it. This takes about 13,000 seconds per matrix. I'm coding in Python, using a modified version of networkx's double_edge_swap function.

Is there a more efficient way to generate such matrices?
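For reference, the swap step described above can also be done directly on a NumPy edge list, which avoids per-swap graph overhead; this is a sketch under that assumption, not the actual modified networkx code, and `random_swaps` is a made-up name:

```python
import numpy as np

def random_swaps(rows, cols, n_swaps, rng=None, max_tries=None):
    """Randomize a bipartite edge list (rows[k], cols[k]) in place with
    2x2 'checkerboard' swaps, which preserve every row and column sum."""
    rng = np.random.default_rng(rng)
    edges = set(zip(rows.tolist(), cols.tolist()))
    n = len(rows)
    max_tries = max_tries if max_tries is not None else 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.integers(0, n, size=2)
        r1, c1, r2, c2 = rows[i], cols[i], rows[j], cols[j]
        if r1 == r2 or c1 == c2:
            continue  # swap would be a no-op
        if (r1, c2) in edges or (r2, c1) in edges:
            continue  # swap would create a duplicate edge
        # replace edges (r1,c1),(r2,c2) with (r1,c2),(r2,c1)
        edges.discard((r1, c1))
        edges.discard((r2, c2))
        edges.add((r1, c2))
        edges.add((r2, c1))
        cols[i], cols[j] = c2, c1
        done += 1
    return done  # number of swaps actually performed
```

Usage: get the edge list with `rows, cols = np.nonzero(matrix)`, run the swaps, and rebuild the matrix by assigning 1 at the swapped indices.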

3 Comments
  • I was looking for the name of this problem. It is the main problem of discrete tomography, "which deals with the reconstruction of a binary image from its horizontal and vertical linesums", and for the case of 2 dimensions (pairwise nonparallel lattice directions) the problem is in P. It would be interesting to know what the 10,000 randomly chosen reconstructions are needed for. Commented Sep 1, 2015 at 6:57
  • You should specify if you need a particular distribution, since different methods might give slightly different distributions. Commented Sep 1, 2015 at 16:09
  • If you only want to make the generation more efficient, a good solution would be to call a C function from Python to generate the matrices. Commented Sep 1, 2015 at 20:10

2 Answers


I think you can first build a special case of such a matrix, and then use numpy.random.shuffle to shuffle its rows:

import numpy as np

row_sum = 2
col_sum = 1
arr     = np.zeros((5, 10))
# generate a special case with the given row_sum and col_sum:
# row i has its ones in columns row_sum*i .. row_sum*i + row_sum - 1
for i in range(row_sum):
    arr.ravel()[i::arr.shape[1] + row_sum] = 1
arr

Out[84]: 
array([[ 1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.]])

np.random.shuffle(arr)
#np.random.shuffle(arr.T) to shuffle the columns
arr
Out[89]: 
array([[ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

arr.sum(1) #row sums
Out[90]: array([ 2.,  2.,  2.,  2.,  2.])

arr.sum(0) #col sums
Out[91]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

3 Comments

I would also suggest being a bit lazy if possible. We can represent a new matrix just by a list of row numbers ([2, 4, 1, 3, 0] in the example), and only go to the full-scale np.array when an assignment has to be done, or keep some sort of history of changes (but I'm not sure whether numpy can work with dynamically sized arrays).
A dynamic numpy array probably won't work; that was more or less discussed before in stackoverflow.com/questions/6950456/… . One would probably go to Fortran or C for a dynamic array. But wait, then it is no longer a lazy solution :)
What if the row sums are, say, [6, 5, 6, 4, 6, 7, 4, 5, 4, 4] and the column sums [3, 6, 5, 7, 2, 8, 3, 3, 4, 10] rather than constants? Even if you had one solution, simply shuffling wouldn't always produce the others.
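For non-constant margins like these, one starting matrix can still be built greedily (a Ryser-style construction); a sketch, where `initial_matrix` is a made-up name and the margins are assumed realizable (i.e. they satisfy the Gale-Ryser condition). Such a starting matrix can then be randomized with edge swaps rather than plain shuffles:

```python
import numpy as np

def initial_matrix(row_sums, col_sums):
    """Greedy (Ryser-style) construction of one binary matrix with the
    given row and column sums; assumes the margins are realizable."""
    row_sums = np.asarray(row_sums)
    col_remaining = np.asarray(col_sums).copy()
    A = np.zeros((len(row_sums), len(col_remaining)), dtype=int)
    # fill rows in decreasing order of row sum, putting each row's ones
    # into the columns with the most remaining capacity
    for i in np.argsort(-row_sums):
        cols = np.argsort(-col_remaining)[:row_sums[i]]
        A[i, cols] = 1
        col_remaining[cols] -= 1
    return A
```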

Tried this, and it works

np.mod(np.random.permutation(N*N).reshape(N,N),2)

Example:

>>> np.mod(np.random.permutation(4*4).reshape(4,4),2)
array([[0, 0, 0, 1],
       [1, 1, 1, 0],
       [1, 0, 0, 1],
       [0, 1, 1, 0]])
>>> np.mod(np.random.permutation(4*4).reshape(4,4),2)
array([[0, 0, 0, 1],
       [1, 1, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 1, 0]])

