
I have a list of lists of tuples:

[[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]

I need to get an X x Y matrix:

x = number of sublists
y = largest first element across all pairs, plus 1
elem[x, y] = second element of the pair in sublist x whose first element == y (0 if there is no such pair)
0    1    2    3    4     5      6
0.5  0.6  0    0    0     0      0
0    0    0    0    0.01  0.005  0.002
0    0.7  0    0    0     0      0

2 Answers


You can work out the array's dimensions as follows. The Y dimension is the number of sublists:

>>> data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]
>>> dim_y = len(data)
>>> dim_y
3

The X dimension is the largest first element ([0]) across all of the tuples, plus 1:

>>> dim_x = max(max(i for i,j in sub) for sub in data) + 1
>>> dim_x
7
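
One caveat with the nested max above: if any sublist is empty, the inner max raises a ValueError. A minimal sketch of a more tolerant version follows. The helper name matrix_shape is hypothetical (it is not from the question), and returning zero columns for completely empty input is my own assumption about what you'd want.

```python
# Hypothetical helper (not part of the original answer): compute both
# dimensions while tolerating empty sublists.
def matrix_shape(data):
    n_sublists = len(data)
    # Flatten all first elements into one generator; default=-1 makes the
    # result 0 columns when there are no tuples at all (an assumption).
    n_indices = max((i for sub in data for i, _ in sub), default=-1) + 1
    return n_sublists, n_indices

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]
print(matrix_shape(data))         # (3, 7)
print(matrix_shape(data + [[]]))  # (4, 7): the empty sublist adds a row only
```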

Then initialize an array of all zeros with this shape:

>>> import numpy as np
>>> arr = np.zeros((dim_x, dim_y))
>>> arr
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Now to fill it, enumerate over your sublists to keep track of the y index. Then, for each tuple in a sublist, use element [0] as the x index and element [1] as the value:

for y, sub in enumerate(data):
    for x, value in sub:
        arr[x,y] = value

The array is now populated. You may want to transpose it so it matches the layout you asked for:

>>> arr.T
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
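
If you'd rather skip the transpose, the same steps can be wrapped in a function that fills the array in (sublist, index) order from the start. This is only a sketch: the name to_dense is made up for illustration, and it assumes every sublist is non-empty and that first elements are non-negative integers.

```python
import numpy as np

# Hypothetical wrapper around the steps above; builds the array directly in
# (number of sublists, max index + 1) orientation, so no transpose is needed.
def to_dense(data):
    n_rows = len(data)
    # Same dimension logic as above; assumes no sublist is empty.
    n_cols = max(max(i for i, _ in sub) for sub in data) + 1
    arr = np.zeros((n_rows, n_cols))
    for row, sub in enumerate(data):
        for col, value in sub:
            arr[row, col] = value
    return arr

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]
arr = to_dense(data)
print(arr.shape)  # (3, 7)
```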

3 Comments

Cory, thanks for the reply! I was hoping for some numpy magic to avoid the loops, but it seems there isn't any.
If I had more coffee and thought for a while there might be.... this is just the naive way that came to mind. If I think of something more clever in pure numpy I'll update my answer.
Avoiding loops would require turning data into an array, which is a time-consuming step. Also, data is a mix of integers (indices) and floats (values). More significantly, each row has a different number of tuples.

As I commented on the accepted answer, data is 'ragged' and can't be made directly into a regular (rectangular) array.

Now if the data had a more regular form, a no-loop solution is possible. But conversion to such a form requires the same double looping!

In [814]: [(i,j,v) for i,row in enumerate(data) for j,v in row]
Out[814]: 
[(0, 0, 0.5),
 (0, 1, 0.6),
 (1, 4, 0.01),
 (1, 5, 0.005),
 (1, 6, 0.002),
 (2, 1, 0.7)]

'Transpose' that list and separate it into 3 variables:

In [815]: I,J,V=zip(*_)
In [816]: I,J,V
Out[816]: ((0, 0, 1, 1, 1, 2), (0, 1, 4, 5, 6, 1), (0.5, 0.6, 0.01, 0.005, 0.002, 0.7))

I stuck with the list transpose here so as not to convert the integer indices to floats, which is what would happen if the triples were put into a single array. It may also be faster, since making an array from a list isn't a trivial cost.

Now we can assign values via numpy magic:

In [819]: arr = np.zeros((3,7))
In [820]: arr[I,J]=V
In [821]: arr
Out[821]: 
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
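
For what it's worth, the row indices can also be produced with np.repeat from the sublist lengths, with the tuples flattened by itertools.chain. This is a sketch of an alternative, not something I've timed. It still walks the data at the Python level (the list comprehension and the chain), so it only moves the looping around rather than removing it.

```python
from itertools import chain

import numpy as np

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]

# One row index per tuple: row r is repeated len(data[r]) times.
lengths = [len(row) for row in data]
I = np.repeat(np.arange(len(data)), lengths)

# Flatten the tuples and split them into column indices and values.
J, V = zip(*chain.from_iterable(data))

arr = np.zeros((len(data), max(J) + 1))
arr[I, np.array(J)] = V
print(I)  # [0 0 1 1 1 2]
```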

I,J,V could also be used as input to a scipy.sparse.coo_matrix call, making a sparse matrix.
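
A minimal sketch of that coo_matrix call, rebuilding I, J, V so the snippet runs on its own. The shape is passed explicitly here. As far as I know, leaving it out makes scipy infer it from the largest indices, which would drop any trailing all-zero rows or columns.

```python
from scipy import sparse

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]

# Same flattening as above: (row, col, value) triples, then 'transpose'.
triples = [(i, j, v) for i, row in enumerate(data) for j, v in row]
I, J, V = zip(*triples)

# coo_matrix takes (values, (row_indices, col_indices)).
M = sparse.coo_matrix((V, (I, J)), shape=(len(data), max(J) + 1))
dense = M.toarray()
print(M.shape, M.nnz)  # (3, 7) 6
```

One difference from arr[I,J]=V worth keeping in mind: if the same (row, col) pair appears more than once, the sparse conversion sums the duplicates, while the dense assignment keeps just one of the values.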

Speaking of a sparse matrix, here's what a sparse version of arr looks like:

In list-of-lists format:

In [822]: from scipy import sparse
In [823]: M = sparse.lil_matrix(arr)
In [824]: M
Out[824]: 
<3x7 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in List of Lists format>
In [825]: M.toarray()
Out[825]: 
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
In [826]: M.rows
Out[826]: array([list([0, 1]), list([4, 5, 6]), list([1])], dtype=object)
In [827]: M.data
Out[827]: 
array([list([0.5, 0.6]), list([0.01, 0.005, 0.002]), list([0.7])],
      dtype=object)

and the more common coo format:

In [828]: Mc=M.tocoo()
In [829]: Mc.row
Out[829]: array([0, 0, 1, 1, 1, 2], dtype=int32)
In [830]: Mc.col
Out[830]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [831]: Mc.data
Out[831]: array([0.5  , 0.6  , 0.01 , 0.005, 0.002, 0.7  ])

and the csr used for most calculations:

In [832]: Mr=M.tocsr()
In [833]: Mr.data
Out[833]: array([0.5  , 0.6  , 0.01 , 0.005, 0.002, 0.7  ])
In [834]: Mr.indices
Out[834]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [835]: Mr.indptr
Out[835]: array([0, 2, 5, 6], dtype=int32)
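
To make the csr attributes less cryptic: indptr marks where each row starts and stops inside indices and data, so row r occupies the slice indptr[r]:indptr[r+1]. The sketch below uses plain lists copied from the output above (the variable names are mine) and walks those slices, which recovers the original list of lists of tuples.

```python
# Values copied from Mr.data, Mr.indices and Mr.indptr above.
values = [0.5, 0.6, 0.01, 0.005, 0.002, 0.7]
indices = [0, 1, 4, 5, 6, 1]
indptr = [0, 2, 5, 6]

rows = []
for r in range(len(indptr) - 1):
    start, stop = indptr[r], indptr[r + 1]
    # Pair each column index in this row's slice with its value.
    rows.append(list(zip(indices[start:stop], values[start:stop])))

print(rows)
# [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]
```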
