3

Given a Python list of tuples such as:

test = [(1, 'string1', 47.9, -112.8, 6400.0),
        (2, 'string2', 29.7, -90.8, 11.0),
        (3, 'string3', 30.8, -99.1, 1644.0),
        (4, 'string4', 45.8, -110.9, 7500.0),
        (5, 'string5', 43.9, -69.8, 25.0)]

What is the most efficient way to build a 2D NumPy array from the 3rd and 4th items of each tuple?

Desired output is:

array([[47.9, 29.7, 30.8, 45.8, 43.9],
       [-112.8, -90.8, -99.1, -110.9, -69.8]]) 

5 Answers

3

You can prepare the data outside NumPy with a list comprehension that selects the 3rd and 4th items of each tuple. Then you only need to transpose the resulting array:

np.array([x[2:4] for x in test]).T

2

Transpose the list with zip(*test), then slice out the 3rd and 4th columns using itertools.islice:

from itertools import islice

import numpy as np

np.array(list(islice(zip(*test), 2, 4)))
# array([[  47.9,   29.7,   30.8,   45.8,   43.9],
#        [-112.8,  -90.8,  -99.1, -110.9,  -69.8]])

1

You could transform the list of tuples directly into an array then use slicing and transposing to get the desired output:

import numpy as np

test = [(1, 'string1', 47.9, -112.8, 6400.0),
        (2, 'string2', 29.7, -90.8, 11.0),
        (3, 'string3', 30.8, -99.1, 1644.0),
        (4, 'string4', 45.8, -110.9, 7500.0),
        (5, 'string5', 43.9, -69.8, 25.0)]

arr = np.array(test, dtype=object)
result = arr[:, 2:4].T.astype(np.float64)  # float64 keeps the full precision of the Python floats
print(result)

Output

[[  47.9   29.7   30.8   45.8   43.9]
 [-112.8  -90.8  -99.1 -110.9  -69.8]]

Note that after arr = np.array(test, dtype=object), everything is done at the NumPy level.

4 Comments

Yes, this is likely the most efficient since it avoids list comprehension (everything is done within numpy).
For this method, I would do arr = np.array(test, dtype=object). As it stands, float values are converted to string and then converted back to float, which may result in loss of precision.
All the action may be in np.array, but that doesn't mean it is faster. Reading the list, converting to string, and then to float takes time. Loading an object dtype saves time. But for this small sample, selecting the columns first with a list comprehension is faster.
My real use-case will have len(test) of about 20,000
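
The trade-offs raised in these comments can be checked with a rough benchmark. The sketch below is only an illustration: the `rows` data is synthetic (a hypothetical stand-in for a 20,000-row `test`), the function names are made up for this example, and the timings will vary by machine and NumPy version, so no particular ranking is claimed here.

```python
import timeit
from itertools import islice

import numpy as np

# Hypothetical synthetic data shaped like `test`, with 20,000 rows.
rows = [(i, f"string{i}", 40.0 + i * 1e-3, -100.0 - i * 1e-3, float(i))
        for i in range(20_000)]


def via_comprehension(data):
    # Select the 3rd and 4th items in Python, then transpose.
    return np.array([x[2:4] for x in data]).T


def via_zip_islice(data):
    # Transpose with zip, then keep columns 2 and 3.
    return np.array(list(islice(zip(*data), 2, 4)))


def via_object_array(data):
    # Load everything as objects, then slice and convert inside NumPy.
    return np.array(data, dtype=object)[:, 2:4].T.astype(np.float64)


candidates = (via_comprehension, via_zip_islice, via_object_array)

# All three approaches should produce the same (2, N) float array.
reference = via_comprehension(rows)
for func in candidates:
    assert np.array_equal(func(rows), reference)

# Without dtype=object, the mixed tuples are coerced to a string array.
string_kind = np.array(rows[:5]).dtype.kind  # 'U' means unicode strings

# Best of 3 repeats, 5 calls each.
timings = {func.__name__: min(timeit.repeat(lambda: func(rows), number=5, repeat=3))
           for func in candidates}
for name, seconds in timings.items():
    print(f"{name}: {seconds / 5 * 1e3:.2f} ms per call")
```

Running this on your own data is more reliable than reasoning about which approach "stays inside NumPy", since the cost is dominated by how the Python tuples are read and converted.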
1

Build the first list from the 3rd item of each tuple:

the_first = [item[2] for item in test]

and the second list from the 4th item:

second = [item[3] for item in test]

Then combine them into a 2D array:

result = np.array([the_first, second])

0

You can try this:

import numpy as np

test = [(1, 'string1', 47.9, -112.8, 6400.0), (2, 'string2', 29.7, -90.8, 11.0), (3, 'string3', 30.8, -99.1, 1644.0), (4, 'string4', 45.8, -110.9, 7500.0), (5, 'string5', 43.9, -69.8, 25.0)]

# Indices 2 and 3 are the 3rd and 4th items of each tuple.
result = np.array([(item[2], item[3]) for item in test]).T
print(result)

# [[  47.9   29.7   30.8   45.8   43.9]
#  [-112.8  -90.8  -99.1 -110.9  -69.8]]
