1

I have a problem with list-to-array conversion. I have a list read from a csv file, like

a=[['1','a'],['2','b']]

Now I only want the first column, the numbers '1' and '2', converted to a numpy array. How do I accomplish this? Using b = np.array(a) puts all items into the array as strings.

4 Answers

3

You can use numpy.fromiter with operator.itemgetter. Note that a standard NumPy array is a poor fit for mixed types: it would need dtype object, which stores every element as a pointer to a Python object.

import numpy as np
from operator import itemgetter

a = [['1', 'a'], ['2', 'b']]

# Take the first item of each row and convert it to int while building the array
res = np.fromiter(map(itemgetter(0), a), dtype=int)

print(res)

[1 2]

Some performance benchmarking:

a = [['1', 'a'], ['2', 'b']] * 10000

%timeit np.fromiter(map(itemgetter(0), a), dtype=int)  # 4.31 ms per loop
%timeit np.array(a)[:, 0].astype(int)                  # 15.1 ms per loop
%timeit np.array([i[0] for i in a]).astype(int)        # 8.3 ms per loop
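The %timeit lines above are IPython magics. Outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. The helper function names and the repeat count are arbitrary choices made for illustration, and the timings will differ by machine and NumPy version.

```python
import timeit
from operator import itemgetter

import numpy as np

a = [['1', 'a'], ['2', 'b']] * 10000

# Hypothetical helper names, one per approach being compared
def with_fromiter():
    return np.fromiter(map(itemgetter(0), a), dtype=int)

def with_slice():
    return np.array(a)[:, 0].astype(int)

def with_listcomp():
    return np.array([i[0] for i in a]).astype(int)

runs = 10  # arbitrary; raise it for steadier numbers
for fn in (with_fromiter, with_slice, with_listcomp):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per loop")
```

All three helpers should return the same integer array, so only the speed differs.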

If you need a structured array of mixed types:

x = np.array([(int(i[0]), i[1]) for i in a],
             dtype=[('val', 'i4'), ('text', 'S10')])

print(x)

[(1, b'a') (2, b'b')]
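If it helps, here is a small sketch of how the numeric column could then be pulled out of the structured array by field name. It reuses the 'val' and 'text' field names from the snippet above; the variable names vals and texts are just illustrative.

```python
import numpy as np

a = [['1', 'a'], ['2', 'b']]
x = np.array([(int(i[0]), i[1]) for i in a],
             dtype=[('val', 'i4'), ('text', 'S10')])

vals = x['val']    # integer column as a 32-bit integer array
texts = x['text']  # byte strings, because of the 'S10' dtype

print(vals.tolist())   # [1, 2]
print(texts.tolist())  # [b'a', b'b']
```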

4 Comments

Thanks. This is an elegant solution. But what if I do not know which column holds the numbers? I have a very big list and I do not want to check which row is a number. I simply want the code to pick out the numbers from the strings. How do I do this?
In each row, will the same column be a number? Choosing which column to convert is likely beyond the scope of your question. I'd advise you to post a new question with exactly the scenario you want to handle.
Yes, each column has a fixed type, for example all numbers in the first column and all strings in the 10th column. I want to determine which rows are numbers and which are strings, and then convert all the number rows to an array. I will edit my question.
@ZhaoHao, You shouldn't change your question. This may seem harsh, but there are already 4 answers. All of them will have to be updated or deleted. Some people may be asleep by now, depending on their time zone and miss this completely. Please ask a new question instead.
3

You'd first need to create a new list that contains only the first value of each list in a. For example:

c = []
for row in a:
    c.append(row[0])
b = np.array(c)

More Pythonic would probably be a list comprehension:

c = [x[0] for x in a]
b = np.array(c)
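Note that b here still holds strings (a Unicode dtype), not numbers. To get integers, as the question asks, one option is to cast afterwards. A minimal sketch:

```python
import numpy as np

a = [['1', 'a'], ['2', 'b']]

c = [x[0] for x in a]        # ['1', '2'], still strings
b = np.array(c).astype(int)  # cast the string array to integers

print(b.dtype.kind)  # 'i' for a signed integer dtype
print(b.tolist())    # [1, 2]
```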


1

import numpy as np

a = [['1', 'a'], ['2', 'b']]
print(np.array(a)[:, 0].astype(int))


0

Try this (assuming numpy is imported as np):

a = np.array([int(i[0]) for i in a])
