How to convert numpy object array into str/unicode array?

Question

Update: In the latest versions of NumPy (e.g., v1.8.1), this is no longer an issue. All the methods mentioned here now work as expected.

Original question: Using object dtype to store a string array is sometimes convenient, especially when one needs to modify the contents of a large array without knowing the maximum string length in advance, e.g.,

>>> import numpy as np
>>> a = np.array([u'abc', u'12345'], dtype=object)
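To make the convenience concrete, here is a minimal Python 3 sketch contrasting a fixed-width unicode array with an object array when a longer string is assigned. The variable names are illustrative only.

```python
import numpy as np

# Fixed-width unicode array: the width is fixed at creation (5 characters here).
fixed = np.array(['abc', '12345'])
fixed[0] = 'abcdefgh'        # silently truncated to the first 5 characters
print(fixed[0])              # abcde

# Object array: each element is an ordinary Python str of any length.
flexible = np.array(['abc', '12345'], dtype=object)
flexible[0] = 'abcdefgh'     # stored in full
print(flexible[0])           # abcdefgh
```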

At some point, one might want to convert the dtype back to unicode or str. However, simple conversion truncates the strings to length 4 or 1 (why?), e.g.,

>>> b = np.array(a, dtype=unicode)
>>> b
array([u'abc', u'1234'], dtype='<U4')
>>> c = a.astype(unicode)
>>> c
array([u'a', u'1'], dtype='<U1')
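For readers on Python 3, where `unicode` no longer exists and `str` is already unicode, here is a minimal sketch of the same two conversions. It assumes a NumPy recent enough to include the fix mentioned in the update at the top of the question (v1.8.1 or later); on such versions both results should keep the full strings with a 5-character width.

```python
import numpy as np

a = np.array(['abc', '12345'], dtype=object)

b = np.array(a, dtype=str)   # Python 3 spelling of dtype=unicode
c = a.astype(str)

# With the truncation bug fixed, NumPy infers the width from the longest element.
print(b, b.dtype)
print(c, c.dtype)
```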

Of course, one can always iterate over the entire array explicitly to determine the max length,

>>> d = np.array(a, dtype='<U{0}'.format(np.max([len(x) for x in a])))
>>> d
array([u'abc', u'12345'], dtype='<U5')
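The explicit approach can be written in Python 3 as a short sketch: compute the longest length, then build the dtype string from it. It assumes every element is a `str` and that the array is non-empty (`max` raises `ValueError` on an empty sequence).

```python
import numpy as np

a = np.array(['abc', '12345'], dtype=object)

# Longest string length; assumes all elements are str and a is non-empty.
width = max(len(s) for s in a)

# Build a fixed-width unicode dtype such as 'U5' from that length.
d = a.astype(f'U{width}')
print(d, d.dtype)
```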

Yet, this is a little bit awkward in my opinion. Is there a better way to do this?

Edit to add: According to this closely related question,

>>> len(max(a, key=len))

is another way to find out the longest string length, and this step seems to be unavoidable...

Not a solution, but max(len(x) for x in a) is probably faster than constructing a list and calling np.max. – Fred Foo, commented Apr 17, 2013 at 15:21

I edited the question just before your comment :D max(a, key=len) is even faster. – herrlich10, commented Apr 17, 2013 at 16:05
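The speed claims in these comments are easy to check on your own machine. The sketch below first confirms that the three idioms for finding the longest length agree, then times each with `timeit`. The helper names and the synthetic 20,000-element array are hypothetical; no timings are claimed here, since they depend on hardware and versions.

```python
import timeit

import numpy as np


def max_len_idioms(a):
    """Hypothetical helper: the three longest-length idioms from this thread."""
    return {
        "np.max(list)": lambda: np.max([len(x) for x in a]),
        "max(generator)": lambda: max(len(x) for x in a),
        "len(max(key=len))": lambda: len(max(a, key=len)),
    }


def time_idioms(a, number=20):
    """Total seconds for `number` calls of each idiom, measured with timeit."""
    return {name: timeit.timeit(fn, number=number)
            for name, fn in max_len_idioms(a).items()}


# Synthetic data: 20,000 short strings in an object array.
a = np.array(['abc', '12345'] * 10_000, dtype=object)

# All three idioms must agree before comparing their speed.
results = {name: int(fn()) for name, fn in max_len_idioms(a).items()}
assert len(set(results.values())) == 1

for name, seconds in time_idioms(a).items():
    print(f"{name:>18}: {seconds:.4f} s")
```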

jb. · Accepted Answer · 2015-04-17 07:18:57Z

Score: 27

I know this is an old question, but in case anyone comes across it looking for an answer, try

c = a.astype('U')

and you should get the result you expect:

array([u'abc', u'12345'], dtype='<U5')

answered Sep 8, 2014 at 0:13 by Fred · edited Apr 17, 2015 at 7:18 by jb.
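Since `astype('U')` infers the width from the data, one workable pattern for the use case in the question is a round trip: convert to object when the contents need editing, then convert back with `astype('U')` so the width is re-inferred. This is a sketch of that pattern, not something the answer itself states.

```python
import numpy as np

a = np.array(['abc', '12345'], dtype=object)

c = a.astype('U')          # width inferred from the longest element (5)

# To edit with longer strings, go back to object, edit, then convert again.
o = c.astype(object)
o[0] = 'abcdefgh'
c2 = o.astype('U')         # width re-inferred from the new data (8)
print(c2, c2.dtype)
```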


ThisGuyCantEven · Accepted Answer · 2018-01-08 17:34:59Z

Score: 5

At least in Python 3.5 with Jupyter 4, I can use:

import numpy as np
a = np.array([u'12345', u'abc'], dtype=object)
b = a.astype(str)
b

This works fine for me and returns:

array(['12345', 'abc'], dtype='<U5')

answered Jan 8, 2018 at 17:34 by ThisGuyCantEven
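The key point in both answers is that a bare `str` or `'U'` lets NumPy infer the width from the data. A sketch of the contrast with an explicit width, which here is an illustrative value chosen to be too small: with NumPy's default unsafe casting, the longer string is truncated to fit.

```python
import numpy as np

a = np.array(['12345', 'abc'], dtype=object)

inferred = a.astype(str)     # width inferred from the data (5)
same = a.astype('U')         # 'U' with no length also infers the width
explicit = a.astype('U3')    # width fixed by the caller: longer strings are truncated

print(inferred, inferred.dtype)
print(explicit, explicit.dtype)
```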

2 Comments

Tian · over a year ago

Seems like if the array was initialised with dtype == np.str_, using astype(str) will not convert the dtype

ThisGuyCantEven · over a year ago

Check this out: stackoverflow.com/questions/30086936/…
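One plausible reading of Tian's comment (assuming `np._str` there means `np.str_`): an array created with a string dtype is already a unicode array, so `astype(str)` has nothing to convert and returns a copy with the same dtype. Converting to object dtype has to be requested explicitly. A minimal sketch under that assumption:

```python
import numpy as np

s = np.array(['abc', '12345'], dtype=np.str_)   # already a unicode array, width 5

t = s.astype(str)          # same dtype as s; astype still returns a new array by default
o = s.astype(object)       # explicit conversion to object dtype

print(s.dtype, t.dtype, o.dtype)
```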
