Longest string in numpy object_ array

Question

I'm using a numpy object_ array to store variable length strings, e.g.

import numpy as np
a = np.array(['hello', 'world', '!'], dtype=np.object_)

Is there an easy way to find the length of the longest string in the array without looping over all elements?

Alex Martelli · Accepted Answer · 2009-10-17 17:01:32Z

11

max(a, key=len) gives you the longest string (and len(max(a, key=len)) gives you its length) without requiring you to code an explicit loop, but of course max will do its own looping internally, as it couldn't possibly identify "the longest string" in any other way.
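A minimal sketch of this idiom on the question's object array. It also shows the variant max(map(len, a), default=0), which returns the length directly and returns 0 instead of raising on an empty array. Both still visit every element in Python, and when several strings tie, max keeps the first one it sees.

```python
import numpy as np

a = np.array(['hello', 'world', '!'], dtype=np.object_)

# max() iterates over the array and compares items by len(); ties keep the first one seen
longest = max(a, key=len)
longest_len = len(longest)

# If only the length is needed, map len over the elements; default=0 covers an empty array
also_len = max(map(len, a), default=0)

print(longest, longest_len, also_len)   # hello 5 5
```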


unutbu · Accepted Answer · 2009-10-17 16:59:52Z

8

If you store the strings in a NumPy array of dtype object, you can't get at the size of the objects (strings) without looping. However, if you let np.array decide the dtype, you can find the length of the longest string by peeking at the dtype:

In [64]: a = np.array(['hello','world','!','Oooh gaaah booo gaah?'])

In [65]: a.dtype
Out[65]: dtype('|S21')

In [72]: a.dtype.itemsize
Out[72]: 21
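The session above predates Python 3. There, string literals were byte strings, so NumPy inferred an S dtype with one byte per character. Under Python 3 the same call gives a Unicode dtype (<U21 on a little-endian machine), and its itemsize counts 4 bytes per character, so it reads 84 rather than 21. The following is a minimal sketch of recovering the character count under Python 3. It also converts the question's object array first with astype(str), which sizes the new dtype to fit the longest element. The variable names are only illustrative.

```python
import numpy as np

a = np.array(['hello', 'world', '!', 'Oooh gaaah booo gaah?'])
print(a.dtype)            # <U21 on a little-endian machine
print(a.dtype.itemsize)   # 84: 21 characters x 4 bytes each

# Divide by the size of a one-character 'U' dtype to convert bytes to characters
max_len = a.dtype.itemsize // np.dtype('U1').itemsize
print(max_len)            # 21

# For a dtype=object array, convert to fixed-width str first; NumPy has to
# visit every element to choose the width, so this is still a loop internally
obj = np.array(['hello', 'world', '!'], dtype=np.object_)
obj_max_len = obj.astype(str).dtype.itemsize // np.dtype('U1').itemsize
print(obj_max_len)        # 5
```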


mmmmmm · Accepted Answer · 2009-10-17 16:15:10Z

0

No. The length of each string is known only to the string object itself, so you have to ask every string for its length.
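To make this concrete: a dtype=object array holds only references to Python str objects, so its dtype says nothing about the text. This short sketch collects the per-element lengths into an integer array. That form is handy if you also want the position of the longest string.

```python
import numpy as np

a = np.array(['hello', 'world', '!'], dtype=np.object_)

# The dtype is just 'object'; itemsize is the pointer size (8 on typical 64-bit builds)
print(a.dtype, a.dtype.itemsize)

# Ask each string for its length; a.flat visits every element regardless of shape
lengths = np.fromiter((len(s) for s in a.flat), dtype=np.intp, count=a.size)
print(lengths)                      # [5 5 1]
print(lengths.max())                # 5
print(a.flat[lengths.argmax()])     # hello (argmax returns the first maximum)
```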


Tristan Forward · Accepted Answer · 2014-08-28 23:18:07Z

0

Say I want the length of the longest string in the second column of a 2-D array:

import numpy as np

data_array = np.array([['BFNN', 'Forested bog without permafrost or patterning, no internal lawns'],
                       ['BONS', 'Nonpatterned, open, shrub-dominated bog']])


def get_max_len_column_value(data_array, column):
    # data_array[:, column] is a 1-D slice, so key=len compares the string lengths
    return len(max(data_array[:, column], key=len))

get_max_len_column_value(data_array, 1)

>>>64
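If you need the maximum length of every column rather than a single one, the following sketch applies np.vectorize(len) to the same data and reduces along axis 0. np.vectorize is a convenience wrapper that still calls len once per element in Python, so it is not faster than the loop above. It just handles all columns in one call.

```python
import numpy as np

data_array = np.array([
    ['BFNN', 'Forested bog without permafrost or patterning, no internal lawns'],
    ['BONS', 'Nonpatterned, open, shrub-dominated bog'],
])

# np.vectorize calls len() on each element and returns an int array of the same shape
str_len = np.vectorize(len)
col_max = str_len(data_array).max(axis=0)
print(col_max)   # [ 4 64]
```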


ggagliano · Accepted Answer · 2018-11-09 17:15:35Z

I would also like to mention a C-like method:

int(string_array.dtype.itemsize/np.dtype(string_array.dtype.char+'1').itemsize)

It seems to be much more efficient than the accepted max(..., key=len) answer, because it reads only the dtype metadata instead of visiting every element:

import numpy as np

codes_len = 10000   # characters per string
codes_size = 10000  # number of strings
string_array = np.random.choice(np.array([b'a', b'b']), [codes_size, codes_len])
string_array = np.array([b"".join(string_array[i]).decode('utf-8') for i in range(codes_size)])

%time res = int(string_array.dtype.itemsize/np.dtype(string_array.dtype.char+'1').itemsize)
print('result is:', str(res) + '\n')
>>> CPU times: user 21 µs, sys: 4 µs, total: 25 µs
>>> Wall time: 29.1 µs
>>> result is: 10000

%time res = len(max(string_array, key=len))
print('result is:', res)
>>> CPU times: user 333 ms, sys: 8.21 ms, total: 342 ms
>>> Wall time: 341 ms
>>> result is: 10000
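The one-liner divides the dtype's width in bytes by the width of a one-character dtype of the same kind. U1 is 4 bytes and S1 is 1 byte, so the one-liner works for both Unicode and byte-string arrays. The sketch below wraps it in a helper named dtype_max_len, which is hypothetical. The helper reports the dtype's capacity, and that matches the longest string only when NumPy inferred the width from the data. An explicitly wider dtype, or a slice that drops the longest element, leaves the width unchanged. A dtype=object array has no string width to read at all.

```python
import numpy as np

def dtype_max_len(arr):
    """Hypothetical helper: character capacity of a fixed-width 'U' or 'S' array."""
    if arr.dtype.kind not in ('U', 'S'):
        raise TypeError(f'expected a U or S dtype, got {arr.dtype}')
    # 'U1' is 4 bytes (UCS4) and 'S1' is 1 byte, so this converts bytes to characters
    return arr.dtype.itemsize // np.dtype(arr.dtype.char + '1').itemsize

u = np.array(['hello', 'Oooh gaaah booo gaah?'])
s = np.array([b'ab', b'abcd'])
print(dtype_max_len(u), dtype_max_len(s))   # 21 4

# The width belongs to the dtype, not to the current contents
wide = np.array(['hi'], dtype='U50')
print(dtype_max_len(wide))                  # 50, although the only string has 2 characters
print(dtype_max_len(u[:1]))                 # 21, although the slice holds only 'hello'
```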
