There are three main dtypes for storing strings in numpy:
object: Stores pointers to Python objects
str: Stores fixed-width strings
numpy.dtypes.StringDType(): New in numpy 2.0; stores variable-width strings
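A quick sketch of how each of these is spelled (the example strings here are arbitrary):

import numpy as np

a_obj = np.array(['spam', 'ham'], dtype=object)                    # pointers to Python str objects
a_fix = np.array(['spam', 'ham'], dtype=str)                       # fixed width, padded to the longest string
a_var = np.array(['spam', 'ham'], dtype=np.dtypes.StringDType())   # variable width, numpy >= 2.0 only

print(a_obj.dtype, a_fix.dtype, a_var.dtype)  # object <U4 StringDType()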
str consumes more memory than object; StringDType is better than str
The exact ratio depends on the length of the longest string and the size of the array, but str consumes more memory whenever the longest string in the array is longer than 2 characters (the two are equal when the longest string is exactly 2 characters long). The reason is that str is a fixed-width dtype: every element reserves 4 bytes per character (UCS-4) for the length of the longest string in the array, however short that element actually is. In the benchmark below, str consumes almost 8 times more memory.
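The padding is visible on the dtype itself; a small self-contained illustration:

import numpy as np

padded = np.array(['this is a string', 'string'], dtype=str)
print(padded.dtype)     # <U16 -> every element is padded to 16 characters
print(padded.itemsize)  # 64   -> 16 characters * 4 bytes per UCS-4 code point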
On the other hand, the new (in numpy>=2.0) numpy.dtypes.StringDType stores variable-width strings, so it consumes much less memory than str.
import numpy as np
from pympler.asizeof import asizeof

ar1 = np.array(['this is a string', 'string']*1000, dtype=object)
ar2 = np.array(['this is a string', 'string']*1000, dtype=str)
ar3 = np.array(['this is a string', 'string']*1000, dtype=np.dtypes.StringDType())

# asizeof measures deep size, including the Python str objects ar1 points to
asizeof(ar2) / asizeof(ar1) # 7.944444444444445
asizeof(ar3) / asizeof(ar1) # 1.992063492063492
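pympler's asizeof is used above because it sizes objects deeply; the plain nbytes attribute would be misleading for the object array, since it only counts the pointers (numbers below assume a 64-bit build):

ar1.nbytes # 16000  -> 2000 pointers * 8 bytes; the referenced str objects are not counted
ar2.nbytes # 128000 -> 2000 elements * 16 characters * 4 bytes each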
str is slower than object in numpy 1.x but faster in numpy>=2.0.0
For numpy>=2.0.0 (tested on numpy 2.2.0)
Numpy 2.0 introduced a new numpy.strings API with much more performant ufuncs for string operations. A simple test below shows that vectorized string operations on arrays of str or StringDType dtype are faster, sometimes by a large margin, than the same operations on an object dtype array.
import timeit

# string repetition: Python-level * on the object array vs the np.strings ufunc
t1 = min(timeit.repeat(lambda: ar1*2, number=1000))
t2a = min(timeit.repeat(lambda: np.strings.multiply(ar2, 2), number=1000))
t2b = min(timeit.repeat(lambda: np.strings.multiply(ar3, 2), number=1000))
print(t2a / t1) # 0.8786601958427778 -> str is ~12% faster than object
print(t2b / t1) # 0.7311586933668037 -> StringDType is ~27% faster than object

# substring counting: a Python-level loop vs the np.strings ufunc
t3 = min(timeit.repeat(lambda: np.array([s.count('i') for s in ar1]), number=1000))
t4a = min(timeit.repeat(lambda: np.strings.count(ar2, 'i'), number=1000))
t4b = min(timeit.repeat(lambda: np.strings.count(ar3, 'i'), number=1000))
print(t4a / t3) # 0.13328748153237377 -> str is ~7.5x faster than object
print(t4b / t3) # 0.3365874412749679  -> StringDType is ~3x faster than object
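Beyond multiply() and count(), the numpy.strings namespace exposes ufunc versions of many of the str methods, for example:

np.strings.upper(ar3)        # elementwise str.upper
np.strings.str_len(ar3)      # elementwise len()
np.strings.find(ar3, 'is')   # elementwise str.find; -1 where the substring is absent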
For numpy<2.0.0 (tested on numpy 1.26.0)
Numpy 1.x's vectorized string methods (in np.char) are not optimized, so operating on the object array is often faster. For example, for the OP's example where each string is repeated, a simple * on the object array (i.e. multiply()) is not only more concise but also over 10 times faster than np.char.multiply() on the str array.
import timeit

setup = "import numpy as np; from __main__ import ar1, ar2"
t1 = min(timeit.repeat("ar1*2", setup, number=1000))                     # * on the object array
t2 = min(timeit.repeat("np.char.multiply(ar2, 2)", setup, number=1000))  # np.char on the str array
t2 / t1 # 10.650433758517027 -> char.multiply() is over 10x slower
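Both spellings build the same strings, so the timing comparison is like for like; a quick sanity check (assuming ar1 and ar2 as defined above):

assert (ar1*2 == np.char.multiply(ar2, 2)).all()  # identical elementwise results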
Even for operations without an operator shortcut, it is faster to loop over the object array and work on the Python strings than to use the vectorized char methods of str arrays.
For example, iterating over the object array and calling str.count() on each Python string is over 3 times faster than the vectorized char.count() on the str array.
f1 = lambda: np.array([s.count('i') for s in ar1])  # Python loop over the object array
f2 = lambda: np.char.count(ar2, 'i')                # vectorized char method on the str array
setup = "import numpy as np; from __main__ import ar1, ar2, f1, f2"
t3 = min(timeit.repeat("f1()", setup, number=1000))
t4 = min(timeit.repeat("f2()", setup, number=1000))
t4 / t3 # 3.251369161574832 -> char.count() is over 3x slower
On a side note, when it comes to an explicit loop, iterating over a Python list is faster than iterating over a numpy array. So in the previous example, a further performance gain can be had by converting the object array to a list and iterating over that:
f3 = lambda: np.array([s.count('i') for s in ar1.tolist()])
#                                               ^^^^^^^^^ <--- convert to a list here
t5 = min(timeit.repeat("f3()", "from __main__ import f3", number=1000))
t3 / t5 # 1.2623498005294627 -> ~26% faster than looping over the array