
I'd like to have NumPy efficiently convert each element of a numeric array (e.g. float32) into a formatted, string-like array. I can make this work as I expect by iterating over each element into a list:

import numpy as np
a = (10 ** np.arange(-5, 6, 2, dtype='d') * 3.14159).astype('f')
# array([3.14159e-05, 3.14159e-03, 3.14159e-01, 3.14159e+01, 3.14159e+03,
#        3.14159e+05], dtype=float32)

# Good conversion to a list
print([str(x) for x in a])
# ['3.14159e-05', '0.00314159', '0.314159', '31.4159', '3141.59', '314159.0']
print(list(map(lambda x: str(x), a)))  # also does the same

# Expected result: a string-like Numpy array
print(repr(np.array([str(x) for x in a])))
# array(['3.14159e-05', '0.00314159', '0.314159', '31.4159', '3141.59',
#        '314159.0'], dtype='<U11')

However, this approach doesn't scale to multidimensional arrays: map() and list comprehensions iterate only over the first axis, so for a 2-D array each x is a whole row rather than a scalar. I'd like the result as a NumPy array with a string-like dtype, as shown above.
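One way to keep the list-comprehension approach for any number of dimensions is to iterate over a flattened view and then restore the shape. This is a minimal sketch, assuming a plain reshape back to the input's shape is acceptable; the 2-D array `b` is just the question's data reshaped for illustration:

```python
import numpy as np

# The question's float32 data, reshaped to 2-D for illustration
a = (10 ** np.arange(-5, 6, 2, dtype='d') * 3.14159).astype('f')
b = a.reshape(2, 3)

# ravel() yields float32 scalars one at a time, so str() uses float32's
# shortest representation; reshape() then restores the original layout.
# dtype=str keeps the result string-typed even for an empty input.
s = np.array([str(x) for x in b.ravel()], dtype=str).reshape(b.shape)
print(repr(s))
```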


Normally numpy.vectorize could be used for this; however, none of my attempts with NumPy 1.15 return the expected result:

# Bad conversions with np.vectorize, all show the same result
f = np.vectorize(lambda x: str(x))
f = np.vectorize('%s'.__mod__)  # equivalent; gives same result
f = np.vectorize(lambda x: '{!s}'.format(x))  # also same, but modern formatter
print(repr(f(a)))
# array(['3.141590059385635e-05', '0.003141589928418398',
#        '0.31415900588035583', '31.4158992767334', '3141.590087890625',
#        '314159.0'], dtype='<U21')

(These results are wrong because NumPy appears to convert each element from float32 to a native Python float, i.e. double precision, before calling the function; the output matches [str(x) for x in a.tolist()].)
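The suspected cause can be checked directly. As far as I can tell from NumPy's implementation, `np.vectorize` calls the function on an object-dtype copy of the input, and casting a float32 array to `object` produces native Python floats. A possible workaround, sketched here under that assumption, is to cast each element back to the array's own scalar type (`a.dtype.type`) before calling `str()`; `to_str` is a hypothetical name:

```python
import numpy as np

a = (10 ** np.arange(-5, 6, 2, dtype='d') * 3.14159).astype('f')

# Casting to object dtype turns each float32 element into a Python float,
# which str() prints with full double-precision digits.
print(type(a.astype(object)[0]))

# Hypothetical workaround: convert back to float32 (exact, since the value
# originated as float32) so str() uses float32's shortest representation.
to_str = np.vectorize(lambda x: str(a.dtype.type(x)))
print(repr(to_str(a)))
print(repr(to_str(a.reshape(2, 3))))  # vectorize handles any number of dimensions
```

The lambda reads the dtype from `a` itself; for reuse across arrays of different dtypes, the scalar type would need to be passed in explicitly.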


Any ideas on how to either apply map()/list comprehensions to NumPy arrays of arbitrary dimension, or fix np.vectorize to achieve an equivalent result?

  • Numpy has a string type. Does a.astype('|S10') work for you? Note you can change the string length, and my example assumes 10 characters is enough. Commented Oct 29, 2018 at 3:55
  • @svohara you are on to something, although more than 10 characters are needed; a.astype(str) gives a 32-character dtype (either '<U32' or '|S32', depending on the Python version) Commented Oct 29, 2018 at 4:03

2 Answers


How about np.char.mod?

import numpy as np
np.char.mod('%.2f', np.random.rand(8, 8))

It outputs

array([['0.04', '0.86', '0.74', '0.45', '0.30', '0.09', '0.65', '0.58'],
       ['0.96', '0.58', '0.41', '0.29', '0.26', '0.54', '0.01', '0.59'],
       ['0.38', '0.86', '0.37', '0.14', '0.32', '0.57', '0.19', '0.28'],
       ['0.91', '0.80', '0.78', '0.39', '0.67', '0.51', '0.16', '0.70'],
       ['0.61', '0.12', '0.89', '0.68', '0.01', '0.23', '0.57', '0.18'],
       ['0.71', '0.29', '0.08', '0.01', '0.86', '0.03', '0.79', '0.75'],
       ['0.44', '0.84', '0.89', '0.75', '0.48', '0.88', '0.69', '0.20'],
       ['0.36', '0.69', '0.12', '0.60', '0.16', '0.39', '0.15', '0.02']],
      dtype='<U4')
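Because `np.char.mod` works element-wise over the whole array, it also covers the multidimensional case from the question. A short sketch applying it to the question's data reshaped to (2, 3); the `'%.3g'` format is an arbitrary choice for illustration:

```python
import numpy as np

a = (10 ** np.arange(-5, 6, 2, dtype='d') * 3.14159).astype('f')

# '%.3g' keeps three significant digits and switches to exponent notation
# for very small or large magnitudes; the result keeps the input's shape.
r = np.char.mod('%.3g', a.reshape(2, 3))
print(repr(r))
```

With a fixed-precision format like this, it makes no difference whether each element is seen as float32 or as a Python float, which sidesteps the precision issue from the question.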

You could simply use astype with dtype str:

a.astype(dtype=str)

# array(['3.14159e-05', '0.00314159', '0.314159', '31.4159', '3141.59',
#       '314159.0'], dtype='<U32')

Edit: I just saw your comment that you figured it out yourself. Nevertheless, I will keep my answer.
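The same cast preserves the array's shape, so it handles the multidimensional case from the question too. A minimal check using the question's data reshaped to (2, 3); the exact width of the resulting string dtype (e.g. `<U32`) may differ between NumPy versions, so the sketch only inspects the dtype kind:

```python
import numpy as np

a = (10 ** np.arange(-5, 6, 2, dtype='d') * 3.14159).astype('f')

# astype(str) converts element-wise and keeps the original shape
s = a.reshape(2, 3).astype(str)
print(s.shape, s.dtype.kind)  # (2, 3) U
```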
