2

Given: a numpy array created from a string:

xy = np.array('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4')

First off: is there a specific name for this construct?

Here is the datatype:

In [25]: xy
Out[25]:
array('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4',
      dtype='|S43')

What does |S43 mean?

Internals aside, here is the real question: how do we use the generated array?

In [31]: cov(xy)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-6d999a60c1da> in <module>()
----> 1 cov(xy)

  .. 
TypeError: cannot perform reduce with flexible type

That result contrasts with the more standard usage of np.array:

In [33]: xy = np.array([[4.9, 3.5],[5.1, 3.2],[ 4.7, 3.1],[ 4.6, 3.0],[ 5.0, 5.4]], dtype=float)

In [35]: cov(xy)
Out[35]:
array([[ 0.98 ,  1.33 ,  1.12 ,  1.12 , -0.28 ],
       [ 1.33 ,  1.805,  1.52 ,  1.52 , -0.38 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [-0.28 , -0.38 , -0.32 , -0.32 ,  0.08 ]])

So how can the stringified numpy.array syntax be used to get that same result?

Update: My mistake. I was mixing up numpy.array with numpy.matrix. The latter does support the stringified syntax. See my answer below.

6
  • The |S43 means your type is a String with 43 chars Commented Nov 1, 2016 at 13:42
  • dtype='|S43' indicates that the array is a string array of length 43 (it has 43 characters). In other words, it is storing everything as a string, not as numbers. Commented Nov 1, 2016 at 13:42
  • You can't compute the covariance of a string. You have to use numbers (int, float ...) for computation. Commented Nov 1, 2016 at 13:49
  • "Can't compute cov of a string": yes, no kidding. The assumption was that numpy performs the conversion. Maybe I am mixing up R with numpy; checking. Commented Nov 1, 2016 at 13:52
  • The numpy array doesn't perform the conversion. Numpy arrays are generic types to store data of the same type. The type can be a string. In your case you create an array that contains one element (one string). Commented Nov 1, 2016 at 14:03
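The points made in these comments can be checked directly. The following is a minimal sketch, assuming Python 3 and a reasonably recent NumPy: there, a str input produces a Unicode dtype rather than the |S43 shown in the question's session, while a bytes input reproduces the byte-string dtype.

```python
import numpy as np

s = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

# np.array does not parse the string: it wraps it as a single element
xy = np.array(s)
print(xy.shape)       # () : a zero-dimensional array holding one string
print(xy.dtype.kind)  # 'U' on Python 3 (Unicode string)

# Passing bytes instead gives the byte-string dtype seen in the question
xy_bytes = np.array(s.encode('ascii'))
print(xy_bytes.dtype)  # S43: a byte string of 43 characters
print(len(s))          # 43
```

Because the array holds one string rather than numbers, numeric reductions such as cov have nothing to work with.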

3 Answers

1

The problem: I was mixing numpy.array with numpy.matrix.

In [47]: np.matrix('1 2 3; 4 5 6')
Out[47]:
matrix([[1, 2, 3],
        [4, 5, 6]])

1 Comment

Yes, this input style was added to np.matrix to give MATLAB users something familiar. Add a .A to make an array. Of course it's only useful for toy examples.
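As a rough illustration of this answer and comment, the sketch below feeds the question's string to np.matrix and then converts the result to a plain array. It uses np.asarray for the conversion (m.A is the shorthand the comment refers to). Note that recent NumPy versions discourage np.matrix for new code, so this is probably best kept to small examples.

```python
import numpy as np

# np.matrix accepts the MATLAB-style string; rows are separated by ';'
m = np.matrix('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4')

# Convert to a plain ndarray (m.A is the shorthand mentioned above)
xy = np.asarray(m)

print(xy.shape)  # (5, 2)
c = np.cov(xy)   # each row is treated as a variable, so the result is 5x5
print(c.shape)
```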
0

You need to parse the string to a usable format before passing it to numpy.array. Try this:

import numpy as np

# original string
xy_str = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

# split rows on ';', split each row on whitespace, convert to float,
# then pass the nested lists to np.array
xy = np.array([list(map(float, v.split())) for v in xy_str.split(';')])


0

Convert the string into a list of lists like what's in your correct example.

orig_xy_str = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'
new_xy = np.array([vals.split(' ') for vals in orig_xy_str.split('; ')], dtype=float)

>>> np.cov(new_xy)
array([[ 0.98 ,  1.33 ,  1.12 ,  1.12 , -0.28 ],
       [ 1.33 ,  1.805,  1.52 ,  1.52 , -0.38 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [-0.28 , -0.38 , -0.32 , -0.32 ,  0.08 ]])

If you have no control over the initial input (as you say you are "given a numpy array created from a string"), first convert the array to a string with orig_xy_str = str(xy).
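One way to handle the "already wrapped in an array" case is sketched below. The helper name parse_matrix_string is hypothetical, not part of NumPy. It uses .item() rather than str() to pull the text out, on the assumption that under Python 3 a byte-string array such as |S43 would otherwise carry a b'...' prefix into the result.

```python
import numpy as np

def parse_matrix_string(value):
    """Hypothetical helper: turn 'a b; c d' text (or a 0-d string array) into a float array."""
    if isinstance(value, np.ndarray):
        value = value.item()           # extract the single Python str/bytes
    if isinstance(value, bytes):
        value = value.decode('ascii')  # e.g. for an |S43 array
    # rows are separated by ';', values within a row by whitespace
    rows = [row.split() for row in value.split(';') if row.strip()]
    return np.array(rows, dtype=float)

xy = np.array(b'4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4')
new_xy = parse_matrix_string(xy)
print(new_xy.shape)  # (5, 2)
```

The same helper also accepts a plain str, so it can be called on the original text before any array is created.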
