Comparing Numpy/Pandas arrays with mixed elements (string & floats)

Question

I have some NumPy arrays (or, equivalently, pandas DataFrames, since they are easily converted) that I wish to compare. These arrays/DataFrames contain both numbers and strings.

For purely numeric arrays I can do the following.

import numpy as np
a = np.array([[1.0, 2.0], [1.00001, 2.00001]])
b = np.array([[1.000001, 2.00001], [1.00001, 2.00001]])
print(np.allclose(a, b, 1e-9))
# output: False
print(np.allclose(a, b, 1e-4))
# output: True

With mixed arrays like the following, I get errors.

c = np.array([[1.0, "Cat"], [1.00001, 2.00001]])
d = np.array([[1.000001, "Dog"], [1.00001, 2.00001]])
e = np.array([[1.000001, "Cat"], [1.00001, 2.00001]])
print(np.allclose(c, d, 1e-4))
# expected output: False on account of the string difference
print(np.allclose(c, e, 1e-4))
# expected output: True

I tried converting them into pandas DataFrames, hoping that the built-in testing module might do the trick.

import pandas as pd
from pandas.util import testing as pdtest
df_c = pd.DataFrame(c)
df_d = pd.DataFrame(d)
df_e = pd.DataFrame(e)
print(pdtest.assert_almost_equal(df_c, df_e, check_exact=False, check_less_precise=4))
# expected output: True as the strings match and numbers agree within tolerance.

But this doesn't work. Is there a way to compare arrays where numerical elements are compared with a specified tolerance while string elements are compared exactly?

EDIT: The tolerance is purely for float elements. For strings, exact match is required.

Look at c or d (i.e. print them) and note the dtype. We should probably close this because you didn't describe the allclose errors or attempt any follow-up. But the basic issue is that allclose, which uses isclose, is designed for numeric arrays, not string arrays.
– hpaulj, Commented May 27, 2018 at 20:54
The core test is whether abs(x-y) is small enough. That doesn't apply to arrays that have a string dtype.
– hpaulj, Commented May 27, 2018 at 21:00

hpaulj · Accepted Answer · 2018-05-27 21:13:44Z

def myequal(i, j):
    # scalar comparison function of your own design
    if isinstance(i, str):
        return i == j               # strings must match exactly
    else:
        return abs(i - j) < 1e-4    # numbers: absolute tolerance of 1e-4

The sample arrays, as object dtype:

In [74]: c = np.array([[1.0, "Cat"], [1.00001, 2.00001]],object)
    ...: d = np.array([[1.000001, "Dog"], [1.00001, 2.00001]],object)
    ...: e = np.array([[1.000001, "Cat"], [1.00001, 2.00001]],object)

In [75]: c
Out[75]: 
array([[1.0, 'Cat'],
       [1.00001, 2.00001]], dtype=object)
In [76]: d
Out[76]: 
array([[1.000001, 'Dog'],
       [1.00001, 2.00001]], dtype=object)
In [77]: e
Out[77]: 
array([[1.000001, 'Cat'],
       [1.00001, 2.00001]], dtype=object)

Use frompyfunc to apply myequal to the elements of two arrays. It takes care of the broadcasted iteration:

In [78]: f = np.frompyfunc(myequal,2,1)
In [79]: f(c,d)
Out[79]: 
array([[True, False],
       [True, True]], dtype=object)
In [80]: f(c,e)
Out[80]: 
array([[True, True],
       [True, True]], dtype=object)
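To get a single True/False answer in the style of np.allclose, the element-wise result can be reduced with all(). The following is a minimal sketch, not a definitive implementation: the names elem_close and mixed_allclose are hypothetical, and math.isclose is swapped in for the fixed threshold so that the tolerances can be passed as parameters. Note that math.isclose is symmetric and its tolerance rule differs slightly from np.isclose.

```python
import math
import numbers

import numpy as np


def elem_close(i, j, rtol=1e-5, atol=1e-8):
    """Compare two scalars: numbers within tolerance, anything else exactly."""
    if isinstance(i, numbers.Real) and isinstance(j, numbers.Real):
        # Assumption: math.isclose stands in for np.isclose here.
        return math.isclose(i, j, rel_tol=rtol, abs_tol=atol)
    # Strings (and any number/string mismatch) fall back to exact equality.
    return i == j


def mixed_allclose(x, y, rtol=1e-5, atol=1e-8):
    """Hypothetical allclose-like helper for object-dtype arrays."""
    f = np.frompyfunc(lambda i, j: elem_close(i, j, rtol, atol), 2, 1)
    x = np.asarray(x, dtype=object)
    y = np.asarray(y, dtype=object)
    # frompyfunc returns object dtype; convert to bool before reducing.
    return bool(np.asarray(f(x, y), dtype=bool).all())


c = np.array([[1.0, "Cat"], [1.00001, 2.00001]], dtype=object)
d = np.array([[1.000001, "Dog"], [1.00001, 2.00001]], dtype=object)
e = np.array([[1.000001, "Cat"], [1.00001, 2.00001]], dtype=object)

print(mixed_allclose(c, d, rtol=1e-4))  # False: "Cat" != "Dog"
print(mixed_allclose(c, e, rtol=1e-4))  # True
```

The arrays must be created with object dtype so that the floats stay floats. A number compared against a string returns False here rather than raising.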

Without object dtype, your arrays are string dtype, the only common dtype:

In [81]: np.array([[1.0, "Cat"], [1.00001, 2.00001]])
Out[81]: 
array([['1.0', 'Cat'],
       ['1.00001', '2.00001']], dtype='<U32')

This raises an error in allclose/isclose because strings can't be tested for finiteness with np.isfinite:

In [82]: np.isclose(_,_)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-82-c2e4de5fe672> in <module>()
----> 1 np.isclose(_,_)

/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in isclose(a, b, rtol, atol, equal_nan)
   2330     y = array(y, dtype=dt, copy=False, subok=True)
   2331 
-> 2332     xfin = isfinite(x)
   2333     yfin = isfinite(y)
   2334     if all(xfin) and all(yfin):

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

np.isfinite applies to numeric arrays, not string ones.
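One way to avoid the TypeError is to check the dtype before calling np.allclose. This is a small sketch under that assumption; the helper name is_numeric_array is hypothetical, and np.issubdtype is used to test for a numeric dtype.

```python
import numpy as np


def is_numeric_array(arr):
    """Return True if arr has a numeric dtype (hypothetical helper)."""
    return bool(np.issubdtype(np.asarray(arr).dtype, np.number))


nums = np.array([[1.0, 2.0], [1.00001, 2.00001]])
mixed = np.array([[1.0, "Cat"], [1.00001, 2.00001]])
mixed_obj = np.array([[1.0, "Cat"], [1.00001, 2.00001]], dtype=object)

print(is_numeric_array(nums))       # True: float64
print(is_numeric_array(mixed))      # False: string dtype, floats were converted to text
print(is_numeric_array(mixed_obj))  # False: object dtype, needs element-wise comparison
```

Only arrays that pass this check are safe to hand to np.allclose directly; the others need an element-wise comparison on an object-dtype array.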
