Value Error Converting the datatype of elements in an array

Question

I'm trying to collect all elements that appear in both of two arrays into a single array. However, I'm running into a type error that I don't fully understand.

This is what I initially tried to do:

IRS_zips = AGI.zipcode.unique() # np array of type int
medi_zips = df.nppes_provider_zip.unique() # np array of type object

In order to find the matching elements I do:

like_zips = np.intersect1d(IRS_zips,medi_zips)

This throws this error:

TypeError: '<' not supported between instances of 'str' and 'int'

That makes sense, so I checked the types of both arrays. medi_zips is the one with the wrong type, so I tried to convert it:

medi_fixed = medi_zips.astype(int)

Which throws the error:

ValueError: invalid literal for int() with base 10: 'M4K 2'

I found this curious, so I looked through the data frame for a value equal to 'M4K 2'. I did find it: it is the first element of the dataframe and, more importantly, it shows up as a number, or in this case a zipcode. That makes me think it might be an encoding issue, which I'm not very strong in.
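A quick way to see which entries block the conversion is to coerce the array to numbers and look at what turns into NaN. The following is only a minimal sketch: the sample array is hypothetical and merely stands in for df.nppes_provider_zip.unique(), and bad_values is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df.nppes_provider_zip.unique()
medi_zips = np.array(['21502', '60201', 'M4K 2', '43623'], dtype=object)

# errors='coerce' turns anything that cannot be parsed as a number into NaN
as_numbers = pd.to_numeric(medi_zips, errors='coerce')

# The entries that became NaN are the ones astype(int) cannot handle
bad_values = medi_zips[np.isnan(as_numbers)]
print(bad_values)
```

If the offending values are ordinary strings containing letters or spaces, as 'M4K 2' is, the problem is the content of the data rather than its encoding.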

EDIT:

As requested this is what the output for IRS_zips looks like:

array([    0, 35004, 35005, ..., 83127, 83128, 83414])

And this is the output array for medi_zips:

array(['21502', '60201', '43623', ..., '81656', '56137', '85246'],
      dtype=object)

The ideal output would be a new array containing just the matched zips; instead I get the errors listed above.

EDIT 2:

This now works:

IRS_zips = AGI.zipcode.unique()
IRS_zips = (pd.to_numeric(IRS_zips, errors='coerce')).astype(int)

medi_zips = df.nppes_provider_zip.unique()
medi_int = pd.to_numeric(medi_zips, errors='coerce')
medi_int = (medi_int[~np.isnan(medi_int)]).astype(int)

How about also posting sample data for readers, and an expected output? :)
– anky, Commented Apr 18, 2019 at 17:47
So this particular value cannot be converted to an int. Is that what you expected?
– Tim Klein, Commented Apr 18, 2019 at 17:50
Yes, when I checked the type, the initial type error made sense.
– Sebastian Goslin, Commented Apr 18, 2019 at 17:53
How about pd.to_numeric(medi_zips, errors='coerce')? This will convert to float.
– anky, Commented Apr 18, 2019 at 17:54
Would it make more sense to force the IRS_zips to string? All you want is to match them. Numeric order isn't important; string lexical order would be just as good. USA postal codes are numeric, but that's not true for many other countries (e.g. Canada).
– hpaulj, Commented Apr 18, 2019 at 18:14
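The string-based idea from the last comment could look roughly like the sketch below. The sample arrays are hypothetical stand-ins for the two arrays in the question, and irs_str and medi_str are names introduced here for illustration.

```python
import numpy as np

# Hypothetical samples standing in for the two arrays in the question
IRS_zips = np.array([0, 35004, 35005, 60201, 83414])
medi_zips = np.array(['21502', '60201', 'M4K 2', '35004'], dtype=object)

# Convert both sides to plain str so np.intersect1d can sort and compare them
irs_str = IRS_zips.astype(str)
medi_str = medi_zips.astype(str)

# Non-numeric codes such as 'M4K 2' simply never match; nothing has to be dropped
like_zips = np.intersect1d(irs_str, medi_str)
print(like_zips)
```

One caveat, stated as an assumption about the data: an integer zip that originally had a leading zero becomes a shorter string (1001 turns into '1001', not '01001'). If the string column keeps its leading zeros, padding the converted integers with np.char.zfill(irs_str, 5) before intersecting may be needed.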

Damiano C. · Accepted Answer · 2019-04-18 19:06:18Z

This works for me:

import numpy as np
import pandas as pd

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'])

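# Unparseable entries such as '4c' become NaN instead of raising
medi_int = pd.to_numeric(medi_zips, errors='coerce')

# Drop the NaN entries
medi_int = medi_int[~np.isnan(medi_int)]

# Values present in both arrays
like_zips = np.intersect1d(IRS_zips, medi_int)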
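Because coercion introduces NaN, medi_int is a float array, so the intersection above comes back as floats. If integer zips are wanted, the same steps can be wrapped in a small function that casts back to int once the NaN entries are gone. This is only a sketch: matching_zips is a hypothetical helper name, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd

def matching_zips(int_zips, raw_zips):
    """Return the integer zips present in both inputs (hypothetical helper)."""
    # Coerce unparseable entries to NaN, then drop them
    numeric = pd.to_numeric(raw_zips, errors='coerce')
    numeric = numeric[~np.isnan(numeric)]
    # Cast back to int only after the NaN entries are gone
    return np.intersect1d(int_zips, numeric.astype(int))

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'], dtype=object)

like_zips = matching_zips(IRS_zips, medi_zips)
print(like_zips)
```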

edited Apr 18, 2019 at 19:06

answered Apr 18, 2019 at 17:57


4 Comments

Sebastian Goslin Over a year ago

Same error ValueError: invalid literal for int() with base 10: 'M4K 2'

Damiano C. Over a year ago

That means that some objects in your medi_zips are not convertible to int

Sebastian Goslin Over a year ago

TypeError: '<' not supported between instances of 'str' and 'int'. I thought it was because the IRS_zips were strings, so I converted them to ints as well, and now I get this error.

Sebastian Goslin Over a year ago

Whoops, forgot to use medi_int, but now it works! Thank you!
