I'm trying to collect all elements that appear in both of two arrays into a single array. However, I'm running into a type error that I don't fully understand.
This is what I initially tried to do:
IRS_zips = AGI.zipcode.unique() # np array of type int
medi_zips = df.nppes_provider_zip.unique() # np array of type object
In order to find the matching elements I do:
like_zips = np.intersect1d(IRS_zips,medi_zips)
This throws this error:
TypeError: '<' not supported between instances of 'str' and 'int'
This makes sense, so I checked the types of both arrays. In this case medi_zips is the one with the wrong type, so I tried to convert it:
medi_fixed = medi_zips.astype(int)
Which throws the error:
ValueError: invalid literal for int() with base 10: 'M4K 2'
I find this curious, so I looked through the data frame for a value equal to 'M4K 2'. I did find it: it is the first element of the data frame and, more importantly, it shows up as a number, or in this case a zip code. This leads me to think that it may be an encoding issue, which I'm not very strong in.
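One way to narrow this down might be to list the values that fail to parse as numbers, rather than searching for one value by hand. This is only a sketch: the small medi_zips array below is a hypothetical stand-in for df.nppes_provider_zip.unique(), and bad_values is a name made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.nppes_provider_zip.unique(); the real data is much larger.
medi_zips = np.array(['M4K 2', '21502', '60201', '43623'], dtype=object)

# Wrap in a Series so the parsed values and the original strings share an index.
raw = pd.Series(medi_zips)

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
as_numbers = pd.to_numeric(raw, errors='coerce')

# Keep the original strings at the positions where parsing failed.
bad_values = raw[as_numbers.isna()].tolist()
print(bad_values)
```

If the list contains values such as 'M4K 2', the column holds non-numeric postal codes, which would point to a data issue rather than an encoding issue.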
EDIT:
As requested, this is what the output for IRS_zips looks like:
array([ 0, 35004, 35005, ..., 83127, 83128, 83414])
And this is the output array for medi_zips:
array(['21502', '60201', '43623', ..., '81656', '56137', '85246'],
dtype=object)
The ideal output would be a new array containing only the matched zips; instead, I get the errors listed above.
EDIT 2:
This now works:
IRS_zips = AGI.zipcode.unique()
IRS_zips = (pd.to_numeric(IRS_zips, errors='coerce')).astype(int)
medi_zips = df.nppes_provider_zip.unique()
medi_int = pd.to_numeric(medi_zips, errors='coerce')
medi_int = (medi_int[~np.isnan(medi_int)]).astype(int)
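For reference, here is a self-contained sketch of the same approach on toy data, ending with the intersection. The sample arrays are made up for illustration and only mimic the shapes shown above. Note that the intermediate result of pd.to_numeric is a float array, because NaN cannot be stored in an integer array.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for AGI.zipcode.unique() and df.nppes_provider_zip.unique().
IRS_zips = np.array([0, 35004, 21502, 60201])
medi_zips = np.array(['M4K 2', '21502', '60201', '43623'], dtype=object)

# Unparseable entries such as 'M4K 2' become NaN, so the result is float64.
medi_float = pd.to_numeric(medi_zips, errors='coerce')
print(medi_float.dtype)

# Drop the NaN entries before casting back to int.
medi_int = medi_float[~np.isnan(medi_float)].astype(int)

# Both arrays are now integers, so intersect1d can sort and compare them.
like_zips = np.intersect1d(IRS_zips, medi_int)
print(like_zips)
```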
int. Is that what you expected?
pd.to_numeric(medi_zips, errors='coerce') this will convert to float
IRS_zips to string? All you want is to match them. Numeric order isn't important; string lexical order would be just as good. USA postal codes are numeric, but that's not true for many other countries (e.g. Canada).
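The suggestion to compare the values as strings could look roughly like the sketch below. The sample arrays are made up for illustration; irs_str and medi_str are hypothetical names.

```python
import numpy as np

# Toy stand-ins for AGI.zipcode.unique() and df.nppes_provider_zip.unique().
IRS_zips = np.array([0, 35004, 21502, 60201])
medi_zips = np.array(['M4K 2', '21502', '60201', '43623'], dtype=object)

# Convert both sides to plain string arrays so they can be sorted together.
irs_str = IRS_zips.astype(str)
medi_str = medi_zips.astype(str)

# Non-numeric codes such as 'M4K 2' simply never match; nothing has to be dropped first.
like_zips = np.intersect1d(irs_str, medi_str)
print(like_zips)
```

One caveat, stated as an assumption about the data: if the integer column lost leading zeros (for example 501 instead of '00501'), those codes will not match unless the strings are padded first, for instance with np.char.zfill(irs_str, 5).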