
I'm trying to collect all the elements that appear in both of two arrays into a single array. However, I'm running into a type error that I don't fully understand.

This is what I initially tried to do:

IRS_zips = AGI.zipcode.unique() # np array of type int
medi_zips = df.nppes_provider_zip.unique() # np array of type object 

In order to find the matching elements I do:

like_zips = np.intersect1d(IRS_zips,medi_zips)

This throws this error:

TypeError: '<' not supported between instances of 'str' and 'int'

That makes sense, so I checked the types of both arrays. medi_zips is the one with the wrong type, so I tried to convert it:

medi_fixed = medi_zips.astype(int)

Which throws the error:

ValueError: invalid literal for int() with base 10: 'M4K 2'

I find this curious, so I looked through the data frame for a value equal to 'M4K 2'. I did find it: it is the first element of the dataframe and, more importantly, it shows up there as a number, or in this case a zipcode. That makes me think it might be an encoding issue, which is not something I'm very strong in.
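A value such as 'M4K 2' contains letters and a space, so it is probably a non-numeric entry rather than an encoding problem (it resembles the start of a Canadian postal code, though that is a guess). One way to list every value that blocks the conversion is sketched below. The sample array is hypothetical and only stands in for df.nppes_provider_zip.unique().

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df.nppes_provider_zip.unique()
medi_zips = np.array(['21502', 'M4K 2', '60201', 'SW1A', '43623'], dtype=object)

# With errors='coerce', anything that cannot be parsed as a number becomes NaN
as_numbers = pd.to_numeric(pd.Series(medi_zips), errors='coerce')

# Keep the original strings at the positions where parsing failed
bad_values = medi_zips[as_numbers.isna().to_numpy()]
print(bad_values)
```

Inspecting bad_values before dropping anything shows whether the offending entries are foreign postal codes, truncated values, or something else.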

EDIT:

As requested this is what the output for IRS_zips looks like:

array([    0, 35004, 35005, ..., 83127, 83128, 83414])

And this is the output array for medi_zips:

array(['21502', '60201', '43623', ..., '81656', '56137', '85246'],
      dtype=object)

The ideal output would be a new array containing only the matching zips; instead, I get the errors listed above.

EDIT 2:

This now works:

# Make sure the IRS zips are integers
IRS_zips = AGI.zipcode.unique()
IRS_zips = (pd.to_numeric(IRS_zips, errors='coerce')).astype(int)

# Non-numeric provider zips (e.g. 'M4K 2') become NaN, then get dropped
medi_zips = df.nppes_provider_zip.unique()
medi_int = pd.to_numeric(medi_zips, errors='coerce')
medi_int = (medi_int[~np.isnan(medi_int)]).astype(int)
  • how about also posting a sample data for readers? also an expected output. :) Commented Apr 18, 2019 at 17:47
  • So this particular value cannot be converted to an int. Is that what you expected? Commented Apr 18, 2019 at 17:50
  • Yes when I checked the type the initial type error made sense Commented Apr 18, 2019 at 17:53
  • how about pd.to_numeric(medi_zips,errors='coerce') this will convert to float Commented Apr 18, 2019 at 17:54
  • Would it make more sense to force the IRS_zips to string? All you want is to match them. Numeric order isn't important; string lexical order would be just as good. USA postal codes are numeric, but that's not true for many other countries (e.g. Canada). Commented Apr 18, 2019 at 18:14
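The string-based approach suggested in the last comment could look roughly like this. It is a minimal sketch with made-up sample values, not the asker's data.

```python
import numpy as np

# Hypothetical samples: integer zips on one side, strings on the other
IRS_zips = np.array([0, 35004, 35005, 83414])
medi_zips = np.array(['21502', '35004', 'M4K 2', '83414'], dtype=object)

# Compare everything as text so non-numeric codes never need converting
irs_str = IRS_zips.astype(str)
medi_str = medi_zips.astype(str)

like_zips = np.intersect1d(irs_str, medi_str)
print(like_zips)
```

One caveat: zips stored as integers have lost any leading zeros, so they would have to be padded back to five digits (np.char.zfill can do this) before they could match strings that kept them.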

1 Answer

This is working for me

import numpy as np
import pandas as pd

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'])

medi_int = pd.to_numeric(medi_zips, errors='coerce')

medi_int = medi_int[~np.isnan(medi_int)]

like_zips = np.intersect1d(IRS_zips, medi_int)
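Note that pd.to_numeric with errors='coerce' yields floats once a NaN is present, so the intersection above comes back as floats too. If integer zips are wanted, the result can be cast afterwards. The sketch below reuses the same toy arrays.

```python
import numpy as np
import pandas as pd

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'])

# '4c' cannot be parsed, so it becomes NaN and the array is float64
medi_int = pd.to_numeric(medi_zips, errors='coerce')
medi_int = medi_int[~np.isnan(medi_int)]

# Mixing int and float inputs gives a float result; cast it back to int
like_zips = np.intersect1d(IRS_zips, medi_int).astype(int)
print(like_zips, like_zips.dtype)
```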

4 Comments

Same error ValueError: invalid literal for int() with base 10: 'M4K 2'
That means that some objects in your medi_zips are not convertible to int.
TypeError: '<' not supported between instances of 'str' and 'int'. I thought it was because the IRS_zips were strings, so I converted them to ints as well, and now I get this error.
Whoops, I forgot to use medi_int, but now it works! Thank you!
