1

I have a column in a dataframe that I need to join on. The column contains mixed data types, eg:

s = pd.Series([3985500,'3985500',3985500.0,'3985500.0','3985500A','3985500B'])

I'm trying to convert everything that's numeric to int to ensure the key is found when joining. Whatever is string can remain string and the final column format is allowed to be string, as long as the floats are converted to int.

I have tried astype(), but it ignores floats and for some reason I keep on getting scientific notation (see index 2 and 3):

s.astype(int, errors='ignore')

0       3985500
1       3985500
2    3.9855e+06
3     3985500.0
4      3985500A
5      3985500B
dtype: object

I get pd.to_numeric to work on floats with a try-except:

try: int(pd.to_numeric(s[3]))
except ValueError: s[3]

3985500
dtype: int

However, as soon as I try it in a function it returns nothing:

def convert_to_int(cell):
  try: int(pd.to_numeric(cell))
  except ValueError: cell

convert_to_int(s[3])

Any idea why this is happening? There might be other workarounds, but why is it not working when it's in a function?

I wish to use this function with s.apply(). I have looked at a couple of similar posts:

4
  • 1
    return int(pd.to_numeric(cell)). Also, you can probably pass downcast argument as "integer" to to_numeric and avoid having to use int to cast. Commented Jul 19, 2020 at 8:02
  • Do you run this in a notebook? Your bare try-except should not produce output either, otherwise. Commented Jul 19, 2020 at 8:05
  • Thanks @lightalchemist, the return is exactly what was missing. See the accepted answer. However, using the downcast option still returns the value at index 2 as a float for some reason. Commented Jul 19, 2020 at 15:12
  • @MisterMiyagi, yes I'm running this in a notebook which is why it was displaying a result in the bare try-except. Commented Jul 19, 2020 at 15:12

2 Answers 2

1

You are not returning any value from your function i.e,

def convert_to_int(cell):
  try: 
      return int(pd.to_numeric(cell))
  except ValueError: 
      cell

convert_to_int(s[3])

And to use apply() syntax:

s.apply(lambda x:convert_to_int(x))
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! This was a rookie mistake. I assumed the return is implied as it was displaying a value in Google Colab without the return when it's without the function. But it makes sense as it was not a function yet. I accepted this as the correct answer as it answers the question on why it's not working as a function. The answer by @MayankPorwal seems more efficient though.
Btw, the except also needs a return if I'm not mistaken?
1

You need to convert all numeric type values into int. You don't need to have a separate function, you can simply use Series.apply, like this:

In [202]: s = pd.Series([3985500,'3985500',3985500.0,'3985500.0','3985500A','3985500B'])
In [203]: s = s.apply(lambda x:x if isinstance(x, str) else int(x)) 

In [204]: s
Out[204]: 
0      3985500
1      3985500
2      3985500
3    3985500.0
4     3985500A
5     3985500B
dtype: object

The above command converts all numeric types to int, keeping string as it is.

3 Comments

Thanks, but this method still returns the number at index 3 as a float. It seems more efficient though.
It returns a float at index 3 because, it us defined as a string while initializing s.
I see what you mean. That's why I went with the try-except, to first try and convert to int. How would you make provision for the string at index 3?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.