12

I have a Pandas Dataframe which has columns which look something like this:

df:

Column0   Column1     Column2
'MSC'       '1'        'R2'
'MIS'       'Tuesday'  '22'
'13'        'Finance'  'Monday'

So overall, in these columns are actual strings but also numeric values (integers) which are in string format.

I found this nice post about the pd.to_numeric and astype() methods, but I can't see if or how I could use them in my case.

Using:

pd.to_numeric(df, errors = 'ignore')

just results in skiping the whole columns. Instead of skipping the whole columns, I only want to skip the strings in those columns which can't be converted, move on to the next entry and try to convert the next string.

So in the end, my dataframe would look like this:

df:

Column0   Column1     Column2
'MSC'       1          'R2'
'MIS'      'Tuesday'    22
 13        'Finance'  'Monday'

Is there maybe an efficient way to loop over these columns and achieve that?

Best regards, Jan

EDIT: Thanks for all your suggestions! Since I am still a python beginner, @coldspeed and @sacul 's answers are easier to understand for me so I will go with one of them!

3
  • 1
    This just doesn't fit into the pandas model. You're looking for mixed types in a single column... unless you just prefer pandas syntax, you're losing any kind of efficiency gains you might get in data processing Commented Dec 4, 2018 at 23:07
  • 1
    Do you still need to retain the strings that won't convert? If so, would moving them to a new column be acceptable? Mixed data types in a single column doesn't work. Commented Dec 4, 2018 at 23:08
  • 1
    @roganjosh, although sometimes when you are using pandas to clean up data that you received with mixed types, this is relevant. Commented Feb 16, 2022 at 23:40

4 Answers 4

17

100% agree with the comments—mixing dtypes in columns is a terrible idea, performance wise.

For reference, however, I would do this with pd.to_numeric and fillna:

df2 = df.apply(pd.to_numeric, errors='coerce').fillna(df)
print(df2)
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

Columns are cast to object dtype to prevent coercion. You can see this when you extract the values:

print(df2.values.tolist())
[['MSC', 1.0, 'R2'], ['MIS', 'Tuesday', 22.0], [13.0, 'Finance', 'Monday']]
Sign up to request clarification or add additional context in comments.

Comments

3

I would apply pd.to_numeric with errors='coerce', and update the original dataframe according to the results (see caveats in comments):

# show original string type:
df.loc[0,'Column1']
# '1'

df.update(df.apply(pd.to_numeric, errors='coerce'))

>>> df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

# show updated float type:
df.loc[0,'Column1']
# 1.0

Comments

3

Using to_numeric + ignore

df=df.applymap(lambda x : pd.to_numeric(x,errors='ignore'))
df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday
df.applymap(type)
                 Column0                Column1                Column2
0          <class 'str'>  <class 'numpy.int64'>          <class 'str'>
1          <class 'str'>          <class 'str'>  <class 'numpy.int64'>
2  <class 'numpy.int64'>          <class 'str'>          <class 'str'>

Comments

3

Or you could simply use the isnumeric() method of str. I like it because the syntax is clear, although according to coldspeed's comment, this can become very slow on large df.

df = df.applymap(lambda x: int(x) if x.isnumeric() else x)

Example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['a','b','c'],['1','1a','c']],columns=['Col1','Col2','Col3'])

In [3]: df
Out[3]:
  Col1 Col2 Col3
0    a    b    c
1    1   1a    c

In [4]: df.Col1.map(lambda x: int(x) if x.isnumeric() else x)
Out[4]:
0    a
1    1
Name: Col1, dtype: object

1 Comment

Please add a disclaimer that this can become very slow for large DataFrames.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.