Find string data-type that includes a number in Pandas DataFrame and change the type

Question

I have a dataframe with multiple columns. One or more columns contain string values that may or may not represent numbers (integer or float).

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

I am looking for a method that checks each cell in the dataframe for values that are stored as strings but represent a number (integer or float), and converts them to integer or float while keeping the whole dataframe intact (not converting it to an array).

So far, I have found the Stack Overflow article "How to find string data-type that includes a number in Pandas DataFrame" useful, but it is aimed at dropping the numerical values stored as strings.

Can you add the expected output for the sample data?

– jezrael, Commented May 4, 2022 at 5:07
Thanks for replying, @jezrael. Yes, I can add the expected output.

– UnFury, Commented May 4, 2022 at 5:09

jezrael · Accepted Answer · 2022-05-04 05:49:54Z

If you need all values to be numeric, replace the non-numeric values with missing values:

data_df.iloc[:, 1:] = data_df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce')
print (data_df)
  name value1 value2
0    A    NaN    NaN
1    B   10.0   15.0
2    C    NaN    NaN
3    D   10.0   15.0
4    E    NaN    NaN
5    F   20.0    NaN
6    G   25.1   30.1

If you need to restore the original strings in place of the missing values:

data_df.iloc[:, 1:] = (data_df.iloc[:, 1:]
                              .apply(pd.to_numeric, errors='coerce')
                              .fillna(data_df.iloc[:, 1:]))
print (data_df)
  name value1 value2
0    A    >10    ABC
1    B   10.0   15.0
2    C    <10    >10
3    D   10.0   15.0
4    E  10-20  10-30
5    F   20.0    ABC
6    G   25.1   30.1

But then the columns hold mixed types, numeric values together with strings:

print (data_df.iloc[:, 1:].applymap(type))
            value1           value2
0    <class 'str'>    <class 'str'>
1  <class 'float'>  <class 'float'>
2    <class 'str'>    <class 'str'>
3  <class 'float'>  <class 'float'>
4    <class 'str'>    <class 'str'>
5  <class 'float'>    <class 'str'>
6  <class 'float'>  <class 'float'>
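
To check which cells hold a number before deciding what to convert, the same `pd.to_numeric(..., errors='coerce')` call can be used to build a boolean mask. This is only a minimal sketch on the sample data from the question; the names `value_cols`, `converted` and `is_numeric` are illustrative and not part of the original answer, and it assumes the original columns contain no missing values.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Illustrative name: the columns to inspect (everything except 'name').
value_cols = ['value1', 'value2']

# Strings that parse as numbers become floats; everything else becomes NaN.
converted = data_df[value_cols].apply(pd.to_numeric, errors='coerce')

# True where the original string represented a number.
is_numeric = converted.notna()

# Number of numeric-looking cells per column.
print(is_numeric.sum())
```

A column in which every entry of the mask is True could be converted as a whole without losing any information; the other columns would need either NaN for the non-numeric cells or mixed types, as shown above.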

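EDIT: To convert only the string columns (omitting the name column), strip spaces and assign the numeric values back: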

cols = data_df.select_dtypes(object).columns.difference(['name'], sort=False)
data_df[cols] = data_df[cols].apply(lambda x: pd.to_numeric(x.str.strip(), errors='coerce'))
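
The following is a self-contained sketch of how the EDIT might behave on the sample data. The extra `count` column is hypothetical (it is not in the question) and is there only to show that `select_dtypes` skips columns that are already numeric, where the `.str` accessor would not apply. Passing `dtype=object` to the constructor is also an assumption of this sketch, made so that the string columns are object dtype regardless of the pandas version.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
# dtype=object is an assumption of this sketch (see the note above).
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'], dtype=object)

# Hypothetical column that is already numeric; not part of the original data.
data_df['count'] = range(len(data_df))

# Only object columns, minus 'name', are candidates for conversion.
cols = data_df.select_dtypes(include='object').columns.difference(['name'], sort=False)

# Strip surrounding whitespace, then coerce non-numeric strings to NaN.
data_df[cols] = data_df[cols].apply(
    lambda x: pd.to_numeric(x.str.strip(), errors='coerce'))

print(data_df.dtypes)
```

Assigning through `data_df[cols]` replaces the selected columns, so after this step `value1` and `value2` are float columns, while `name` and `count` are left untouched.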

edited May 4, 2022 at 5:49

answered May 4, 2022 at 5:09

jezrael


12 Comments

UnFury Over a year ago

Thanks for answering. Your suggested code worked, but the number of columns increased, and when I pass the dataframe through one-hot encoding I get a bunch of errors that did not occur before.

jezrael Over a year ago

@UnFury - hmm, can you add the expected output after one-hot encoding? And what code do you use?

UnFury Over a year ago

I believe it was a kernel issue that kept it from running at first. After implementing data_df.iloc[:, 1:] = data_df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce'), all the numerical values stored as strings change to NaN (which I wasn't expecting, since I only wanted to change the data type of the columns that contain such values).

UnFury Over a year ago

Thanks for the reply. I am now seeing a different error: AttributeError: Can only use .str accessor with string values!. Since I can't share the expected output here, I will have to find a solution on my own. Thanks for helping me out. Please feel free to share any suggestions regarding this issue.

jezrael Over a year ago

@UnFury - Added EDIT - it strips spaces only in string columns and assigns the numeric values back; difference is used to omit the first column, name.
