I have a dataframe with multiple columns. One or more of the columns contain string values that may or may not represent numbers (integer or float).

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

I am looking for a way to check every cell in the dataframe for values that are stored as strings but contain a number (integer or float), and to convert those values to int or float while keeping the dataframe intact (not converting it to an array).

So far, I have found the Stack Overflow question "How to find string data-type that includes a number in Pandas DataFrame" useful, but its answers focus on dropping the numerical values stored as strings rather than converting them.

  • Can you add the expected output for the sample data? Commented May 4, 2022 at 5:07
  • Thanks for replying, @jezrael. Yes, I can add the expected output. Commented May 4, 2022 at 5:09

1 Answer

If you need all values to be numeric, replace the non-numeric values with missing values:

data_df.iloc[:, 1:] = data_df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce')
print (data_df)
  name value1 value2
0    A    NaN    NaN
1    B   10.0   15.0
2    C    NaN    NaN
3    D   10.0   15.0
4    E    NaN    NaN
5    F   20.0    NaN
6    G   25.1   30.1

If you need to keep the original strings where conversion fails, fill the missing values back in from the original data:

data_df.iloc[:, 1:] = (data_df.iloc[:, 1:]
                              .apply(pd.to_numeric, errors='coerce')
                              .fillna(data_df.iloc[:, 1:]))
print (data_df)
  name value1 value2
0    A    >10    ABC
1    B   10.0   15.0
2    C    <10    >10
3    D   10.0   15.0
4    E  10-20  10-30
5    F   20.0    ABC
6    G   25.1   30.1

But then the columns hold mixed types, with numeric values alongside strings:

print(data_df.iloc[:, 1:].applymap(type))  # in pandas >= 2.1, applymap is deprecated in favor of .map(type)
            value1           value2
0    <class 'str'>    <class 'str'>
1  <class 'float'>  <class 'float'>
2    <class 'str'>    <class 'str'>
3  <class 'float'>  <class 'float'>
4    <class 'str'>    <class 'str'>
5  <class 'float'>    <class 'str'>
6  <class 'float'>  <class 'float'>
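
If the goal is instead to change the dtype only of columns whose values are all numeric, and to leave mixed columns untouched, something like the following sketch may help. The helper name `convert_fully_numeric` and the extra `value3` column are hypothetical and are not part of the question's data.

```python
import pandas as pd

def convert_fully_numeric(df, skip=()):
    """Return a copy of df in which columns whose values all parse as numbers are converted.

    Hypothetical helper: a column containing any non-numeric string is left unchanged.
    """
    out = df.copy()
    for col in [c for c in out.columns if c not in skip]:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Convert only if coercion did not turn any existing value into NaN
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out

sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "value1": [">10", "10", "20.0"],   # mixed: stays as strings
    "value3": ["1", "2.5", "3"],       # hypothetical all-numeric column
})
result = convert_fully_numeric(sample, skip=["name"])
print(result.dtypes)
```

With this approach a column either becomes fully numeric or stays as it was, so no column ends up holding a mix of floats and strings.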

EDIT: To strip surrounding spaces and convert only the string columns, omitting the name column:

cols = data_df.select_dtypes(object).columns.difference(['name'], sort=False)
data_df[cols] = data_df[cols].apply(lambda x: pd.to_numeric(x.str.strip(), errors='coerce'))
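
As a rough illustration of why the selection step matters: `.str` raises `AttributeError` on a column that is already numeric, so restricting the conversion to object columns avoids calling it there. The padded values and the `score` column below are made up for this sketch.

```python
import pandas as pd

# dtype=object keeps the string columns as object dtype regardless of pandas version
df = pd.DataFrame(
    {"name": ["A", "B", "C"],
     "value1": [" 10", ">10", "25.1 "]},   # numeric strings with stray spaces
    dtype=object,
)
df["score"] = [1.0, 2.0, 3.0]   # hypothetical column that is already float

# Only object columns other than 'name' are selected, so .str is never called on 'score'
cols = df.select_dtypes(object).columns.difference(["name"], sort=False)
df[cols] = df[cols].apply(lambda s: pd.to_numeric(s.str.strip(), errors="coerce"))

print(df.dtypes)
```

Here `value1` becomes a float column with NaN in place of `>10`, while `name` and `score` are left as they were.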

12 Comments

Thanks for answering. Your suggested code worked, but the number of columns increased, and when I pass the dataframe through one-hot encoding I get a bunch of errors that did not occur before.
@UnFury - hmm, can you add the expected output after one-hot encoding? And what code are you using?
I believe it was a kernel issue that kept it from running at first. After applying data_df.iloc[:, 1:] = data_df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce'), all the numerical values stored as strings are changed to NaN (which I wasn't expecting, since I only wanted to change the data type of the columns that contain such values).
Thanks for the reply. I am now seeing a different error: AttributeError: Can only use .str accessor with string values!. Since I can't share the expected output here, I will have to find a solution on my own. Thanks for helping me out. Please feel free to share any suggestions you have regarding this issue.
@UnFury - Added an EDIT: it strips spaces only in the string columns and assigns the numeric values back; difference is used to omit the first column, name.