
I have a DataFrame with two columns. One column contains string values that may or may not represent numbers (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10'),
        ('B', '10'),
        ('C', '<10'),
        ('D', '10'),
        ('E', '10-20'),
        ('F', '20.0'),
        ('G', '25.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value'])

Entries in column value are strings, but their contents may or may not be numeric.

What I want to get:

  • Find which rows have numeric values in column value.

  • Remove other rows from dataset.

The final result should look like:

name    value
'B'     10
'D'     10
'F'     20.0
'G'     25.1

I tried the isnumeric() string method, but it returns True only for integers, not floats.
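A quick check (a sketch using the sample strings above) shows why: str.isnumeric tests whether every character is numeric, so the '.' in '20.0' makes it fail, whereas pd.to_numeric actually parses the string.

```python
import pandas as pd

values = pd.Series(['>10', '10', '<10', '10', '10-20', '20.0', '25.1'])

# Character-level test: '.' is not a numeric character, so floats fail
print(values.str.isnumeric().tolist())
# [False, True, False, True, False, False, False]

# Real parsing: errors='coerce' turns unparseable strings into NaN
print(pd.to_numeric(values, errors='coerce').notna().tolist())
# [False, True, False, True, False, True, True]
```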

Any ideas on how to solve this would be appreciated.


Updated question (multiple columns):

(The same question when there is more than one column with numeric values)

Similarly, I have a DataFrame with three columns. Two columns contain string values that may or may not include numbers (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

Entries in columns value1 and value2 are strings, but their contents may or may not be numeric.

What I want to get:

  • Find which rows have numeric values in columns value1 & value2.

  • Remove other rows from dataset.

The final result should look like:

name    value1    value2
'B'     10        15
'D'     10        15
'G'     25.1      30.1

Answer


You can use pandas.to_numeric with errors='coerce', which turns any string that cannot be parsed into NaN, then dropna to remove those rows:

(data_df.assign(value=pd.to_numeric(data_df['value'], errors='coerce'))
        .dropna(subset=['value'])
)

NB. This upcasts the integers to floats. That is inherent to how a Series works: it holds a single dtype, and NaN is a float, so the column becomes float64. Upcasting is still better than forcing an object dtype.
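To see the upcasting in isolation, here is a small sketch with made-up two-element Series: all-integer strings parse to an integer dtype, but a single coerced NaN makes the result float64, and dropna does not change the dtype back.

```python
import pandas as pd

# All-integer strings parse to an integer dtype
print(pd.to_numeric(pd.Series(['10', '15'])).dtype)  # int64

# One unparseable value becomes NaN, which forces float64
coerced = pd.to_numeric(pd.Series(['10', '>10']), errors='coerce')
print(coerced.dtype)  # float64

# dropna removes the NaN row but the dtype stays float64
print(coerced.dropna().dtype)  # float64
```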

output:

  name  value
1    B   10.0
3    D   10.0
5    F   20.0
6    G   25.1

If you just want to slice the rows and keep the string type:

data_df[pd.to_numeric(data_df['value'], errors='coerce').notna()]

output:

  name value
1    B    10
3    D    10
5    F  20.0
6    G  25.1
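A self-contained version of the slicing approach, rebuilt from the question's sample data, confirms that the filtered column still holds the original strings (the names is_numeric and filtered are just illustrative):

```python
import pandas as pd

data = [('A', '>10'), ('B', '10'), ('C', '<10'), ('D', '10'),
        ('E', '10-20'), ('F', '20.0'), ('G', '25.1')]
data_df = pd.DataFrame(data, columns=['name', 'value'])

# Boolean mask: True where the string parses as a number
is_numeric = pd.to_numeric(data_df['value'], errors='coerce').notna()

# Boolean indexing keeps the original string values untouched
filtered = data_df[is_numeric]
print(filtered['value'].tolist())  # ['10', '10', '20.0', '25.1']
```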

Updated question (multiple columns)

Build a boolean mask with all (every value column is numeric) or any (at least one value column is numeric) before slicing:

# Convert every column except 'name'; strings that fail to parse become NaN
mask = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce').notna().all(axis=1)
data_df[mask]
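A runnable sketch on the multi-column sample data, contrasting all and any. With all, only rows where both value1 and value2 parse survive; with any, row F ('20.0', 'ABC') is kept as well.

```python
import pandas as pd

data = [('A', '>10', 'ABC'), ('B', '10', '15'), ('C', '<10', '>10'),
        ('D', '10', '15'), ('E', '10-20', '10-30'), ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Parse every column after 'name'; unparseable strings become NaN
parsed = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce')

# all(axis=1): every value column must be numeric
all_numeric = data_df[parsed.notna().all(axis=1)]
print(all_numeric['name'].tolist())  # ['B', 'D', 'G']

# any(axis=1): at least one value column must be numeric
any_numeric = data_df[parsed.notna().any(axis=1)]
print(any_numeric['name'].tolist())  # ['B', 'D', 'F', 'G']
```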

Comments

Thanks. Is there any way to pass the column name as a string to assign? I mean something like data_df.assign('value'=.....). I have several columns and want to loop over them instead of repeating this command for every column.
@Mohammad can you update your question with an example of multi-column data (or maybe open a new question that references this one)?
Sorry for the trouble. I edited the question. Thanks for your help.
@Mohammad check the update (and please add your new example as an addendum at the end of your question rather than replacing the original data; this avoids invalidating the first part of my answer)
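On the first comment's question: assign only accepts keyword arguments, but a column name held in a string can be passed by unpacking a dict with **. A sketch, assuming a hypothetical list value_cols of the columns to convert; the dict comprehension takes the place of the for loop:

```python
import pandas as pd

data = [('A', '>10', 'ABC'), ('B', '10', '15'), ('C', '<10', '>10'),
        ('D', '10', '15'), ('E', '10-20', '10-30'), ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

value_cols = ['value1', 'value2']  # hypothetical: columns to convert

# assign(**{'value1': s1, 'value2': s2}) is the same as assign(value1=s1, value2=s2)
converted = (data_df
             .assign(**{col: pd.to_numeric(data_df[col], errors='coerce')
                        for col in value_cols})
             .dropna(subset=value_cols))

# Rows B, D and G remain, with float64 value columns
print(converted)
```

A plain loop such as `for col in value_cols: data_df[col] = pd.to_numeric(data_df[col], errors='coerce')` also works, but it modifies data_df in place, whereas assign returns a new DataFrame.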
