How to find string data-type that includes a number in Pandas DataFrame

Question

I have a DataFrame with two columns. One column contains string values that may or may not be numeric (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10'),
        ('B', '10'),
        ('C', '<10'),
        ('D', '10'),
        ('E', '10-20'),
        ('F', '20.0'),
        ('G', '25.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value'])

The entries in column value are all strings, but some of them represent numbers and some do not.

What I want to get:

Find which rows have numeric values in column value.
Remove other rows from dataset.

Final result will look like:

name    value    
'B'      10         
'D'      10 
'F'      20.0  
'G'      25.1

I tried the isnumeric() function, but it returns True only for integers, not for floats.
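To illustrate the behaviour described above, here is a minimal sketch on a few of the sample values. The Series name `values` and the variable `flags` are made up for the illustration. The point is that `str.isnumeric()` checks characters rather than parsing a number, so the decimal point is enough to make it return False.

```python
import pandas as pd

# A few of the sample values from the question (illustrative subset)
values = pd.Series(['>10', '10', '10-20', '20.0', '25.1'])

# str.isnumeric() is True only when every character in the string is a
# numeric character, so '.', '>' and '-' all make it False.
flags = values.str.isnumeric().tolist()
print(flags)
# [False, True, False, False, False]
```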

If you have any idea to solve this problem, please let me know.

Updated Question (multi columns):

(The same question, but with more than one column containing numeric values.)

Similarly, I have a DataFrame with three columns. Two columns contain string values that may or may not include numbers (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

The entries in columns value1 and value2 are all strings, but some of them represent numbers and some do not.

What I want to get:

Find which rows have numeric values in columns value1 & value2.
Remove other rows from dataset.

Final result will look like:

name    value1    value2
'B'      10         15
'D'      10         15 
'G'      25.1       30.1

mozway · Accepted Answer · 2022-01-12 12:47:53Z


You can use pandas.to_numeric with errors='coerce', then dropna to remove the invalid rows:

(data_df.assign(value=pd.to_numeric(data_df['value'], errors='coerce'))
        .dropna(subset=['value'])
)

NB. This upcasts the integers to floats, because a Series holds a single dtype. Upcasting is preferable to forcing an object dtype.

output:

  name  value
1    B   10.0
3    D   10.0
5    F   20.0
6    G   25.1

If you just want to slice the rows and keep the string type:

data_df[pd.to_numeric(data_df['value'], errors='coerce').notna()]

output:

  name value
1    B    10
3    D    10
5    F  20.0
6    G  25.1

updated question (multi columns)

Build a boolean mask over the value columns and reduce it with any/all before slicing:

mask = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce').notna().all(axis=1)
data_df[mask]
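For reference, here is a self-contained sketch of the mask approach applied to the multi-column sample from the question. The names `value_cols`, `converted`, `mask_all`, `mask_any`, `result` and `result_any` are illustrative and not part of the original answer. It shows both reductions: `all` keeps rows where every value column is numeric, `any` keeps rows where at least one is.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Columns to check (hypothetical name; the answer uses data_df.columns[1:])
value_cols = ['value1', 'value2']

# Non-numeric strings become NaN; numeric strings become floats
converted = data_df[value_cols].apply(pd.to_numeric, errors='coerce')

# True where every value column parsed as a number
mask_all = converted.notna().all(axis=1)
# True where at least one value column parsed as a number
mask_any = converted.notna().any(axis=1)

result = data_df[mask_all]      # rows B, D, G (values stay strings)
result_any = data_df[mask_any]  # rows B, D, F, G
print(result)
```

Because the mask only selects rows from the original frame, the kept values are still strings. Slice `converted` instead if the numeric values are needed.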

edited Jan 12, 2022 at 12:47

answered Jan 12, 2022 at 11:51

mozway


4 Comments

Mohammad Over a year ago

Thanks. Is there any way to pass the column name as a string to the "assign" function? I mean something like data_df.assign('value'=.....). I have several columns and I want to use a for loop over all of them instead of repeating this command for every column.

mozway Over a year ago

@Mohammad can you update your question with an example of multi-column data (or maybe open a new question that references this one)

Mohammad Over a year ago

Sorry for trouble. I edited the question. Thanks for your help.

mozway Over a year ago

@Mohammad check the update (and please add your new example as an addendum at the end of your question rather than replacing the original data; this avoids invalidating the first part of my answer)
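On the question raised in the comments about passing column names as strings to assign: one possible approach, not taken from the answer itself, is to build a dict keyed by column name and unpack it with `**`. The sketch below assumes the multi-column sample data. The names `value_cols` and `numeric_df` are illustrative.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Column names held as plain strings (hypothetical list for this sketch)
value_cols = ['value1', 'value2']

# assign() takes keyword arguments, so a dict of {column name: new Series}
# can be unpacked with ** instead of writing each column by hand.
numeric_df = (
    data_df
    .assign(**{col: pd.to_numeric(data_df[col], errors='coerce')
               for col in value_cols})
    .dropna(subset=value_cols)  # drop rows where any value column is NaN
)
print(numeric_df)
```

As with the single-column version, the converted columns come out as floats.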
