6

I have a pandas DataFrame with two columns, type and text. The text column contains string values. How can I delete the rows whose text value contains any digits? For example:

`ABC 1.3.2`, `ABC12`, `2.2.3`, `ABC 12 1`

I tried the following, but it raises an error. Any idea why?

df.drop(df[bool(re.match('^(?=.*[0-9]$)', df['text'].str))].index)
  • What is the definition of numeric? Commented Jun 11, 2018 at 18:39
  • Any numbers present within a string, e.g. "ABCD12", "ABC 1.3", "ABC 1.3.3", "ABC 12". Commented Jun 12, 2018 at 3:44
  • As far as I know, those are not numbers but digits. It is a question of definition, though, and in the end you are seeking an answer. Commented Jun 12, 2018 at 6:14

4 Answers

9

In your case, I think it's better to use simple boolean indexing rather than drop. For example:

>>> df
       text type
0       abc    b
1    abc123    a
2       cde    a
3  abc1.2.3    b
4     1.2.3    a
5       xyz    a
6    abc123    a
7      9999    a
8     5text    a
9      text    a


>>> df[~df.text.str.contains(r'[0-9]')]
   text type
0   abc    b
2   cde    a
5   xyz    a
9  text    a

That selects only the rows whose text contains no digits.

To explain:

df.text.str.contains(r'[0-9]')

returns a boolean Series that is True wherever the text contains at least one digit:

0    False
1     True
2    False
3     True
4     True
5    False
6     True
7     True
8     True
9    False

You can negate this mask with ~ and use it to index your DataFrame, which keeps only the rows where the mask is False.
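If the column has missing values, `str.contains` returns NaN for those rows, the mask ends up with object dtype, and `~` can fail with a TypeError like the one reported in the comments below. A minimal sketch of one way around that, assuming the missing rows should be kept, is to pass `na=False`. The sample data here is made up for illustration:

```python
import pandas as pd

# Hypothetical data: the text column contains one missing value.
df = pd.DataFrame({
    "text": ["abc", "abc123", None, "1.2.3", "xyz"],
    "type": ["b", "a", "a", "b", "a"],
})

# na=False makes missing entries count as "no digit found",
# so the mask is purely boolean and can be negated with ~.
has_digit = df["text"].str.contains(r"[0-9]", na=False)

# Keep only the rows without digits (missing rows are kept too).
kept = df[~has_digit]
print(kept)
```

Passing `na=True` instead would treat missing entries as matches and drop those rows along with the ones containing digits.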


2 Comments

Yes, this worked for me, thanks! I just wanted to check whether the string contains any numeric characters, like 1.2, 1 or 1.2.1 etc.
This gives me the following error: TypeError: bad operand type for unary ~: 'float'. My column dtype is object. I assume that means there are numbers and strings mixed in there (loaded with read_csv). How would one get around that?
3

Using the data from jpp's answer:

s[s.str.isalpha()]
Out[261]: 
0    ABC
2    DEF
6    GHI
dtype: object
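Note that `str.isalpha` is stricter than "contains no digits": it is False for any string with a space, hyphen, apostrophe or other non-letter character. A small sketch comparing the two masks on made-up sample strings (not the data from the question):

```python
import pandas as pd

# Hypothetical sample strings chosen to show where the two masks differ.
s = pd.Series(["ABC", "ABC-DEF", "ABC DEF", "ABC12", "it's"])

# True only when every character is a letter.
alpha_only = s.str.isalpha()

# True when the string has no ASCII digit, whatever else it contains.
no_digits = ~s.str.contains(r"[0-9]")

print(pd.DataFrame({"s": s, "alpha_only": alpha_only, "no_digits": no_digits}))
```

Only "ABC" passes both checks, while "ABC-DEF", "ABC DEF" and "it's" pass `no_digits` but fail `alpha_only`.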

1 Comment

Looks good, but worth noting the implications, e.g. anything with - would be omitted; i.e. this mimics Python str.isalpha.
2

Assuming you define numeric as x.isdigit() evaluating to True, you can use any with a generator expression and create a Boolean mask via pd.Series.apply:

s = pd.Series(['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI'])

mask = s.apply(lambda x: not any(i.isdigit() for i in x))

print(s[mask])

0    ABC
2    DEF
6    GHI
dtype: object
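One difference from a regex such as `[0-9]` is worth keeping in mind: `str.isdigit` is Unicode-aware, so it also treats digits from other scripts as digits. A short sketch, using a sample string chosen only for illustration:

```python
import re

# Hypothetical sample: "ABC" followed by ARABIC-INDIC DIGIT THREE (U+0663).
sample = "ABC\u0663"

# str.isdigit() recognises the non-ASCII digit.
found_by_isdigit = any(ch.isdigit() for ch in sample)

# The character class [0-9] only covers ASCII digits.
found_by_ascii_class = re.search(r"[0-9]", sample) is not None

# \d in a str pattern matches Unicode decimal digits by default.
found_by_backslash_d = re.search(r"\d", sample) is not None

print(found_by_isdigit, found_by_ascii_class, found_by_backslash_d)
```

For plain ASCII data such as the examples in the question, all three checks agree.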


1

Well, as I asked in the comment: what is your definition of numeric? If we follow Python's isnumeric combined with split(), we get the following:

import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI']
})

m1 = df['col1'].apply(lambda x: not any(i.isnumeric() for i in x.split()))
m2 = df['col1'].str.isalpha()
m3 = df['col1'].apply(lambda x: not any(i.isdigit() for i in x))
m4 = ~df['col1'].str.contains(r'[0-9]')

print(df.assign(hasnonumeric=m1, isalpha=m2, isdigit=m3, contains=m4))

# Opting for hasnonumeric
df = df[m1]

prints:

        col1  hasnonumeric  isalpha  isdigit  contains
0        ABC          True     True     True      True
1  ABC 1.3.2          True    False    False     False
2        DEF          True     True     True      True
3      ABC12          True    False    False     False
4      2.2.3          True    False    False     False
5   ABC 12 1         False    False    False     False
6        GHI          True     True     True      True
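The hasnonumeric column is True for rows such as "ABC 1.3.2" and "ABC12" because `str.isnumeric` is applied to whole whitespace-separated tokens, and a token containing a dot or a letter is not numeric. A small sketch that makes this visible (the helper name `numeric_tokens` is made up for illustration):

```python
def numeric_tokens(text):
    """Return the whitespace-separated tokens for which str.isnumeric() is True."""
    return [tok for tok in text.split() if tok.isnumeric()]

# Values taken from the question's examples.
for value in ["ABC 1.3.2", "ABC12", "2.2.3", "ABC 12 1"]:
    print(repr(value), numeric_tokens(value))
```

Only "ABC 12 1" yields numeric tokens ("12" and "1"), which is why it is the only row with hasnonumeric set to False.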

2 Comments

Out of the 4 options, contains works for my case. isalpha also returns False if the string contains special characters such as . , ' etc. I only want to check whether numeric characters (1 or 1.2 or 1.2.1 or 1.2.1.2 etc.) are present, and then filter the DataFrame.
Ok great. That is however not my answer. ;)
