4

I have a DataFrame similar to the one below:

import pandas as pd

df = pd.DataFrame({'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
              'age': [22,23,24,25],
              'document': [111,222,333,4444]})

How can I filter it to return only the rows where the values in the document column have exactly 3 digits?

1
  • 4
    I usually don't care which answer the OP accepts; however, you should not use apply in this situation, since it will slow down the whole process: stackoverflow.com/questions/54432583/… Commented May 31, 2019 at 21:35

4 Answers 4

5

Using log10 inside query

df.query('2 <= log10(document) < 3')

      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333

Expanded example

df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'age': [22, 23, 24, 25],
    'document': [11, 222, 999, 1000]
})

df

      name  age  document
0     John   22        11  # 2 digit number
1    Elvis   23       222  # 3 digit number
2  Gerrard   24       999  # 3 digit number | edge case
3    Pitty   25      1000  # 4 digit number | edge case

Let's get only 3 digit numbers

df.query('2 <= log10(document) < 3')

      name  age  document
1    Elvis   23       222
2  Gerrard   24       999
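
For readers who prefer a plain boolean mask, the same condition can be sketched with NumPy outside of query. This is only an illustration under the assumption that every value in document is a positive number; log10 is not meaningful for zero or negative values, and the names log_doc and three_digit are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'age': [22, 23, 24, 25],
    'document': [11, 222, 999, 1000],
})

# Assumption: all values are positive, so log10 is well defined.
log_doc = np.log10(df['document'])

# A positive integer n has exactly 3 digits when 2 <= log10(n) < 3.
three_digit = df[(log_doc >= 2) & (log_doc < 3)]
print(three_digit)
```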

3

Try floor division (//) by 1000: a non-negative number with at most 3 digits gives a quotient of 0.

df[df.document//1000==0]
Out[474]: 
      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333

Alternatively, convert to str and check the length:

df[df.document.astype(str).str.len().eq(3)]
Out[476]: 
      name  age  document
0     John   22       111
1    Elvis   23       222
2  Gerrard   24       333

Update (for floats, count only the digits before the decimal point):

df[df.document.astype(str).str.split('.').str[0].str.len().eq(3)]
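
As a rough illustration of what the updated expression does, here it is applied to a made-up float column (the sample values are assumptions, not from the question). Note that a leading minus sign would still be counted as a character, and very large floats may be rendered in scientific notation by str, so this is a sketch for ordinary positive values only.

```python
import pandas as pd

# Hypothetical float data chosen to show the effect of the split.
df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'document': [111.0, 22.5, 1000.25, 333.75],
})

# Keep only the text before the decimal point, then count its characters.
integer_part = df['document'].astype(str).str.split('.').str[0]
result = df[integer_part.str.len().eq(3)]
print(result)
```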

4 Comments

What about 2 digits? They would also appear in the // solution ;p Also, what about floats in the astype(str) solution?
@rafaelc seems like you're adding more edge cases. Will update for you.
df[df.document.floordiv(100).between(1, 9)]
@piRSquared: why don't you call between directly, such as df[df.document.between(100, 999)]?

1

df[df.document.apply(lambda x: len(str(x)) == 3)]
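
Since the comment on the question warns that apply can be slow, a vectorized alternative might look like the sketch below. It uses the between(100, 999) suggestion from the comments and assumes the column holds non-negative integers; for other data the two filters are not guaranteed to agree.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
                   'age': [22, 23, 24, 25],
                   'document': [111, 222, 333, 4444]})

# Row-by-row version from the answer above.
via_apply = df[df['document'].apply(lambda x: len(str(x)) == 3)]

# Vectorized version: 3-digit non-negative integers lie in [100, 999].
via_between = df[df['document'].between(100, 999)]
print(via_between)
```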

0

This will work for positive and negative numbers in the document column:

df[df['document'].abs().astype(str).str.len() == 3]
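
A small check of this on made-up data with negative values (the sample numbers are assumptions for illustration, and integer input is assumed):

```python
import pandas as pd

# Hypothetical integer data that includes negative values.
df = pd.DataFrame({
    'name': ['John', 'Elvis', 'Gerrard', 'Pitty'],
    'document': [-111, 222, -3333, 44],
})

# abs() drops the sign, so '-' is not counted as a character.
result = df[df['document'].abs().astype(str).str.len() == 3]
print(result)
```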
