Python, Pandas: Filter rows of data frame based on function

Question

I'm trying to filter a pandas DataFrame based on a substring in one of the columns.

If the number at positions 13 and 14 of the ID field is <= 9, I want to keep the row; if it is > 9, I want to drop the row.

Example:

ABCD-3Z-A93Z-01A-11R-A37O-07 -> keep

ABCD-3Z-A93Z-11A-11R-A37O-07 -> drop
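To make the positions concrete, here is a minimal sketch of the check on the two example IDs. It assumes the positions are zero-based, so "position 13 and 14" corresponds to the slice `[13:15]`; the helper name `sample_code` is hypothetical and used only for illustration.

```python
def sample_code(identifier: str) -> int:
    """Return the two-digit number at zero-based positions 13 and 14."""
    return int(identifier[13:15])


keep_id = "ABCD-3Z-A93Z-01A-11R-A37O-07"
drop_id = "ABCD-3Z-A93Z-11A-11R-A37O-07"

print(sample_code(keep_id))  # 1  -> <= 9, keep
print(sample_code(drop_id))  # 11 -> > 9, drop
```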

I've managed to get to the below solution, but I think there must be a quicker, more efficient way.

import pandas as pd

# Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
df = {'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07', 'ABCD-6D-AA2E-11A-11R-A37O-07', 'ABCD-6D-AA2E-01A-11R-A37O-07',
             'ABCD-A3-3307-01A-01R-0864-07', 'ABCD-6D-AA2E-01A-11R-A37O-07', 'ABCD-6D-AA2E-10A-11R-A37O-07',
             'ABCD-6D-AA2E-09A-11R-A37O-07'],
      'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
}
# convert to df
df = pd.DataFrame(df)

# Define a function that returns True if the number at positions 13 and 14 is <= 9.
def filter(x):
    # Only parse the value if x is a string; anything else is dropped.
    if type(x) is str:
        if int(float(x[13:15])) <= 9:
            return True
        else:
            return False
    else:
        return False

# apply function
df['KeepRow'] = df['ID'].apply(filter)
print(df)

# Now filter out rows where "KeepRow" = False
df = df.loc[df['KeepRow'] == True]
print(df)
# drop the column "KeepRow" as we don't need it anymore
df = df.drop('KeepRow', axis=1)
print(df)

Alex Ozerov · Accepted Answer · 2017-10-15 12:12:45Z

5

I think you can just filter on the character at index 13 of the string, since a two-digit value is <= 9 exactly when its first digit is '0':

import pandas as pd

# Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'ABCD-6D-AA2E-01A-11R-A37O-07',
           'ABCD-A3-3307-01A-01R-0864-07',
           'ABCD-6D-AA2E-01A-11R-A37O-07',
           'ABCD-6D-AA2E-10A-11R-A37O-07',
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
})
# True when the two-digit number at positions 13 and 14 starts with '0' (00 to 09)
df['KeepRow'] = df['ID'].apply(lambda x: x[13] == '0')

or simply:

df[df['ID'].apply(lambda x: x[13] == '0')]
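The same first-character test can also be written without `apply`, using the `.str` accessor. This is only a sketch on a shortened copy of the sample data; it assumes positions 13 and 14 always hold a two-digit number, so that "<= 9" is equivalent to "the first digit is 0". The names `mask` and `filtered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'ABCD-6D-AA2E-10A-11R-A37O-07',
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2017, 2015],
})

# Assumption: the code at positions 13-14 is always two digits,
# so a leading '0' means the value is between 00 and 09.
mask = df['ID'].str[13] == '0'
filtered = df[mask]
print(filtered)
```

If the field could ever hold something other than two digits, a numeric comparison on the full slice is the safer choice.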


jezrael · Accepted Answer · 2017-10-15 12:13:49Z

2

Use the .str accessor to slice each value by position, convert the slice to float, and filter with boolean indexing:

df = df[df['ID'].str[13:15].astype(float) <= 9]
print(df)
                             ID  year
0  ABCD-3Z-A93Z-01A-11R-A37O-07  2012
2  ABCD-6D-AA2E-01A-11R-A37O-07  2013
3  ABCD-A3-3307-01A-01R-0864-07  2014
4  ABCD-6D-AA2E-01A-11R-A37O-07  2014
6  ABCD-6D-AA2E-09A-11R-A37O-07  2015

Detail:

print(df['ID'].str[13:15])
0    01
1    11
2    01
3    01
4    01
5    10
6    09
Name: ID, dtype: object
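The original function in the question also dropped rows whose ID was not a string. A hedged sketch of how the slicing approach might tolerate missing or malformed IDs is to swap `astype(float)` for `pd.to_numeric` with `errors='coerce'`, which turns unparseable slices into NaN. The `None` and `XX` rows below are hypothetical test data, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           None,                            # hypothetical missing ID
           'ABCD-6D-AA2E-XXA-11R-A37O-07',  # hypothetical malformed code
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2013, 2014, 2015],
})

# Slice positions 13-14, then parse; anything unparseable becomes NaN.
codes = pd.to_numeric(df['ID'].str[13:15], errors='coerce')

# NaN <= 9 evaluates to False, so missing and malformed rows are dropped.
robust = df[codes <= 9]
print(robust)
```

Whether such rows should be dropped silently or reported depends on the data; this sketch mirrors the question's behaviour of returning False for them.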

edited Oct 15, 2017 at 12:13

answered Oct 15, 2017 at 12:08
