5

I'm trying to use a select statement to drop all rows from a DataFrame where the values in a certain column don't begin with 126.1.

An example of my data set is:

File      Date          Time         RA         Dec
ad0147.fits  18-02-13  22:26:01.779  126.109510  27.360011
ad0147.fits  18-02-13  22:26:01.779  126.061077  27.361124
ad0147.fits  18-02-13  22:26:01.779  125.994430  27.363504

I want to filter out all RA values that do not begin with 126.1.

I used this:

data2 = data2.drop(data2[str(data2['RA'])[0:5] is not str(126.1)].index)

where data2 is my dataframe.

It is returning the error "KeyError: True".

How can I fix this?

3
  • What do you think str(data2['RA'])[0:5] is not str(126.1) is doing? You should try breaking it up into individual parts and seeing what it's doing. Commented Feb 15, 2018 at 21:32
  • Don't use is and is not to compare strings, use == and !=. Commented Feb 15, 2018 at 21:36
  • FYI, str(data2['RA'])[0:5] in your case evaluates to '0    '. str() converts the whole Series to its printed form, index included, and the slice then takes the first five characters of that text. It's good practice to break up complicated logic like yours into smaller, more manageable pieces. Commented Feb 15, 2018 at 21:43
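Following the comments' suggestion to break the expression apart, here is a minimal sketch of what each piece evaluates to. The one-column frame is a hypothetical reconstruction of the asker's data, assuming RA is stored as float64.

```python
import pandas as pd

# Hypothetical reconstruction of the asker's frame (RA assumed to be float64).
data2 = pd.DataFrame({"RA": [126.109510, 126.061077, 125.994430]})

text = str(data2["RA"])   # printed form of the whole Series, index included
prefix = text[0:5]        # first five characters of that printout
print(repr(prefix))

# `is not` compares object identity and yields one plain bool, not a mask.
mask = prefix is not str(126.1)
print(type(mask), mask)

# data2[True] is treated as a column lookup, which is where the KeyError comes from.
try:
    data2[mask]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Because the expression collapses to a single bool, nothing is ever compared row by row, which is why an element-wise approach is needed instead.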

2 Answers

4

There's a lot wrong with:

str(data2['RA'])[0:5] is not str(126.1)

To begin with, is not evaluates to a single True or False, but you need a boolean array for selection, so this cannot work as written. Second, you should never use is to compare str objects; use == and != instead. For these sorts of string manipulations on pandas.Series objects, there are built-in vectorized methods accessible through .str, which mimic the built-in string methods. So given:

>>> df
          File      Date          Time          RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.109510  27.360011
1  ad0147.fits  18-02-13  22:26:01.779  126.061077  27.361124
2  ad0147.fits  18-02-13  22:26:01.779  125.994430  27.363504
>>> df.dtypes
File     object
Date     object
Time     object
RA      float64
Dec     float64
dtype: object

You could use:

>>> df.RA.astype(str).str.startswith('126.1')
0     True
1    False
2    False
Name: RA, dtype: bool

And simply combine that with boolean-indexing:

>>> df[df.RA.astype(str).str.startswith('126.1')]
          File      Date          Time         RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.10951  27.360011
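For anyone who wants to run this end to end, here is a self-contained sketch using the sample rows from the question. The numeric range filter at the end is an added alternative, not part of the answer above; it assumes RA values are positive and that "begins with 126.1" means the half-open interval [126.1, 126.2).

```python
import pandas as pd

# Sample rows copied from the question; RA and Dec are float64.
df = pd.DataFrame({
    "File": ["ad0147.fits"] * 3,
    "RA": [126.109510, 126.061077, 125.994430],
    "Dec": [27.360011, 27.361124, 27.363504],
})

# String approach from the answer: keep rows whose RA text starts with '126.1'.
by_prefix = df[df["RA"].astype(str).str.startswith("126.1")]

# Numeric alternative (an assumption, not from the answer): the same prefix
# corresponds to the half-open interval [126.1, 126.2) for positive values.
by_range = df[(df["RA"] >= 126.1) & (df["RA"] < 126.2)]

print(by_prefix)
print(by_range)
```

The numeric version avoids relying on how floats are formatted as text, which may matter if the column ever contains values printed in scientific notation.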

2

Take a look at the .str accessor, which is available on any pandas Series that holds strings (the columns of a data frame are Series). Its contains method supports regular expression syntax. I often search for what I don't want and then negate it with ~. Like this:

df = df[~df.RA.astype(str).str.contains('126.1')]

1 Comment

The OP wants to keep only the rows that start with 126.1, so .str.startswith is more appropriate here.
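To illustrate the comment's point, here is a small sketch comparing the two methods. Only the first and last values come from the question; 1126.1 and 12601.5 are hypothetical values added to expose the difference. The floats are converted with astype(str) first, since .str only works on string data.

```python
import pandas as pd

# First and last values are from the question; the middle two are hypothetical.
ra = pd.Series([126.109510, 1126.1, 12601.5, 126.061077], name="RA").astype(str)

# contains() treats the pattern as a regex by default and matches anywhere,
# and the unescaped '.' matches any character.
anywhere = ra.str.contains("126.1")

# startswith() performs a literal prefix test.
prefix = ra.str.startswith("126.1")

# Equivalent with contains(): anchor the pattern and escape the dot.
anchored = ra.str.contains(r"^126\.1")

print(pd.DataFrame({"RA": ra, "contains": anywhere,
                    "startswith": prefix, "anchored": anchored}))
```

Selecting with df[prefix] keeps the matching rows, while df[~prefix] drops them, so the negation is only wanted when the pattern describes the rows to discard.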
