I was experimenting with several use cases for the pandas query() method, and tried one expression that threw an exception yet still caused an unwanted modification to the data in my DataFrame.
In [549]: syn_fmax_sort
Out[549]:
     build_number   name    fmax
0             390  adpcm  143.45
1             390    aes  309.60
2             390  dfadd  241.02
3             390  dfdiv   10.80
....
211           413  dfmul  215.98
212           413  dfsin   11.94
213           413    gsm  194.70
214           413   jpeg  197.75
215           413   mips  202.39
216           413  mpeg2  291.29
217           413    sha  243.19
[218 rows x 3 columns]
I wanted to use query() to select the subset of this DataFrame whose build_number is 392, so I tried:
In [550]: syn_fmax_sort.query('build_number = 392')
That threw the exception "ValueError: cannot label index with a null key". Not only that, it also returned the full DataFrame to me and set every build_number to 392:
In [551]: syn_fmax_sort
Out[551]:
     build_number   name    fmax
0             392  adpcm  143.45
1             392    aes  309.60
2             392  dfadd  241.02
3             392  dfdiv   10.80
....
211           392  dfmul  215.98
212           392  dfsin   11.94
213           392    gsm  194.70
214           392   jpeg  197.75
215           392   mips  202.39
216           392  mpeg2  291.29
217           392    sha  243.19
[218 rows x 3 columns]
However, I have since figured out how to select only the rows where build_number is 392: if I use syn_fmax_sort.query('391 < build_number < 393'), it works.
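To double-check that the chained comparison really selects what I expect, here is a minimal sketch on a small stand-in frame. The rows are made up for illustration (they are not my real data), and the explicit boolean mask is just the plain-indexing form I would compare against.

```python
import pandas as pd

# Hypothetical stand-in for syn_fmax_sort; the rows are invented for illustration.
df = pd.DataFrame({
    "build_number": [390, 392, 392, 413],
    "name": ["adpcm", "aes", "dfadd", "sha"],
    "fmax": [143.45, 309.60, 241.02, 243.19],
})

# Chained comparison inside query(), as in the workaround above.
chained = df.query("391 < build_number < 393")

# The same range filter written as an explicit boolean mask.
masked = df[(df["build_number"] > 391) & (df["build_number"] < 393)]

print(chained)
print(chained.equals(masked))
```

For integer build numbers, both forms should keep only the rows where build_number is 392 and leave df itself unchanged.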
So my question is: is the behavior I observed above, when I wrote the query incorrectly, due to a bug in the query() method?
Edit: syn_fmax_sort.query('build_number == 392') with == also appeared to cause the same thing, but I didn't mention it above. I think that's because I tried == after =, and the = had already messed up my DataFrame, so I concluded that == didn't work either.
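A minimal sketch of how this could be re-checked on fresh data. It relies on two things: inside query() a double == is a comparison, and a single = is assignment syntax in pandas expression strings (as in DataFrame.eval()). What a single = does inside query() may differ between pandas versions, so the sketch runs it only on a throwaway copy and catches whatever is raised. The frame and its rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for syn_fmax_sort; the rows are invented for illustration.
df = pd.DataFrame({
    "build_number": [390, 392, 392, 413],
    "name": ["adpcm", "aes", "dfadd", "sha"],
    "fmax": [143.45, 309.60, 241.02, 243.19],
})
original = df.copy()

# Try the mistaken single "=" on a scratch copy only, so that any side
# effect (which may depend on the pandas version) cannot touch df.
scratch = df.copy()
try:
    scratch.query("build_number = 392")
except Exception as exc:
    print(type(exc).__name__, exc)

# The comparison form, run on the untouched frame.
subset = df.query("build_number == 392")
print(subset)

# df should still be identical to the snapshot taken at the start.
print(df.equals(original))
```

Working on a copy like this makes it possible to tell a genuine problem with == apart from data that an earlier = had already overwritten.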