I'm wondering whether pandas already has functionality for this that I haven't been able to find so far.

Basically, I have a data frame with various columns. I'd like to select specific rows depending on the values of certain columns (FYI: I am interested in the value of column D, which is described by several parameters in columns A-C).

E.g. I want to know which row(s) have A==1 & B==2 & C==5?

df
   A  B  C  D
0  1  2  4  a
1  1  2  5  b
2  1  3  4  c

df_result
1  1  2  5  b

So far I have been able to basically reduce this:

import pandas as pd

df = pd.DataFrame({'A': [1,1,1],
                   'B': [2,2,3],
                   'C': [4,5,4],
                   'D': ['a', 'b', 'c']})
df_A = df[df['A'] == 1]
df_B = df_A[df_A['B'] == 2]
df_C = df_B[df_B['C'] == 5]

To this:

parameter = [['A', 1],
             ['B', 2],
             ['C', 5]]

df_filtered = df
for x, y in parameter:
    df_filtered = df_filtered[df_filtered[x] == y]

which yielded the same results. But I wonder if there's another way? Maybe in one line, without a loop?

  • You can compound your conditions df[(df['A'] == 1) & (df['B'] == 2) & (df['C'] == 5)] without using a loop Commented Mar 7, 2016 at 15:10
  • But what if I don't know beforehand how my columns are called and which values I want them to have? Commented Mar 7, 2016 at 15:11
  • What do you mean? You must have some idea at some point which columns and values to compare? You can construct the conditions easily Commented Mar 7, 2016 at 15:14
  • My data frame is generated from a csv-file. So until I've actually loaded the file, I don't know how the columns were named. I do know what values I want them to have, but since I want to generate several sub-datasets I also load the values from a different file, where I've noted them. Right now I store a bunch of parameter combinations like the variable parameter that I loop through. Commented Mar 7, 2016 at 15:17
  • I guess it would be easier to have conditions like A==1 and B==2 and C==5 instead of your parameter list and then just query the rows satisfying this condition, like @John Galt showed with the df.query() function... Commented Mar 7, 2016 at 16:01

2 Answers

You could use the query() method to filter the data, and construct the filter expression from the parameters like this:

In [288]: df.query(' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter]))
Out[288]:
   A  B  C  D
1  1  2  5  b

Details

In [296]: df
Out[296]:
   A  B  C  D
0  1  2  4  a
1  1  2  5  b
2  1  3  4  c

In [297]: query = ' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter])

In [298]: query
Out[298]: 'A==1 and B==2 and C==5'

In [299]: df.query(query)
Out[299]:
   A  B  C  D
1  1  2  5  b

2 Comments

Wow, thank you! I didn't know about query. How would I have to change the code if I'd like to compare strings instead of integers? If changing all the values to strings, df.query() returns an empty DataFrame...
Ah, I figured it out! Just replaced '{0}=={1}' by '{0}==\"{1}\"'.
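
A possible variation on the fix above, as a minimal sketch rather than the answerer's own code: formatting each value with `!r` inserts its `repr()`, so integers stay bare and strings get quoted automatically. The mixed `parameter` list below is a made-up example that reuses the question's data frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4],
                   'D': ['a', 'b', 'c']})

# Hypothetical parameter list mixing integer and string targets.
parameter = [['A', 1], ['B', 2], ['D', 'b']]

# {1!r} formats the value with repr(): 1 stays 1, 'b' becomes the quoted literal 'b'.
query = ' and '.join('{0}=={1!r}'.format(col, val) for col, val in parameter)

df_filtered = df.query(query)
```

This keeps a single format string for both numeric and string values. It still assumes that the column names are valid Python identifiers; names containing spaces would likely need backtick quoting, which newer pandas versions support in query().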

Just for information, in case others are interested, I would have done it this way:

import numpy as np
matched = np.all([df[vn] == vv for vn, vv in parameter], axis=0)
df_filtered = df[matched]

But I like the query function better, now that I have seen it @John Galt.
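
For comparison, the same mask can probably be built with pandas alone. This is only a sketch of an alternative, not what either answer proposes: it turns the question's `parameter` list into a Series (the name `targets` is made up here) and lets `DataFrame.eq` align it against the matching columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4],
                   'D': ['a', 'b', 'c']})

parameter = [['A', 1],
             ['B', 2],
             ['C', 5]]

# Series whose index holds the column names and whose values are the targets.
targets = pd.Series(dict(parameter))

# Compare only the listed columns; eq() aligns the Series index with the column labels.
mask = df[targets.index].eq(targets).all(axis=1)

df_filtered = df[mask]
```

Like the NumPy version, this avoids building a query string, so the column names do not have to be valid Python identifiers.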

1 Comment

Still, thank you for your input! I'll keep this method in mind, too. Could be useful in the future.
