1

Let's say I have a pandas dataframe from which I will need to repeatedly query a subset of rows. I'd like to wrap this in a function. The queries will vary, interrogating any number of columns. The operator for each column will always be the same. I'm considering something like this:

df = pd.DataFrame({'A': list('aabbccddeeff'), 'B': list('aaaabbbbcccc'),
                   'C': np.random.randint(5, size=12),
                   'D': np.random.randint(9, size=12)})

def query_df(df, **kwds):
    a_val = kwds.get('a', None)
    b_val = kwds.get('b', None)
    c_val = kwds.get('c', None)        
    d_val = kwds.get('d', None)
    query = 'A in {0} and B == {1} and C > {2} and D < {3}'.format(a_val, b_val, c_val, d_val)
    return df.query(query)

query_dict = {'a':['a', 'b', 'c', 'd'], 'b':'a', 'c':0, 'd':8}
print(query_df(df, **query_dict))

A  B  C  D 
1  a  a  1  6

Although this works, it doesn't allow queries directed to e.g. just columns A and C. All columns are hard-coded into the query string! How can I make this more flexible so that e.g. the following would also work:

query_df(df, {'a':['a', 'b', 'c', 'd'], 'b':'a'})
query_df(df, {'b':'a', 'c':6})
query_df(df, {'d':4})

Thanks in advance!

2
  • You might find this helpful stackoverflow.com/questions/34157811/… Commented Feb 5, 2019 at 10:45
  • That's indeed interesting and I hadn't seen those yet. However, it's not a clean a solution as I'd hoped. Wouldn't there exists a keyword instead of 'None' that I can slot into the query and lets pass all values in the respective column? Commented Feb 5, 2019 at 11:04

1 Answer 1

1

To give you an idea how this can be achieved:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': list('aabbccddeeff'), 'B': list('aaaabbbbcccc'),
                   'C': np.random.randint(5, size=12),
                   'D': np.random.randint(9, size=12)})
print(df)
def query_df(df, dicti):
    d = {
      'a' : 'A in %s' % dicti.get('a'),
      'b' : 'B == %s' % dicti.get('b'),
      'c' : 'C > %s' % dicti.get('c') ,
      'd' : 'D < %s' % dicti.get('d')
    }
    q = []
    for i, j in d.items():
      if i in dicti.keys():
        q.append(j)
        q.append(' and ')
    q = q[:len(q)-1]
    query = ''.join(q)
    print(query)
    return df.query(query)
#di = {'a':['a', 'b', 'c', 'd'], 'b':'"a"', 'c':0, 'd':8}
#di = {'b':'"a"', 'c':6}
#di = {'d':4}
di = {'a':['a', 'b', 'c', 'd'], 'b':'"a"'}
print(query_df(df, di))

I had to use double quotes for 'b' key ('b':'"a"') as you might notice.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.