Question:

I would like to gain a better understanding of the Pandas DataFrame.query method and what the following expression represents:

match = dfDays.query('index > @x.name & price >= @x.target')

What does @x.name represent?

I understand what the resulting output is for this code (a new column with pandas.tslib.Timestamp data) but don't have a clear understanding of the expression used to get this end result.

Data:

From here:

Vectorised way to query date and price data

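import numpy as np
import pandas as pd

np.random.seed(seed=1)
# One random price per day from 2000-01-01 through 2000-07-31
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks, index=rng)
dfDays = pd.DataFrame({'price': ts2})
# First price of each week (weeks ending on Monday), plus a target 0.5 above it
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)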

def find_match(x):
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))

2 Answers

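In @x.name, the @ prefix tells .query() that x is an external object: a variable from the surrounding Python scope rather than a column of the DataFrame on which query() was called. In the question, x is a pd.Series (one row of dfWeeks, passed in by apply), so @x.name is that row's index label. The external object could also be a DataFrame, as in the demonstration below, or a scalar value.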

I hope this small demonstration will help you to understand it:

In [79]: d1
Out[79]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

In [80]: d2
Out[80]:
   a   x
0  1  10
1  7  11

In [81]: d1.query("a in @d2.a")
Out[81]:
   a  b  c
0  1  2  3
2  7  8  9

In [82]: d1.query("c < @d2.a")
Out[82]:
   a  b  c
1  4  5  6

Scalar x:

In [83]: x = 9

In [84]: d1.query("c == @x")
Out[84]:
   a  b  c
2  7  8  9
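
For anyone who wants to run the session above, here is a minimal sketch. The construction of d1 and d2 is reconstructed from the printed output, and the isin mask is added only for comparison; neither is part of the original session.

```python
import pandas as pd

# Frames rebuilt from the printed output above (an assumption, not the author's code)
d1 = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
d2 = pd.DataFrame({"a": [1, 7], "x": [10, 11]})

# "@d2.a" pulls the column `a` of the external frame d2 into the query
in_query = d1.query("a in @d2.a")

# The same filter written as an ordinary boolean mask
in_mask = d1[d1["a"].isin(d2["a"])]

# A scalar works the same way: "@x" refers to the local variable x
x = 9
scalar_query = d1.query("c == @x")
```

Both in_query and in_mask keep the rows of d1 whose a value also appears in d2.a.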

1 Comment

For d1.query("c < @d2.a") I got ValueError: Can only compare identically-labeled Series objects.
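
Whether that comparison runs or raises seems to depend on the environment, since the answer shows output for it while this comment reports an error. Below is a hedged sketch of two ways to make the intent explicit, assuming the d1 and d2 frames printed in the answer. The names aligned and threshold are made up for illustration.

```python
import pandas as pd

d1 = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
d2 = pd.DataFrame({"a": [1, 7], "x": [10, 11]})

# Outside query(), recent pandas versions refuse to compare two Series
# whose indexes are not identical.
try:
    d1["c"] < d2["a"]
    raised = False
except ValueError:
    raised = True

# Option 1: align on the index explicitly. Label 2 is missing from d2,
# so it becomes NaN, and any comparison against NaN is False.
aligned = d2["a"].reindex(d1.index)
by_label = d1[d1["c"] < aligned]

# Option 2: reduce the external column to a scalar before querying.
threshold = d2["a"].max()
by_scalar = d1.query("c < @threshold")
```

Option 1 reproduces the single row shown in the answer. Option 2 answers a different question (c below the largest d2.a), so pick whichever matches what you mean.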

Everything @MaxU said is perfect!

I wanted to add some context to the specific problem that this was applied to.

find_match

This is a helper function that is passed to dfWeeks.apply. Two things to note:

  1. find_match takes a single argument x. This will be a single row of dfWeeks.
    • With axis=1, apply passes each row through this function in turn, and each row arrives as a pd.Series object.
    • When apply passes a row to the helper function, the row has a name attribute equal to that row's index value in the dataframe. In this case, I know the index value is a pd.Timestamp, so I can compare it against the dates in dfDays.
  2. find_match references dfDays which is outside the scope of find_match itself.
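
The points under item 1 can be checked directly. This is a small sketch on a hand-made two-row frame; the frame weeks_demo and its values are made up for illustration and are not the OP's data.

```python
import pandas as pd

# Hypothetical stand-in for dfWeeks: two rows with a datetime index
weeks_demo = pd.DataFrame(
    {"price": [1.5, 2.0], "target": [2.0, 2.5]},
    index=pd.to_datetime(["2000-01-03", "2000-01-10"]),
)

# With axis=1, apply hands each row to the function as a pd.Series
row_is_series = weeks_demo.apply(lambda row: isinstance(row, pd.Series), axis=1)

# row.name is the index label of that row (here a pd.Timestamp)
row_names = weeks_demo.apply(lambda row: row.name, axis=1)

# row.target is the value in the "target" column for that row
row_targets = weeks_demo.apply(lambda row: row.target, axis=1)
```

So inside the query string, @x.name is the week's date and @x.target is the week's target price.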

I didn't have to use query; I just like it, and in my opinion it makes some code prettier. The following function, as provided by the OP, could have been written differently:

def find_match(x):
    """Original"""
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))

find_match_alt

Or we could have done this, which may help to explain what the query string above is doing:

def find_match_alt(x):
    """Alternative to OP's"""
    date_is_afterwards = dfDays.index > x.name
    price_target_is_met = dfDays.price >= x.target
    both_are_true = price_target_is_met & date_is_afterwards
    if both_are_true.any():
        return dfDays[both_are_true].index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match_alt, 1))

Comparing these two functions should give good perspective.
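
To see that the two really agree, here is a sketch that runs both on a tiny hand-made dataset. The prices, dates and targets below are invented for illustration, and the rows are walked with iterrows instead of apply only so the results can be compared as plain lists.

```python
import pandas as pd

# Hypothetical daily prices for 2000-01-01 .. 2000-01-10
dfDays = pd.DataFrame(
    {"price": [1.0, 1.2, 1.1, 1.6, 1.3, 1.9, 1.4, 1.2, 1.1, 1.0]},
    index=pd.date_range("2000-01-01", periods=10, freq="D"),
)
# Hypothetical "weekly" rows: the index label is the start date, target is the price to hit
dfWeeks = pd.DataFrame(
    {"target": [1.5, 1.8, 2.5]},
    index=pd.to_datetime(["2000-01-02", "2000-01-05", "2000-01-07"]),
)

def find_match(x):
    # Query-string version: rows after x.name whose price reaches x.target
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

def find_match_alt(x):
    # Boolean-mask version of the same two conditions
    both_are_true = (dfDays.index > x.name) & (dfDays.price >= x.target)
    if both_are_true.any():
        return dfDays[both_are_true].index[0]

# iterrows yields each row as a Series whose .name is the index label, like apply(axis=1)
via_query = [find_match(row) for _, row in dfWeeks.iterrows()]
via_mask = [find_match_alt(row) for _, row in dfWeeks.iterrows()]
```

Both lists should hold the first qualifying date for each row, with None where the target is never reached (the last row here).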

2 Comments

Great explanation, exactly what I was looking for. I wish I could upvote it more than once!
Very neat answer!
