18

I have a DataFrame in which one column (name) holds string values. Is there a way to select rows by a partial string match against that column using the DataFrame.query() method?

I tried:

  • df.query('name.str.contains("lu")'). Error message: "TypeError: 'Series' objects are mutable, thus they cannot be hashed"
  • df.query('"lu" in name'). Returns an empty DataFrame.

The code I use:

import pandas as pd

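df = pd.DataFrame({
    'name': ['blue', 'red', 'blue'],
    'X1': [96.32, 96.01, 96.05]
}, columns=['name', 'X1'])

# Attempt with "in": returns an empty DataFrame
print(df.query('"lu" in name').head())
# Attempt with str.contains: raises the TypeError quoted above
print(df.query('name.str.contains("lu")').head())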

I know I could use df[df['name'].str.contains("lu")] but I prefer to use query.

2 Comments

  • Seems like it is not implemented yet: github.com/pandas-dev/pandas/issues/8749 Commented Jul 5, 2017 at 18:15
  • @ayhan Thanks. You're welcome to convert the comment into an answer. Commented Jul 5, 2017 at 18:23

3 Answers

15

The issue that @ayhan refers to now shows how this can be achieved by using query's python engine:

print(df.query('name.str.contains("lu")', engine='python').head())

should work.
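As a minimal sketch of that suggestion, the snippet below reuses the question's DataFrame and compares the query result with plain boolean indexing. The names via_query and via_mask are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["blue", "red", "blue"],
    "X1": [96.32, 96.01, 96.05],
})

# Partial string match inside query(); engine="python" asks pandas to
# evaluate the expression with plain Python instead of numexpr.
via_query = df.query('name.str.contains("lu")', engine="python")

# The boolean-mask form from the question, for comparison.
via_mask = df[df["name"].str.contains("lu")]

print(via_query)
print(via_query.equals(via_mask))
```

Both forms should select the same two "blue" rows, so the choice between them is mostly a matter of style.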

4

This answer is out of date. Please check @petobens' answer.

As of version 0.20.2, query doesn't support partial string matching. There is an open feature request for it, and one of the core developers seems to agree that it would be a nice addition.

2 Comments

df.query('ColumnName.str.contains(\'sought_string\')') works for me in 0.24.2 to search for sought_string in the values stored in the column ColumnName. I would note that 'engine=python' no longer seems to be needed.
The PR for this was closed because it brought in too much magic: github.com/pandas-dev/pandas/pull/26027 However, @RhoPhi's answer also just works. It even works with the recently added backtick functionality: df.query("`column name`.str.contains('sought_string')"). Much magic indeed.
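To illustrate the backtick form mentioned in the comment, here is a small sketch. The frame df2 and its contents are made up for this example, and engine="python" is passed explicitly as a cautious assumption so that the result does not depend on whether numexpr is installed.

```python
import pandas as pd

# Hypothetical data: a column whose name contains a space.
df2 = pd.DataFrame({
    "column name": ["has sought_string", "nothing here", "sought_string again"],
    "value": [1, 2, 3],
})

# Backticks let query() refer to a column name that is not a valid
# Python identifier; .str.contains is then called on that column.
hits = df2.query("`column name`.str.contains('sought_string')", engine="python")

print(hits)
```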
4

petobens' solution now works with query without specifying the engine, which, according to the manual, is faster.

Using contains in a query expression is a powerful way to handle string content, because it accepts regular expressions.

import numpy as np
import pandas as pd
A = np.array(["Paulo", "Lucas", "Luana", "Larra", "BaLu","Bela"])
B = np.array([111, 222, 222, 333, 333, 777])
C = np.random.randint(10, 99, 6)
dt = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
dt.set_index(['A', 'B'], inplace=True)
print(dt)
print("=============")
print(dt.query('A.str.contains("Lu")'))
print("=============")
print(dt.query('A.str.contains("L(a|u)", regex=True)'))
print("=============")
print(dt.query('A.str.contains("^L", regex=True)'))  # starts with L

The result is (the values in C are random, so they differ between runs):

            C
A   B
1.1 Paulo  57
    Lucas  49
3.3 Luana  38
    Larra  82
5.5 BaLu   37
6.6 Bela   14
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
5.5 BaLu   37
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
    Larra  82
5.5 BaLu   37
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
    Larra  82
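Because query passes keyword arguments through to str.contains (as regex=True above shows), other options of that method can be used the same way. The sketch below uses a made-up frame named people to show na=False, which treats missing values as non-matches, and case=False for case-insensitive matching. In many pandas versions a missing value otherwise leaves NaN in the mask, and the filter then fails. engine="python" is an explicit, cautious choice here rather than a requirement.

```python
import pandas as pd

# Hypothetical data with one missing name.
people = pd.DataFrame({
    "name": ["Lucas", None, "BaLu", "Bela", "blue"],
    "score": [49, 38, 37, 14, 20],
})

# na=False: rows with a missing name count as "no match".
exact_case = people.query('name.str.contains("Lu", na=False)', engine="python")

# case=False: "lu" also matches "Lu".
any_case = people.query(
    'name.str.contains("lu", case=False, na=False)', engine="python"
)

print(exact_case)
print(any_case)
```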
