Query data frame in python pandas, can't save query

Question

I have a list of files, each of which I read into a data frame inside a for loop. For each data frame I want to select a portion of the rows with a query and find the average.

This is what I have so far:

import pandas as pd

k = 0
for i in open('list.txt', 'r'):

    k = k+1
    i_name = i.strip()
    df = pd.read_csv(i_name, sep='\t')

    # Create queries
    A = df.query('location == 1' and '1000 >= start <= 120000000')
    B = df.query('location == 10' and '2000000 >= start <= 60000000')
    print(A)
    print(B)

    # Find average
    avgA = (sum(A['height'])/len(A['height']))
    print(avgA)
    avgB = (sum(B['height'])/len(B['height']))
    print(avgB)

The problem is that I'm not getting the averages I expect (I computed them manually in Excel). Printing A and B shows the entire data frame, so I'm not sure whether there's a problem with how I'm querying the data.

Am I correctly assigning the query results to A and B? Is there a way to do this that doesn't involve saving every data frame as a CSV? I have many queries to create and don't want to save each intermediate result for hundreds of samples, since I'm only interested in the averages.

Tim Roberts · Accepted Answer · 2022-03-16 19:49:13Z

This does not do what you expect:

    A = df.query('location == 1' and '1000 >= start <= 120000000')
    B = df.query('location == 10' and '2000000 >= start <= 60000000')

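You are applying Python's "and" operator to two strings, not combining two conditions. Python's "and" returns its first operand if that operand is falsy and its second operand otherwise. Since any non-empty string is truthy, the whole expression evaluates to just "1000 >= start <= 120000000", and the location condition never reaches the query.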
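You can confirm this in a plain Python session, without pandas involved. A quick sketch:

```python
# Python's `and` evaluates to its first operand if that operand is falsy,
# otherwise to its second operand. Non-empty strings are truthy.
expr = 'location == 1' and '1000 >= start <= 120000000'
print(expr)  # prints: 1000 >= start <= 120000000

# An empty string is falsy, so `and` stops there and returns it unchanged.
print(repr('' and 'ignored'))  # prints: ''
```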

You want the "and" to be inside the query:

    A = df.query('location == 1 and 1000 >= start <= 120000000')
    B = df.query('location == 10 and 2000000 >= start <= 60000000')

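Secondly, the first inequality operator in each query is backwards. A chained comparison is read pairwise, so "1000 >= start <= 120000000" means "start <= 1000 and start <= 120000000", which only keeps rows where start is at most 1000 (likewise, the second query only keeps rows where start is at most 2000000). What you really want is: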

    A = df.query('location == 1 and 1000 <= start <= 120000000')
    B = df.query('location == 10 and 2000000 <= start <= 60000000')
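A minimal sketch of the full loop with both fixes applied, assuming every file is tab-separated and has the location, start and height columns used in the question. The helper names query_means and means_for_files, and the small demo frame, are made up for illustration. It uses Series.mean() instead of sum()/len(), which returns NaN for an empty selection rather than raising ZeroDivisionError, and it keeps all results in memory, so no intermediate query needs to be saved as a CSV.

```python
import pandas as pd

# The two corrected filters, keyed by a label (labels are illustrative).
QUERIES = {
    "A": "location == 1 and 1000 <= start <= 120000000",
    "B": "location == 10 and 2000000 <= start <= 60000000",
}


def query_means(df):
    """Return the mean 'height' of the rows matched by each query.

    An empty selection gives NaN instead of raising ZeroDivisionError.
    """
    return {name: float(df.query(expr)["height"].mean())
            for name, expr in QUERIES.items()}


def means_for_files(list_path):
    """Read each tab-separated file named in list_path and collect its means.

    Nothing is written to disk; one result row per file is kept in memory.
    """
    rows = []
    with open(list_path) as names:
        for line in names:
            path = line.strip()
            if not path:
                continue  # skip blank lines in the list file
            df = pd.read_csv(path, sep="\t")
            rows.append({"file": path, **query_means(df)})
    return pd.DataFrame(rows)


# Small made-up frame to check the filters.
demo = pd.DataFrame({
    "location": [1, 1, 1, 10, 10, 10],
    "start": [500, 5000, 200000000, 1000000, 3000000, 50000000],
    "height": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
print(query_means(demo))  # {'A': 2.0, 'B': 5.5}
```

Calling means_for_files('list.txt') would return one row per file with columns file, A and B; adding another filter only means adding an entry to QUERIES.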
