I have a list of data frames that I'm opening in a for loop. For each data frame I want to query a portion of it and find the average.
This is what I have so far:
k = 0
for i in open('list.txt', 'r'):
    k = k+1
    i_name = i.strip()
    df = pd.read_csv(i_name, sep='\t')
    #Create queries
    A = df.query('location == 1' and '1000 >= start <= 120000000')
    B = df.query('location == 10' and '2000000 >= start <= 60000000')
    print A
    print B
    #Find average
    avgA = (sum(A['height'])/len(A['height']))
    print avgA
    avgB = (sum(B['height'])/len(B['height']))
    print avgB
The problem is that I'm not getting the averages I expect (compared with calculating them manually in Excel). Printing A or B shows the entire data frame rather than a filtered subset, so I suspect there's a problem with how I'm querying the data.
Am I assigning the query results to A and B correctly? Is there another way to do this that doesn't involve saving every data frame as a CSV? I have many queries to create and don't want to save each intermediate query for hundreds of samples, as I'm only interested in the average.
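One thing that may be going wrong, sketched below on made-up data: in plain Python, `'x' and 'y'` evaluates to `'y'`, so only the second string ever reaches `query()` and the `location` condition is dropped. The sample frame is hypothetical, and I'm assuming the intent is "start between the two bounds", which would also mean the first comparison in each range should be `start >= lower` rather than `lower >= start`. Putting the whole condition in a single string and calling `.mean()` gives the average directly, with nothing written to disk.

```python
import pandas as pd

# Hypothetical sample data standing in for one of the tab-separated files.
df = pd.DataFrame({
    'location': [1, 1, 1, 10, 10],
    'start': [500, 5000, 200000000, 3000000, 70000000],
    'height': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Python's `and` between two non-empty strings returns the second string,
# so this is the only expression the original query() call receives.
expr = 'location == 1' and '1000 >= start <= 120000000'
print(expr)

# Assumed intent: both conditions inside ONE query string,
# with start lying between the lower and upper bound.
A = df.query('location == 1 and start >= 1000 and start <= 120000000')
B = df.query('location == 10 and start >= 2000000 and start <= 60000000')

# Series.mean() computes the average of the filtered rows in memory.
avgA = A['height'].mean()
avgB = B['height'].mean()
print(avgA)
print(avgB)
```

Inside the loop, the same two `query` calls would be applied to each `df` returned by `pd.read_csv`, and the averages could be collected in a list or dict keyed by file name instead of being printed.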