I'm trying to access filtered versions of a dataframe, using a list with the filter values.

I'm using a while loop that I thought would plug the appropriate list values into a dataframe filter one by one. This code prints the first one fine but then prints 4 empty dataframes afterwards.

I'm sure this is a quick fix but I haven't been able to find it.

boatID = [342, 343, 344, 345, 346]
i = 0 
while i < len(boatID):
    df = df[(df['boat_id']==boatID[i])]
    #run some code, i'm printing DF.head to test it works
    print(df.head())
    i = i + 1

Example dataframe:

   boat_id  activity speed  heading
0      342         1  3.34   270.00
1      343         1  0.02     0.00
2      344         1  0.01   270.00
3      345         1  8.41   293.36
4      346         1  0.03    90.00 
  • Thanks for the suggestion. I'm not trying to return the bool values generated by isin; I'm trying to filter the DF where boat_id == some number. (Commented Jan 22, 2016 at 23:38)
  • Update: using int(boatID[i]) doesn't work either. (Commented Jan 22, 2016 at 23:39)

1 Answer

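I think the problem is that df = df[(df['boat_id']==boatID[i])] overwrites df with its own filtered version. After the first pass df holds only the rows for boat 342, so every later filter matches nothing and returns an empty dataframe.

Assign the filtered result to a new dataframe instead, e.g. df1: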

boatID = [342, 343, 344, 345, 346]
i = 0 
while i < len(boatID):
    df1 = df[(df['boat_id']==boatID[i])]
    # run some code; printing df1.head() to check that it works
    print(df1.head())
    i = i + 1

#   boat_id  activity  speed  heading
#0      342         1   3.34      270
#   boat_id  activity  speed  heading
#1      343         1   0.02        0
#   boat_id  activity  speed  heading
#2      344         1   0.01      270
#   boat_id  activity  speed  heading
#3      345         1   8.41   293.36
#   boat_id  activity  speed  heading
#4      346         1   0.03       90
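
To see why the original loop prints four empty frames, here is a small self-contained sketch. It rebuilds the example dataframe by hand from the values in the question and counts the rows left after each pass. The names shrinking, buggy_sizes and fixed_sizes are mine, and the for loop is only a stylistic substitute for the while counter.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "boat_id": [342, 343, 344, 345, 346],
    "activity": [1, 1, 1, 1, 1],
    "speed": [3.34, 0.02, 0.01, 8.41, 0.03],
    "heading": [270.00, 0.00, 270.00, 293.36, 90.00],
})
boatID = [342, 343, 344, 345, 346]

# Buggy pattern: reassigning the source frame shrinks it on the first pass,
# so later filters run against a frame that only contains boat 342
shrinking = df.copy()
buggy_sizes = []
for boat in boatID:
    shrinking = shrinking[shrinking["boat_id"] == boat]
    buggy_sizes.append(len(shrinking))

# Fixed pattern: filter into a new name and leave df untouched
fixed_sizes = []
for boat in boatID:
    df1 = df[df["boat_id"] == boat]
    fixed_sizes.append(len(df1))

print(buggy_sizes)  # [1, 0, 0, 0, 0]
print(fixed_sizes)  # [1, 1, 1, 1, 1]
```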

If you need to filter df down to the rows whose boat_id appears in the list boatID, use isin:

df1 = df[df['boat_id'].isin(boatID)]
print(df1)
#   boat_id  activity  speed  heading
#0      342         1   3.34   270.00
#1      343         1   0.02     0.00
#2      344         1   0.01   270.00
#3      345         1   8.41   293.36
#4      346         1   0.03    90.00
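
With the full list every row matches, so the effect of isin is easier to see with a shorter list. The sketch below assumes a hypothetical subset called wanted (not in the original question) and also shows the inverted mask, which keeps the remaining boats.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "boat_id": [342, 343, 344, 345, 346],
    "activity": [1, 1, 1, 1, 1],
    "speed": [3.34, 0.02, 0.01, 8.41, 0.03],
    "heading": [270.00, 0.00, 270.00, 293.36, 90.00],
})

wanted = [342, 345]  # hypothetical subset, for illustration only

# isin returns a boolean Series; indexing df with it keeps the True rows
mask = df["boat_id"].isin(wanted)
selected = df[mask]
rest = df[~mask]  # ~ inverts the mask

print(selected["boat_id"].tolist())  # [342, 345]
print(rest["boat_id"].tolist())      # [343, 344, 346]
```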

EDIT:

I think you can use a dictionary of dataframes, keyed by names such as df342:

print(df)
   boat_id  activity  speed  heading
0      342         1   3.34   270.00
1      343         1   0.02     0.00
2      344         1   0.01   270.00
3      345         1   8.41   293.36
4      346         1   0.03    90.00

boatID = [342, 343, 344, 345, 346]

dfs = ['df' + str(x) for x in boatID]
dicdf = dict()

print(dfs)
['df342', 'df343', 'df344', 'df345', 'df346']

i = 0
while i < len(boatID):
    print(dfs[i])
    # store each filtered dataframe under its own key, e.g. 'df342'
    dicdf[dfs[i]] = df[(df['boat_id']==boatID[i])]
    i = i + 1
print(dicdf)
{'df344':    boat_id  activity  speed  heading
2      344         1   0.01      270, 'df345':    boat_id  activity  speed  heading
3      345         1   8.41   293.36, 'df346':    boat_id  activity  speed  heading
4      346         1   0.03       90, 'df342':    boat_id  activity  speed  heading
0      342         1   3.34      270, 'df343':    boat_id  activity  speed  heading
1      343         1   0.02        0}

print(dicdf['df342'])
   boat_id  activity  speed  heading
0      342         1   3.34      270
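
The same dictionary can probably be built more compactly with groupby and a dict comprehension. This is only a sketch of an alternative, not the method used above; it assumes the example data from the question and reuses the names dicdf and boatID.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "boat_id": [342, 343, 344, 345, 346],
    "activity": [1, 1, 1, 1, 1],
    "speed": [3.34, 0.02, 0.01, 8.41, 0.03],
    "heading": [270.00, 0.00, 270.00, 293.36, 90.00],
})
boatID = [342, 343, 344, 345, 346]

# Keep only the boats of interest, then split into one dataframe per boat_id.
# Iterating over a groupby yields (key, sub-dataframe) pairs.
dicdf = {
    "df" + str(boat): group
    for boat, group in df[df["boat_id"].isin(boatID)].groupby("boat_id")
}

print(sorted(dicdf))
print(dicdf["df342"])
```

A dictionary keeps the per-boat frames addressable by name without creating variables such as df342 dynamically.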
3 Comments

Thanks. This looks like it concatenates all the DFs into 5 dtype objects in one DF. Is that correct?
Now it creates a new dataframe df1 from df on each pass, five in total. On the next pass, the old df1 is overwritten by the new df1.
Is it possible, rather than having df1 overwritten each time the loop runs, to have something like df[i] that creates 5 dataframes with unique names like df342, df343, etc.?