How do I create many filtered dataframes using a for loop in Python and Pandas?

Question

I often need to create dataframes that are filtered subsets of a larger dataframe, and I was wondering whether there is a way to have Python do this for me.

For example, the dataset I'm working on now is app version data, which looks like this:

user_id | session_id | timestamp | time_seconds | app_version
 001    |   123      | 2014-01-01|    251       |     v1
 002    |   845      | 2014-01-01|    514       |     v1
 003    |   741      | 2014-01-02|    141       |     v1
 003    |   477      | 2014-01-03|    221       |     v2
 004    |   121      | 2014-01-03|    120       |     v2
 005    |   921      | 2014-01-04|    60        |     v3
...

I need to separate out the different app versions so each version has its own dataframe, and currently I'm doing it like this:

v1 = all_data[all_data['app_version'] == 'v1']
v2 = all_data[all_data['app_version'] == 'v2']
v3 = all_data[all_data['app_version'] == 'v3']

This seems very repetitive. Is there a for loop I can write to do this for me?

Depending on what you actually want, you could use df.groupby('app_version').
– Jan, Commented Jan 30, 2018 at 12:33
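
To illustrate the groupby suggestion in the comment above, here is a minimal sketch. It assumes the sample rows from the question are loaded into a dataframe named all_data; the rows_per_version dictionary is a hypothetical name used only for illustration. Iterating over a groupby yields one (key, sub-dataframe) pair per app version, so no manual filtering is needed.

```python
import pandas as pd

# Sample rows copied from the question's table.
all_data = pd.DataFrame({
    "user_id": ["001", "002", "003", "003", "004", "005"],
    "session_id": [123, 845, 741, 477, 121, 921],
    "timestamp": ["2014-01-01", "2014-01-01", "2014-01-02",
                  "2014-01-03", "2014-01-03", "2014-01-04"],
    "time_seconds": [251, 514, 141, 221, 120, 60],
    "app_version": ["v1", "v1", "v1", "v2", "v2", "v3"],
})

# Iterating over a groupby yields (group key, sub-DataFrame) pairs.
rows_per_version = {}  # hypothetical name, for illustration only
for version, frame in all_data.groupby("app_version"):
    rows_per_version[version] = len(frame)

print(rows_per_version)  # {'v1': 3, 'v2': 2, 'v3': 1}
```

If you only need to process each version once, looping like this may be enough and you never have to store the pieces.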

jezrael · Accepted Answer · 2018-01-30 12:41:52Z


I think you need to create a dictionary of DataFrames:

# df is the full dataframe (all_data in the question)
d = dict(tuple(df.groupby('app_version')))
print(d)
{'v2':    user_id  session_id   timestamp  time_seconds app_version
3        3         477  2014-01-03           221          v2
4        4         121  2014-01-03           120          v2, 
'v3':    user_id  session_id   timestamp  time_seconds app_version
5        5         921  2014-01-04            60          v3, 
'v1':    user_id  session_id   timestamp  time_seconds app_version
0        1         123  2014-01-01           251          v1
1        2         845  2014-01-01           514          v1
2        3         741  2014-01-02           141          v1}

print(d['v1'])
   user_id  session_id   timestamp  time_seconds app_version
0        1         123  2014-01-01           251          v1
1        2         845  2014-01-01           514          v1
2        3         741  2014-01-02           141          v1

print(type(d['v1']))
<class 'pandas.core.frame.DataFrame'>
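
Since the question asks specifically for a for loop, here is a sketch of an explicit-loop version that should give the same dictionary. It is not part of the original answer; the frames and grouped names are hypothetical, and the data is a trimmed copy of the question's sample rows.

```python
import pandas as pd

# Trimmed copy of the question's sample data.
all_data = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4, 5],
    "session_id": [123, 845, 741, 477, 121, 921],
    "time_seconds": [251, 514, 141, 221, 120, 60],
    "app_version": ["v1", "v1", "v1", "v2", "v2", "v3"],
})

# One boolean-mask filter per distinct version, collected in a dict
# instead of separate v1/v2/v3 variables.
frames = {}
for version in all_data["app_version"].unique():
    frames[version] = all_data[all_data["app_version"] == version]

# Compare against the groupby-based dictionary.
grouped = dict(tuple(all_data.groupby("app_version")))
same = all(frames[v].equals(grouped[v]) for v in frames)
print(sorted(frames), same)
```

The loop rescans the whole dataframe once per version, while groupby splits it in a single pass, so groupby is usually the better choice when there are many versions.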

edited Jan 30, 2018 at 12:41

answered Jan 30, 2018 at 12:34


2 Comments

jceg316 Over a year ago

Thanks but the output would have to be a dataframe for each version, not a dictionary.

jezrael Over a year ago

It is a dictionary of DataFrames, exactly what you need.
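
To make the point in this exchange concrete: each value in the dictionary is an ordinary DataFrame, so it can be bound to its own variable and used like the v1, v2, v3 frames from the question. A minimal sketch, again assuming the question's sample data is in all_data (is_frame and mean_v1 are hypothetical names):

```python
import pandas as pd

# Trimmed copy of the question's sample data.
all_data = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4, 5],
    "session_id": [123, 845, 741, 477, 121, 921],
    "time_seconds": [251, 514, 141, 221, 120, 60],
    "app_version": ["v1", "v1", "v1", "v2", "v2", "v3"],
})

d = dict(tuple(all_data.groupby("app_version")))

# Pull the individual DataFrames back out of the dictionary.
v1, v2, v3 = d["v1"], d["v2"], d["v3"]

# Every value is a regular DataFrame and supports the usual operations.
is_frame = all(isinstance(f, pd.DataFrame) for f in (v1, v2, v3))
mean_v1 = v1["time_seconds"].mean()
print(is_frame, mean_v1)  # True 302.0
```

Keeping the frames in the dictionary tends to scale better than separate variables, because new app versions are picked up without changing the code.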
