I often need to create dataframes that are filtered subsets of a larger dataframe, and I was wondering whether there is a way to have Python do this for me.

For example, the dataset I'm working on now is app version data, which looks like this:

user_id | session_id | timestamp | time_seconds | app_version
 001    |   123      | 2014-01-01|    251       |     v1
 002    |   845      | 2014-01-01|    514       |     v1
 003    |   741      | 2014-01-02|    141       |     v1
 003    |   477      | 2014-01-03|    221       |     v2
 004    |   121      | 2014-01-03|    120       |     v2
 005    |   921      | 2014-01-04|    60        |     v3
...

I need to separate out the different app versions so each version has its own dataframe, and currently I'm doing it like this:

v1 = all_data[all_data['app_version'] == 'v1']
v2 = all_data[all_data['app_version'] == 'v2']
v3 = all_data[all_data['app_version'] == 'v3']

This seems very repetitive. Is there a for loop I can write to do this for me?

    Depending on what you actually want, you could use df.groupby('app_version') Commented Jan 30, 2018 at 12:33

1 Answer

I think you need to create a dictionary of DataFrames, keyed by app version:

d = dict(tuple(df.groupby('app_version')))
print (d)
{'v2':    user_id  session_id   timestamp  time_seconds app_version
3        3         477  2014-01-03           221          v2
4        4         121  2014-01-03           120          v2, 
'v3':    user_id  session_id   timestamp  time_seconds app_version
5        5         921  2014-01-04            60          v3, 
'v1':    user_id  session_id   timestamp  time_seconds app_version
0        1         123  2014-01-01           251          v1
1        2         845  2014-01-01           514          v1
2        3         741  2014-01-02           141          v1}

print (d['v1'])
   user_id  session_id   timestamp  time_seconds app_version
0        1         123  2014-01-01           251          v1
1        2         845  2014-01-01           514          v1
2        3         741  2014-01-02           141          v1

print (type(d['v1']))
<class 'pandas.core.frame.DataFrame'>
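For anyone who wants to run this end to end, here is a minimal sketch. It assumes the sample rows from the question are loaded into a DataFrame named all_data (the question's name; the snippet above calls it df), and the name frames is just illustrative. A dict comprehension over the groupby is an equivalent way of building the same dictionary.

```python
import pandas as pd

# Sample data copied from the table in the question.
all_data = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4, 5],
    "session_id": [123, 845, 741, 477, 121, 921],
    "timestamp": ["2014-01-01", "2014-01-01", "2014-01-02",
                  "2014-01-03", "2014-01-03", "2014-01-04"],
    "time_seconds": [251, 514, 141, 221, 120, 60],
    "app_version": ["v1", "v1", "v1", "v2", "v2", "v3"],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so a dict comprehension collects one DataFrame per app version.
frames = {version: group for version, group in all_data.groupby("app_version")}

# Loop over every version without naming each one by hand.
for version, frame in frames.items():
    print(version, len(frame))
```

New versions in the data (v4, v5, ...) are picked up automatically, which is the main advantage over writing one filter line per version.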

2 Comments

Thanks but the output would have to be a dataframe for each version, not a dictionary.
It is a dictionary of DataFrames, which is exactly what you need.
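To illustrate the point of that reply: each value in the dictionary is an ordinary DataFrame, so it can still be bound to its own variable if separate names are really wanted. The sketch below assumes a cut-down version of the question's data (only two columns, for brevity); v2_only is a hypothetical name.

```python
import pandas as pd

# Reduced sample data; column values copied from the question.
all_data = pd.DataFrame({
    "session_id": [123, 845, 741, 477, 121, 921],
    "app_version": ["v1", "v1", "v1", "v2", "v2", "v3"],
})

d = dict(tuple(all_data.groupby("app_version")))

# Bind each DataFrame to its own name, as in the question.
v1, v2, v3 = d["v1"], d["v2"], d["v3"]

# Or fetch a single version directly, without building the dictionary.
v2_only = all_data.groupby("app_version").get_group("v2")
```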
