I'm trying to visualize some data, but I'm not very experienced with the subject and am having trouble finding the best way to get what I'm looking for. I've searched around and found similar questions, but nothing that answers exactly what I want, so hopefully I'm not duplicating a common question.
Anyway, I have a DataFrame with a patient_id column (among others, but this is the relevant one). For example:
  patient_id other_stuff
0     000001         ...
1     000001         ...
2     000001         ...
3     000002         ...
4     000003         ...
5     000003         ...
6     000004         ...
etc
Each row represents a single episode for that patient. I want to plot the distribution where the x axis is the number of episodes a patient had and the y axis is the number of patients who had that many episodes. For example, based on the above, there's one patient with three episodes, one patient with two episodes, and two patients with one episode each, i.e. x = [1, 2, 3], y = [2, 1, 1]. Currently, I do the following:
episode_count_distribution = (
    patients.patient_id
    .value_counts()  # the number of rows for each patient_id (i.e. episodes per patient)
    .value_counts()  # the number of patients for each possible row count above (i.e. distribution of episodes per patient)
    .sort_index()
)
episode_count_distribution.plot()
This method does what I want, but strikes me as a bit opaque and hard to follow, so I'm wondering if there's a better way.
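For reference, here is a minimal, self-contained sketch that rebuilds the sample table above and splits the same computation into named steps, which may be easier to follow than the double value_counts() chain. Assumptions: patient_id is stored as strings, other_stuff is just a placeholder, and the function name count_patients_by_episodes plus the "episodes"/"patients" labels are hypothetical, chosen only for illustration. It swaps the first value_counts() for groupby("patient_id").size(), which gives the same per-patient row counts but arguably reads more directly as "episodes per patient".

```python
import pandas as pd

# Sample data mirroring the table in the question ("other_stuff" is a placeholder column).
patients = pd.DataFrame({
    "patient_id": ["000001", "000001", "000001", "000002",
                   "000003", "000003", "000004"],
    "other_stuff": ["..."] * 7,
})


def count_patients_by_episodes(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: map 'number of episodes' -> 'number of patients with that many'."""
    # Step 1: one row per episode, so each group's size is that patient's episode count.
    episodes_per_patient = df.groupby("patient_id").size()
    # Step 2: count how many patients share each episode count, ordered by episode count,
    # and label the index/values so they describe the axes when plotted.
    patients_per_count = (
        episodes_per_patient.value_counts()
        .sort_index()
        .rename_axis("episodes")
        .rename("patients")
    )
    return patients_per_count


dist = count_patients_by_episodes(patients)
print(dist)
```

Since the x values are discrete counts, dist.plot.bar() (which requires matplotlib) would draw one bar per episode count, which may suit this kind of distribution better than the default line plot from .plot().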
