There are many steps to do that for the data I have, I will show you the steps taken so far where I stuck:
I have this df:
df = pd.DataFrame(np.array([['Iza', '2020-12-01 10:34:00'],['Iza', '2020-12-02 10:34:00'],['Iza', '2020-12-01 17:34:00'],['Iza', '2020-12-01 17:34:00'],['Sara', '2020-12-04 17:34:00'], ['Sara', '2020-12-04 20:11:00'], ['Sara', '2020-12-06 17:34:00'],['Silvia', '2020-12-07 18:34:00'],['Silvia', '2020-12-09 11:22:00'],['Paul', '2020-12-09 11:22:00'],['Paul', '2020-12-08 11:22:00'],['Paul', '2020-12-07 11:22:00']]),
columns=['Name', 'Time'])
df:
Name Time
0 Iza 2020-12-01 10:34:00
1 Iza 2020-12-02 10:34:00
2 Iza 2020-12-01 17:34:00
3 Iza 2020-12-01 17:34:00
4 Sara 2020-12-04 17:34:00
5 Sara 2020-12-04 20:11:00
6 Sara 2020-12-06 17:34:00
7 Silvia 2020-12-07 18:34:00
8 Silvia 2020-12-09 11:22:00
9 Paul 2020-12-09 11:22:00
10 Paul 2020-12-08 11:22:00
11 Paul 2020-12-07 11:22:00
I converted the time column to datetime:
df['Time'] = pd.to_datetime(df['Time'])
Now I want to get days in names and find the percentage of each day per name in columns:
df['Day'] = df['Time'].dt.day_name()
Result:
Name Time Day
0 Iza 2020-12-01 10:34:00 Tuesday
1 Iza 2020-12-02 10:34:00 Wednesday
2 Iza 2020-12-01 17:34:00 Tuesday
3 Iza 2020-12-01 17:34:00 Tuesday
4 Sara 2020-12-04 17:34:00 Friday
5 Sara 2020-12-04 20:11:00 Friday
6 Sara 2020-12-06 17:34:00 Sunday
7 Silvia 2020-12-07 18:34:00 Monday
8 Silvia 2020-12-09 11:22:00 Wednesday
9 Paul 2020-12-09 11:22:00 Wednesday
10 Paul 2020-12-08 11:22:00 Tuesday
11 Paul 2020-12-07 11:22:00 Monday
df2 = round(df.groupby(['Name'])['Day'].apply(lambda x: x.value_counts(normalize=True)) * 100)
Result:
Name
Iza Tuesday 75.0
Wednesday 25.0
Paul Monday 33.0
Tuesday 33.0
Wednesday 33.0
Sara Friday 67.0
Sunday 33.0
Silvia Wednesday 50.0
Monday 50.0
Name: Day, dtype: float64
I stuck here, my desired output - days in columns with % for each per name:
Name Sunday Monday Tuesday Wednesday Friday
Iza NaN NaN 75 25 NaN
Paul NaN 33 33 33 NaN
Sara 33 NaN NaN NaN 67
Silvia NaN 50 NaN 50 NaN