0

I have a DF with 16 columns, I need to work on 2 of them to generate a table (creating a DF in this case), the first column is Type (has 2 type, Subscriber and one-time user, and the second is day (days of a week), following is the DS (part, the actual is more than 16K rows and 16 columns), I am working on:

Type,Trip_in_min,Day
One-time user,25,Sunday
Subscriber,11,Sunday
Subscriber,4,Sunday
One-time user,11,Sunday
Subscriber,9,Sunday
One-time user,4,Sunday
Subscriber,2,Sunday
One-time user,3,Sunday
Subscriber,2,Sunday
Subscriber,8,Sunday
One-time user,12,Monday
Subscriber,11,Monday
One-time user,2,Monday
Subscriber,2,Monday
One-time user,5,Tuesday
Subscriber,9,Tuesday
One-time user,6,Tuesday
Subscriber,16,Tuesday

I can get the following results, based on the above input

           Subscribers  One Time Users
Sunday               6               4
Monday               2               2
Tuesday              2               2
Wednesday            0               0
Thursday             0               0
Friday               0               0
Saturday             0               0

I have the following code:

import pandas as pd
file = pd.read_csv(r"D:\OneDrive\DataAnalysis\Test2.txt")
df = pd.DataFrame(file)
user_type = 0
regular = 0
regular1 = 0
regular2 = 0
regular3 = 0
regular4 = 0
regular5 = 0
regular6 = 0

one_time_user = 0
one_time_user1 = 0
one_time_user2 = 0
one_time_user3 = 0
one_time_user4 = 0
one_time_user5 = 0
one_time_user6 = 0
day_idx = 0
new_list = pd.DataFrame({'Subscribers': [0, 0, 0, 0, 0, 0, 0], 'One Time Users': [0, 0, 0, 0, 0, 0, 0]}, index=['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
for index1, row in df.iterrows():
    for i in range(0, len(df)):
        user_type = df['Type'].iloc[i]
        day = df['Day'].iloc[i]
        if day == 'Sunday':
            if user_type == 'Subscriber':
                regular += 1
            else:
                one_time_user += 1
        elif day == 'Monday':
            if user_type == 'Subscriber':
                regular1 += 1
            else:
                one_time_user1 += 1
        elif day == 'Tuesday':
            if user_type == 'Subscriber':
                regular2 += 1
            else:
                one_time_user2 += 1
        elif day == 'Wednesday':
            if user_type == 'Subscriber':
                regular3 += 1
            else:
                one_time_user3 += 1
        elif day == 'Thursday':
            if user_type == 'Subscriber':
                regular4 += 1
            else:
                one_time_user4 += 1
        elif day == 'Friday':
            if user_type == 'Subscriber':
                regular5 += 1
            else:
                one_time_user5 += 1
        elif day == 'Saturday':
            if user_type == 'Subscriber':
                regular6 += 1
            else:
                one_time_user6 += 1
    break
#print('Regular:', regular)
#print('One Tine User:', one_time_user)
new_list.at["Sunday", "Subscribers"] = regular
new_list.at['Sunday', "One Time Users"] = one_time_user
new_list.at["Monday", "Subscribers"] = regular1
new_list.at['Monday', "One Time Users"] = one_time_user1
new_list.at["Tuesday", "Subscribers"] = regular2
new_list.at['Tuesday', "One Time Users"] = one_time_user2
new_list.at["Wednesday", "Subscribers"] = regular3
new_list.at['Wednesday', "One Time Users"] = one_time_user3
new_list.at["Thursday", "Subscribers"] = regular4
new_list.at['Thursday', "One Time Users"] = one_time_user4
new_list.at["Friday", "Subscribers"] = regular5
new_list.at['Friday', "One Time Users"] = one_time_user5
new_list.at["Saturday", "Subscribers"] = regular6
new_list.at['Saturday', "One Time Users"] = one_time_user6
print(new_list)

But as you can see, it is a not an efficient one, I have been trying on this for days, so it will not be easy for me to list (or even remember) what I tried

Any suggestion to improve the code please? I just could not figure out how to make the proper looping

Thanks

1
  • 1
    Do you Really want an efficient loop? Pandas can do this without loops using groupby and pivot... Commented Jan 30, 2023 at 20:38

1 Answer 1

1

The no-loop version, just incase you want it...

  • uses pandas' groupby to get the results you want
  • uses pandas' pivot to create the layout you want
import pandas as pd
import io

data="""Type,Trip_in_min,Day
One-time user,25,Sunday
Subscriber,11,Sunday
Subscriber,4,Sunday
One-time user,11,Sunday
Subscriber,9,Sunday
One-time user,4,Sunday
Subscriber,2,Sunday
One-time user,3,Sunday
Subscriber,2,Sunday
Subscriber,8,Sunday
One-time user,12,Monday
Subscriber,11,Monday
One-time user,2,Monday
Subscriber,2,Monday
One-time user,5,Tuesday
Subscriber,9,Tuesday
One-time user,6,Tuesday
Subscriber,16,Tuesday"""

# Using StringIO as a pretend file...
df = pd.read_csv(io.StringIO(data))
print()
print(df)

# count the number of rows per Day/Type
df_agg = df.groupby(['Day','Type'],as_index=False).size()
print()
print(df_agg)

# Pivot the tyoe to be columns instead
df_pivot = df_agg.pivot(index='Day', columns='Type')
print()
print(df_pivot)

Demo; https://trinket.io/python3/d24d2b1e73

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.