Looping through multiple columns in a table in Python

Question

I'm trying to loop through a table that contains COVID-19 data. My table has 4 columns: month, day, location, and cases. The values of each column are stored in their own list, so all four lists have the same length (i.e. there is a month list, a day list, a location list, and a cases list). There are 12 months, with up to 31 days in a month. Cases are recorded for many locations around the world. I would like to figure out which day of the year had the most total combined global cases. I'm not sure how to structure my loops appropriately. An oversimplified sample version of the table represented by the lists is shown below.

In this small example, the result would be month 1, day 3 with 709 cases (257 + 452).

Month	Day	Location	Cases
1	1	CAN	124
1	1	USA	563
1	2	CAN	242
1	2	USA	156
1	3	CAN	257
1	3	USA	452
.	.	...	...
12	31	...	...

Lumber Jack · Accepted Answer · 2021-01-30 19:31:56Z

I assume that you've put all the data in the same data frame, df.

import pandas
df = pandas.DataFrame()
df['Month'] = name_of_your_month_list
df['Day'] = name_of_your_daylist
df['Location'] = name_of_your_location_list
df['Cases'] = name_of_your_cases_list

df.Cases.max() gives you the largest number of cases in a single row. I assume that there is only one year in the dataset. So df[df.Cases==df.Cases.max()].index gives you the index that you are looking for.

For the day, just filter:

df[df.Cases==df.Cases.max()].Day

For the month:

df[df.Cases==df.Cases.max()].Month

For the number of cases:

df[df.Cases==df.Cases.max()].Cases

For the country:

df[df.Cases==df.Cases.max()].Location

Reading the comments, it is not clear whether you want the largest number of cases for a single location or for a whole day. If it's for the day, you'll have to group first, for example with groupby(['Month', 'Day']), and aggregate the cases per group (sum() for the combined total) before taking the maximum.
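A minimal sketch of the grouped version, assuming the question's sample rows and that the combined total per date is what is wanted. The names daily_totals, month, day and total are chosen here for illustration and are not from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1, 1, 1, 1, 1, 1],
    'Day': [1, 1, 2, 2, 3, 3],
    'Location': ['CAN', 'USA', 'CAN', 'USA', 'CAN', 'USA'],
    'Cases': [124, 563, 242, 156, 257, 452],
})

# Combined cases over all locations for each (Month, Day) pair
daily_totals = df.groupby(['Month', 'Day'])['Cases'].sum()

# idxmax returns the index label of the largest total; with a two-column
# groupby that label is a (Month, Day) tuple
month, day = daily_totals.idxmax()
total = daily_totals.max()

print(f'Maximum cases of {total} occurred on {month}/{day}.')
```

Grouping on both Month and Day keeps, say, day 3 of January separate from day 3 of February, which grouping on Day alone would merge.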

pakpe · Accepted Answer · 2021-01-30 21:16:01Z

Group your dataframe by month and day, then iterate through the groups to find the one whose cases, summed over all locations, are highest, as shown below:

import pandas as pd
df = pd.DataFrame({'Month':[1,1,1,1,1,1], 'Day':[1,1,2,2,3,3],
                   'Location':['CAN', 'USA', 'CAN', 'USA','CAN', 'USA'],
                   'Cases':[124,563,242,156,257,452]})

grouped = df.groupby(['Month', 'Day'])
max_sum = 0
max_day = None
for idx, group in grouped:
    total = group['Cases'].sum()  # combined cases over all locations for this date
    if total > max_sum:
        max_sum = total
        max_day = group

# Every row in a group shares the same date, so read it from the first row
month = max_day['Month'].iloc[0]
day = max_day['Day'].iloc[0]
print(f'Maximum cases of {max_sum} occurred on {month}/{day}.')

#prints: Maximum cases of 709 occurred on 1/3.

If you don't want to use Pandas, this is how you do it:

months = [1,1,1,1,1,1]
days = [1,1,2,2,3,3]
locations = ['CAN', 'USA', 'CAN', 'USA','CAN', 'USA']
cases = [124,563,242,156,257,452]
dic = {}

# Add each row's cases to the running total for its date.
# Keying on month/day also handles unsorted rows and dates with a single location.
for i in range(len(days)):
    key = f'{months[i]}/{days[i]}'
    dic[key] = dic.get(key, 0) + cases[i]

max_cases = max(dic.values())
worst_day = max(dic, key=dic.get)  # date whose total equals max_cases

print(f'Maximum cases of {max_cases} occurred on {worst_day}.')

#Prints: Maximum cases of 709 occurred on 1/3.
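As an alternative for the list-only case, here is a sketch using collections.Counter from the standard library, again on the question's sample lists. The variable names totals, worst_month and worst_day are illustrative choices:

```python
from collections import Counter

months = [1, 1, 1, 1, 1, 1]
days = [1, 1, 2, 2, 3, 3]
cases = [124, 563, 242, 156, 257, 452]

# Counter behaves like a dict whose missing keys start at 0
totals = Counter()
for month, day, count in zip(months, days, cases):
    totals[(month, day)] += count

# most_common(1) returns a list holding the single (key, total) pair
# with the highest total
(worst_month, worst_day), max_cases = totals.most_common(1)[0]

print(f'Maximum cases of {max_cases} occurred on {worst_month}/{worst_day}.')
```

Using a (month, day) tuple as the key means the rows do not need to be sorted by date, and zip walks the parallel lists without index arithmetic.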

edited Jan 30, 2021 at 21:16

answered Jan 30, 2021 at 19:56

2 Comments

ls14 Over a year ago

Thank you for your solution. How would you do it without importing pandas? Only using lists.

pakpe Over a year ago

@ls14 See my amended answer.

Lihini Nisansala · Accepted Answer · 2021-01-30 19:22:38Z

You can find the max value in your cases list first, then use its index to look up the matching values in the other three lists. For example: caseList = [1,2,3,52,1,0]

The maximum is 52 and its index is 3. In your case you can then get monthList[3], dayList[3] and locationList[3] respectively. That gives you the day, month and country of the row with the highest value in the cases list.

Check whether this helps in your scenario.

answered Jan 30, 2021 at 19:22

1 Comment

ls14 Over a year ago

The problem at hand though is that we must find the total global cases for each day, so not just the max value in the cases table. We must find the combined cases for each day of each month.
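The difference between the two readings can be checked on the question's sample data. This is only a sketch; single_row, totals and combined_date are names picked for illustration:

```python
months = [1, 1, 1, 1, 1, 1]
days = [1, 1, 2, 2, 3, 3]
locations = ['CAN', 'USA', 'CAN', 'USA', 'CAN', 'USA']
cases = [124, 563, 242, 156, 257, 452]

# Reading 1: the single largest value in the cases list
i = cases.index(max(cases))
single_row = (months[i], days[i], locations[i], cases[i])

# Reading 2: the date with the largest total over all locations
totals = {}
for m, d, c in zip(months, days, cases):
    totals[(m, d)] = totals.get((m, d), 0) + c
combined_date = max(totals, key=totals.get)

print(single_row)                            # (1, 1, 'USA', 563)
print(combined_date, totals[combined_date])  # (1, 3) 709
```

The largest single entry falls on month 1, day 1, while the largest combined total falls on month 1, day 3, so the two approaches answer different questions.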

PGS · Accepted Answer · 2021-01-30 19:27:42Z

You may use this strategy to get the required result.

# Parallel lists: position i in each list describes the same table row
daylist, monthlist, location, Cases = [1, 2, 3, 4], [1, 1, 1, 1], ['CAN', 'USA', 'CAN', 'USA'], [124, 563, 242, 999]
maxCases = Cases.index(max(Cases))  # index of the row with the largest case count
print("Max Case:",Cases[maxCases])
print("Location:",location[maxCases])
print("Month:",monthlist[maxCases])
print("Day:",daylist[maxCases])

answered Jan 30, 2021 at 19:27

1 Comment

ls14 Over a year ago

But it needs to find the highest number of combined cases (for each location, for each day, for each month). Not just the max value in the cases list. So in my example, the date with the most cases would be month 1, day 3 with 709 total cases.
