2

I have a dataframe that contains 60,000+ rows of college football recruit data. From it, I want to create separate dataframes, each containing the rows for a single Class value. This is what a sample of the dataframe looks like:

,Primary Rank,Other Rank,Name,Link,Highschool,Position,Height,weight,Rating,National Rank,Position Rank,State Rank,Team,Class
0,1,,D.J. Williams,https://247sports.com/Player/DJ-Williams-49931,"De La Salle (Concord, CA)",ILB,6-2,235,0.9998,1,1,1,Miami,2000
1,2,,Brock Berlin,https://247sports.com/Player/Brock-Berlin-49926,"Evangel Christian Academy (Shreveport, LA)",PRO,6-2,190,0.9998,2,1,1,Florida,2000
2,3,,Charles Rogers,https://247sports.com/Player/Charles-Rogers-49984,"Saginaw (Saginaw, MI)",WR,6-4,195,0.9988,3,1,1,Michigan State,2000
3,4,,Travis Johnson,https://247sports.com/Player/Travis-Johnson-50043,"Notre Dame (Sherman Oaks, CA)",SDE,6-4,265,0.9982,4,1,2,Florida State,2000
4,5,,Marcus Houston,https://247sports.com/Player/Marcus-Houston-50139,"Thomas Jefferson (Denver, CO)",RB,6-0,208,0.9980,5,1,1,Colorado,2000
5,6,,Kwame Harris,https://247sports.com/Player/Kwame-Harris-49999,"Newark (Newark, DE)",OT,6-7,320,0.9978,6,1,1,Stanford,2000
6,7,,B.J. Johnson,https://247sports.com/Player/BJ-Johnson-50154,"South Grand Prairie (Grand Prairie, TX)",WR,6-1,190,0.9976,7,2,1,Texas,2000
7,8,,Bryant McFadden,https://247sports.com/Player/Bryant-McFadden-50094,"McArthur (Hollywood, FL)",CB,6-1,182,0.9968,8,1,1,Florida State,2000
8,9,,Sam Maldonado,https://247sports.com/Player/Sam-Maldonado-50071,"Harrison (Harrison, NY)",RB,6-2,215,0.9964,9,2,1,Ohio State,2000
9,10,,Mike Munoz,https://247sports.com/Player/Mike-Munoz-50150,"Archbishop Moeller (Cincinnati, OH)",OT,6-7,290,0.9960,10,2,1,Tennessee,2000
10,11,,Willis McGahee,https://247sports.com/Player/Willis-McGahee-50179,"Miami Central (Miami, FL)",RB,6-1,215,0.9948,11,3,2,Miami,2000
11,12,,Antonio Hall,https://247sports.com/Player/Antonio-Hall-50175,"McKinley (Canton, OH)",OT,6-5,295,0.9946,12,3,2,Kentucky,2000
12,13,,Darrell Lee,https://247sports.com/Player/Darrell-Lee-50580,"Kirkwood (Saint Louis, MO)",WDE,6-5,230,0.9940,13,1,1,Florida,2000
13,14,,O.J. Owens,https://247sports.com/Player/OJ-Owens-50176,"North Stanly (New London, NC)",S,6-1,195,0.9932,14,1,1,Tennessee,2000
14,15,,Jeff Smoker,https://247sports.com/Player/Jeff-Smoker-50582,"Manheim Central (Manheim, PA)",PRO,6-3,190,0.9922,15,2,1,Michigan State,2000
15,16,,Marco Cooper,https://247sports.com/Player/Marco-Cooper-50171,"Cass Technical (Detroit, MI)",OLB,6-2,235,0.9918,16,1,2,Ohio State,2000
16,17,,Chance Mock,https://247sports.com/Player/Chance-Mock-50163,"The Woodlands (The Woodlands, TX)",PRO,6-2,190,0.9918,17,3,2,Texas,2000
17,18,,Roy Williams,https://247sports.com/Player/Roy-Williams-55566,"Permian (Odessa, TX)",WR,6-4,202,0.9916,18,3,3,Texas,2000
18,19,,Matt Grootegoed,https://247sports.com/Player/Matt-Grootegoed-50591,"Mater Dei (Santa Ana, CA)",OLB,5-11,205,0.9914,19,2,3,USC,2000
19,20,,Yohance Buchanan,https://247sports.com/Player/Yohance-Buchanan-50182,"Douglass (Atlanta, GA)",S,6-1,210,0.9912,20,2,1,Florida State,2000
20,21,,Mac Tyler,https://247sports.com/Player/Mac-Tyler-50572,"Jess Lanier (Hueytown, AL)",DT,6-6,320,0.9912,21,1,1,Alabama,2000
21,22,,Jason Respert,https://247sports.com/Player/Jason-Respert-55623,"Northside (Warner Robins, GA)",OC,6-3,300,0.9902,22,1,2,Tennessee,2000
22,23,,Casey Clausen,https://247sports.com/Player/Casey-Clausen-50183,"Bishop Alemany (Mission Hills, CA)",PRO,6-4,215,0.9896,23,4,4,Tennessee,2000
23,24,,Albert Means,https://247sports.com/Player/Albert-Means-55968,"Trezevant (Memphis, TN)",SDE,6-6,310,0.9890,24,2,1,Alabama,2000
24,25,,Albert Hollis,https://247sports.com/Player/Albert-Hollis-55958,"Christian Brothers (Sacramento, CA)",RB,6-0,190,0.9890,25,4,5,Georgia,2000
25,26,,Eric Moore,https://247sports.com/Player/Eric-Moore-55973,"Pahokee (Pahokee, FL)",OLB,6-4,226,0.9884,26,3,3,Florida State,2000
26,27,,Willie Dixon,https://247sports.com/Player/Willie-Dixon-55626,"Stockton Christian School (Stockton, CA)",WR,5-11,182,0.9884,27,4,6,Miami,2000
27,28,,Cory Bailey,https://247sports.com/Player/Cory-Bailey-50586,"American (Hialeah, FL)",S,5-10,175,0.9880,28,3,4,Florida,2000
28,29,,Sean Young,https://247sports.com/Player/Sean-Young-55972,"Northwest Whitfield County (Tunnel Hill, GA)",OG,6-6,293,0.9878,29,1,3,Tennessee,2000
29,30,,Johnnie Morant,https://247sports.com/Player/Johnnie-Morant-60412,"Parsippany Hills (Morris Plains, NJ)",WR,6-5,225,0.9871,30,5,1,Syracuse,2000
30,31,,Wes Sims,https://247sports.com/Player/Wes-Sims-60243,"Weatherford (Weatherford, OK)",OG,6-5,310,0.9869,31,2,1,Oklahoma,2000
31,33,,Jason Campbell,https://247sports.com/Player/Jason-Campbell-55976,"Taylorsville (Taylorsville, MS)",PRO,6-5,190,0.9853,33,5,1,Auburn,2000
32,34,,Antwan Odom,https://247sports.com/Player/Antwan-Odom-50168,"Alma Bryant (Irvington, AL)",SDE,6-7,260,0.9851,34,3,2,Alabama,2000
33,35,,Sloan Thomas,https://247sports.com/Player/Sloan-Thomas-55630,"Klein (Spring, TX)",WR,6-2,188,0.9847,35,6,5,Texas,2000
34,36,,Raymond Mann,https://247sports.com/Player/Raymond-Mann-60804,"Hampton (Hampton, VA)",ILB,6-1,233,0.9847,36,2,1,Virginia,2000
35,37,,Alphonso Townsend,https://247sports.com/Player/Alphonso-Townsend-55975,"Lima Central Catholic (Lima, OH)",DT,6-6,280,0.9847,37,2,3,Ohio State,2000
36,38,,Greg Jones,https://247sports.com/Player/Greg-Jones-50158,"Battery Creek (Beaufort, SC)",RB,6-2,245,0.9837,38,6,1,Florida State,2000
37,39,,Paul Mociler,https://247sports.com/Player/Paul-Mociler-60319,"St. John Bosco (Bellflower, CA)",OG,6-5,300,0.9833,39,3,7,UCLA,2000
38,40,,Chris Septak,https://247sports.com/Player/Chris-Septak-57555,"Millard West (Omaha, NE)",TE,6-3,245,0.9833,40,1,1,Nebraska,2000
39,41,,Eric Knott,https://247sports.com/Player/Eric-Knott-60823,"Henry Ford II (Sterling Heights, MI)",TE,6-4,235,0.9831,41,2,3,Michigan State,2000
40,42,,Harold James,https://247sports.com/Player/Harold-James-57524,"Osceola (Osceola, AR)",S,6-1,220,0.9827,42,4,1,Alabama,2000

For example, without a for loop, this is the line of code I use to create a single dataframe:

recruits2022 = recruits_final[recruits_final['Class'] == 2022]

However, I want to have a named dataframe for each recruiting class.

In other words, recruits2000 would be a dataframe of all rows that have a Class value equal to 2000, recruits2001 would be a dataframe of all rows that have a Class value equal to 2001, and so forth.

This is what I tried most recently, but I had no luck accessing the dataframes outside of the for loop.

databases = ['recruits2000', 'recruits2001', 'recruits2002', 'recruits2003', 'recruits2004',
             'recruits2005', 'recruits2006', 'recruits2007', 'recruits2008', 'recruits2009',
             'recruits2010', 'recruits2011', 'recruits2012', 'recruits2013', 'recruits2014',
             'recruits2015', 'recruits2016', 'recruits2017', 'recruits2018', 'recruits2019',
             'recruits2020', 'recruits2021', 'recruits2022', 'recruits2023']

for i in range(len(databases)):  
 
  year = pd.to_numeric(databases[i][-4:], errors = 'coerce')
  db = recruits_final[recruits_final['Class'] == year] 
  db.name = databases[i]
  
  print(db)
  print(db.name)
  print(year)

recruits2023

Instead of the dataframe I wanted, I get this error:

NameError                                 Traceback (most recent call last)
<ipython-input-49-7cb5d12ab92f> in <module>()
     29 
     30 # print(db.name)
---> 31 recruits2023
     32 
     33 

NameError: name 'recruits2023' is not defined

Is there something that I am missing to get this for loop to work? Any assistance is truly appreciated. Thanks in advance.

2
  • It's better to create a dict of dfs. Dynamically creating variables is highly discouraged (see "creating variable variables" and "dynamically creating variables"). Commented Jul 4, 2021 at 2:40
  • Avoid flooding your global environment with many objects. Instead, use a list or a keyed dict of many elements, which provides better management and organization. Commented Jul 4, 2021 at 12:06

2 Answers 2

4

Use a dictionary of dataframes built with groupby (df here stands for your recruits_final):

dict_dfs = dict(tuple(df.groupby('Class')))

Access your individual dataframes using:

dict_dfs[2022]
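
To see why this one-liner works, here is a minimal sketch on a three-row stand-in for the real data. The frame and the 2001 row ("Placeholder Recruit") are made up for illustration. Iterating over a groupby object yields (key, sub-dataframe) pairs, which is exactly the shape dict() accepts.

```python
import pandas as pd

# Hypothetical stand-in for recruits_final; the 2001 row is invented for illustration.
df = pd.DataFrame({
    "Name": ["D.J. Williams", "Brock Berlin", "Placeholder Recruit"],
    "Team": ["Miami", "Florida", "Texas"],
    "Class": [2000, 2000, 2001],
})

# Each element of the groupby iteration is a (class_year, rows_for_that_year) pair.
dict_dfs = dict(tuple(df.groupby("Class")))

print(sorted(dict_dfs))   # the class years found in the data
print(dict_dfs[2000])     # only the rows whose Class is 2000
```

The keys are the values found in the Class column, so a year with no rows simply has no entry; dict_dfs.get(year) may be a safer lookup than dict_dfs[year] when a year might be absent.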

1 Comment

This one did the trick for me. I was worried that doing this method would affect how it could work when asking for each individual dict_dfs to graph, but it ended up working out. Thank you so much!
2

You overwrite the variable db at each iteration, and recruits2023 is only a string in your list, never a variable name, so you can't use it like that.
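
A minimal sketch of that behaviour, assuming a two-row stand-in for recruits_final (the rows and the `labels` list are hypothetical): after the loop, db holds only the last slice, and no name such as recruits2023 has been bound anywhere.

```python
import pandas as pd

# Hypothetical two-row stand-in for recruits_final.
recruits_final = pd.DataFrame({"Name": ["A", "B"], "Class": [2022, 2023]})

labels = ["recruits2022", "recruits2023"]
for label in labels:
    year = int(label[-4:])
    # db is rebound on every pass, so the previous slice is discarded.
    db = recruits_final[recruits_final["Class"] == year]

# Only the final slice survives the loop.
last_classes = db["Class"].tolist()
# The strings in `labels` were never turned into variable names.
name_defined = "recruits2023" in globals()

print(last_classes)
print(name_defined)
```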

You can use a dict to store your data:

recruits = {}
for year in recruits_final['Class'].unique():
    recruits[year] = recruits_final[recruits_final['Class'] == year]
>>> recruits[2000]
    Primary Rank  Other Rank               Name                                               Link  ... Position Rank State Rank            Team  Class
0              1         NaN      D.J. Williams     https://247sports.com/Player/DJ-Williams-49931  ...             1          1           Miami   2000
1              2         NaN       Brock Berlin    https://247sports.com/Player/Brock-Berlin-49926  ...             1          1         Florida   2000
2              3         NaN     Charles Rogers  https://247sports.com/Player/Charles-Rogers-49984  ...             1          1  Michigan State   2000
3              4         NaN     Travis Johnson  https://247sports.com/Player/Travis-Johnson-50043  ...             1          2   Florida State   2000
...
38            40         NaN       Chris Septak    https://247sports.com/Player/Chris-Septak-57555  ...             1          1        Nebraska   2000
39            41         NaN         Eric Knott      https://247sports.com/Player/Eric-Knott-60823  ...             2          3  Michigan State   2000
40            42         NaN       Harold James    https://247sports.com/Player/Harold-James-57524  ...             4          1         Alabama   2000

>>> recruits.keys()
dict_keys([2000])
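
The output above lists only 2000 because the posted sample contains a single class. As a rough sketch with a made-up frame spanning two classes (the 2001 row and the `summary` name are hypothetical), the same loop produces one entry per class, and a single loop over the dict then reaches every class without any per-year variable names.

```python
import pandas as pd

# Hypothetical stand-in for recruits_final covering two classes.
recruits_final = pd.DataFrame({
    "Name": ["D.J. Williams", "Brock Berlin", "Placeholder Recruit"],
    "Rating": [0.9998, 0.9998, 0.9000],
    "Class": [2000, 2000, 2001],
})

recruits = {}
for year in recruits_final["Class"].unique():
    # int(year) keeps the keys as plain Python ints rather than NumPy integers.
    recruits[int(year)] = recruits_final[recruits_final["Class"] == year]

# Count the recruits in each class by walking the dict once.
summary = {year: len(frame) for year, frame in recruits.items()}
print(summary)
```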

1 Comment

or as a dict comprehension: db = {year: recruits_final[recruits_final['Class'] == year] for year in recruits_final['Class'].unique()}
