
So I'm using pandas.read_html to get a table from a website, but instead of the entire table it only returns the header row. How can I fix this?

Code:

import pandas as pd

term_codes = {"fall":"10", "spring":"20", "summer":"30"}

# year must be last number in school year: 2021-2022 so we pick 2022
year = "2022"
department = "CSCI"
term_code = year + term_codes["fall"]
url = "https://courselist.wm.edu/courselist/courseinfo/searchresults?term_code=" + term_code + "&term_subj=" + department + "&attr=0&attr2=0&levl=0&status=0&ptrm=0&search=Search"

def findCourseTable():
    dfs = pd.read_html(url)
    print(dfs[0])
    #df = dfs[1]
    #df.to_csv(r'courses.csv', index=False)

if __name__ == "__main__":
    findCourseTable()

Output:

Empty DataFrame
Columns: [CRN, COURSE ID, CRSE ATTR, TITLE, INSTRUCTOR, CRDT HRS, MEET DAY:TIME, PROJ ENR, CURR ENR, SEATS AVAIL, STATUS]
Index: []

1 Answer


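The page contains malformed HTML, which the default lxml parser doesn't handle correctly here: it keeps the header row and drops the body rows. Use flavor="html5lib" in pd.read_html to parse the page the way a browser would (this requires the html5lib and beautifulsoup4 packages to be installed):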

import pandas as pd

term_codes = {"fall": "10", "spring": "20", "summer": "30"}

# year must be last number in school year: 2021-2022 so we pick 2022
year = "2022"
department = "CSCI"
term_code = year + term_codes["fall"]
url = (
    "https://courselist.wm.edu/courselist/courseinfo/searchresults?term_code="
    + term_code
    + "&term_subj="
    + department
    + "&attr=0&attr2=0&levl=0&status=0&ptrm=0&search=Search"
)

df = pd.read_html(url, flavor="html5lib")[0]
print(df)

Prints:

      CRN     COURSE ID  CRSE ATTR                           TITLE                        INSTRUCTOR CRDT HRS  MEET DAY:TIME  PROJ ENR  CURR ENR SEATS AVAIL  STATUS
0   16064   CSCI 100 01  C100, NEW                  Reading@Russia  Willner, Dana; Prokhorova, Elena        4  MWF:1300-1350        10        10          0*  CLOSED
1   14614   CSCI 120 01        NaN  A Career in CS? And Which One?                     Kemper, Peter        1    M:1700-1750        36        20          16    OPEN
2   16325   CSCI 120 02        NEW    Concepts in Computer Science                   Deverick, James        3   TR:0800-0920        36        25          11    OPEN
3   12372   CSCI 140 01   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi        4  MWF:0900-0950        36        24          12    OPEN
4   14620   CSCI 140 02   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi        4  MWF:1100-1150        36        27           9    OPEN
5   13553   CSCI 140 03   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi        4  MWF:1300-1350        36        25          11    OPEN

...and so on.
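A slightly more defensive variant of the code above, as a sketch: it builds the query string with urllib.parse.urlencode instead of string concatenation, tries the lxml parser first, and falls back to html5lib only when the first table comes back with no rows (the symptom in the question). The helper names build_search_url and read_first_table are hypothetical, the query parameters are copied from the question's URL, and it assumes lxml, or html5lib plus beautifulsoup4, is installed.

```python
from io import StringIO
from urllib.parse import urlencode

import pandas as pd

TERM_CODES = {"fall": "10", "spring": "20", "summer": "30"}
BASE_URL = "https://courselist.wm.edu/courselist/courseinfo/searchresults"


def build_search_url(year: str, term: str, department: str) -> str:
    """Build the course-search URL (hypothetical helper); urlencode adds the & separators and escaping."""
    params = {
        "term_code": year + TERM_CODES[term],  # e.g. "2022" + "10" -> "202210"
        "term_subj": department,
        "attr": 0,
        "attr2": 0,
        "levl": 0,
        "status": 0,
        "ptrm": 0,
        "search": "Search",
    }
    return BASE_URL + "?" + urlencode(params)


def read_first_table(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` (hypothetical helper), retrying with html5lib if lxml yields no rows."""
    try:
        df = pd.read_html(StringIO(html), flavor="lxml")[0]
        if not df.empty:  # header-only result means lxml lost the body rows
            return df
    except ImportError:
        pass  # lxml is not installed; fall through to html5lib
    # html5lib (with beautifulsoup4) rebuilds broken markup the way a browser does
    return pd.read_html(StringIO(html), flavor="html5lib")[0]
```

To use it, fetch the page once, for example with urllib.request.urlopen(build_search_url("2022", "fall", "CSCI")).read().decode("utf-8") (assuming the page is UTF-8), and pass that text to read_first_table. Fetching once and wrapping the text in StringIO for each attempt means the fallback does not download the page a second time.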