I have been working on a project to scrape the schedule from an amateur hockey site and export it to CSV in a format that can be uploaded into the Sports Engine application. I have managed to get the data I want as plain text, but I now need to work out how to split it into fields so it can be exported to CSV.

Here is a sample output of the script, shortened for brevity.

AL1602 · Nov 6 · Atom A League · FVC Flight 3FINALMSA Arena · Abbotsford, BCLANGLEY MHA ATOM A4 EAGLES2 - 6ABBOTSFORD ATOM A2 HAWKS AL1607 · Nov 10 · Atom A League · FVC Flight 3FINALMission Leisure Centre · North · Mission, BCTime change due to ice conflict CSABBOTSFORD ATOM A2 HAWKS5 - 4MISSION MHA ATOM A2

Here is the same sample printed with print(tables) instead, to show the underlying HTML rather than only the text.

[<tr class="gamelist-row"><td class="game-details"><div class="game-meta text-muted">AL1602 · Nov 6<a class="text-muted" href="/leagues/786?scheduleId=1265&amp;groupId=5" title="Atom A League · FVC Flight 3"> · Atom A League · FVC Flight 3</a></div><div class="game-time">FINAL</div><div class="game-arena">MSA Arena<span class="text-muted"> · Abbotsford, BC</span></div></td><td><div class="game-matchup"><a class="team-link" href="/teams/4688?scheduleId=1265&amp;groupId=5"><div class="d-flex flex-row" style="min-width: 125px;"><div class="pr-2"><div alt="LANGLEY MHA ATOM A4 EAGLES" class="team-logo" style='background-image: url("https://s3-ca-central-1.amazonaws.com/hisports-logos/1537488764672.png");'></div></div><div class="d-flex flex-fill flex-column justify-content-center"><span class="team-name text-uppercase">LANGLEY MHA ATOM A4 EAGLES</span></div></div></a><div class="game-result score"><div class="result result-loss">2</div><span class="text-muted"> - </span><div class="result result-win">6</div></div><a class="team-link" href="/teams/4326?scheduleId=1265&amp;groupId=5"><div class="d-flex flex-row flex-row-reverse" style="min-width: 125px;"><div class="pl-2"><div alt="ABBOTSFORD ATOM A2 HAWKS" class="team-logo" style='background-image: url("https://s3-ca-central-1.amazonaws.com/hisports-logos/1538567502609.jpg");'></div></div><div class="d-flex flex-fill flex-column justify-content-center"><span class="team-name text-uppercase text-right">ABBOTSFORD ATOM A2 HAWKS</span></div></div></a></div></td></tr>, <tr class="gamelist-row"><td class="game-details"><div class="game-meta text-muted">AL1607

Below is the script.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

#launch url
url = "https://games.pcaha.ca/teams/4326"

#create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

#After opening the url above, Selenium finds the table with the schedule
games = driver.find_elements_by_id("table-responsive")

#Selenium hands the page source to Beautiful Soup
soupsource=BeautifulSoup(driver.page_source, 'lxml')
soupsource.prettify()

#Beautiful Soup grabs the class gamelist-row
tables = soupsource.find_all("tr", class_="gamelist-row")

# prints out the text only
for x in tables:
    print(x.text)


  • Can you show a sample of the expected result? Commented Feb 5, 2020 at 6:11
  • Can you post the output of the script? Commented Feb 5, 2020 at 6:13
  • @Marco Here is a sample output of the script, shortened for brevity. AL1602 · Nov 6 · Atom A League · FVC Flight 3FINALMSA Arena · Abbotsford, BCLANGLEY MHA ATOM A4 EAGLES2 - 6ABBOTSFORD ATOM A2 HAWKS AL1607 · Nov 10 · Atom A League · FVC Flight 3FINALMission Leisure Centre · North · Mission, BCTime change due to ice conflict CSABBOTSFORD ATOM A2 HAWKS5 - 4MISSION MHA ATOM A2 Commented Feb 5, 2020 at 16:47
  • @wedge22 It is not just for me; you can put it in the post. Looking at it from here, are the fields separated by tabs and the lines separated by CR? Commented Feb 5, 2020 at 16:55
  • @Marco I have added more information to my original post that should answer your questions. Commented Feb 5, 2020 at 17:10

1 Answer

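import csv

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('file.csv', mode='w', newline='') as csv_file:
    fieldnames = ['header1', 'header2', 'header3']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    # The row keys must match fieldnames, otherwise DictWriter raises ValueError
    writer.writerow({'header1': 'John Smith', 'header2': 'Accounting', 'header3': 'November'})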

Try this little snippet out for writing to a csv file. Modify it to fit your needs!
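To adapt this to the rows in the question, one option is to pull each field out of the `gamelist-row` markup instead of using `x.text`, which runs everything together. The following is only a rough sketch based on the single row of HTML shown in the question. The column names (`game_id`, `team1`, and so on) and the helper names `parse_row` and `rows_to_csv` are made up for illustration and are not the Sports Engine import format, and rows without a score or without a league link are not handled. It uses the built-in `html.parser` on a pasted sample so it runs without a browser.

```python
import csv
import io

from bs4 import BeautifulSoup

# One row copied (and trimmed) from the print(tables) output in the question.
SAMPLE_HTML = """
<table>
<tr class="gamelist-row"><td class="game-details">
<div class="game-meta text-muted">AL1602 · Nov 6<a class="text-muted" href="/leagues/786?scheduleId=1265&amp;groupId=5" title="Atom A League · FVC Flight 3"> · Atom A League · FVC Flight 3</a></div>
<div class="game-time">FINAL</div>
<div class="game-arena">MSA Arena<span class="text-muted"> · Abbotsford, BC</span></div></td>
<td><div class="game-matchup">
<span class="team-name text-uppercase">LANGLEY MHA ATOM A4 EAGLES</span>
<div class="game-result score"><div class="result result-loss">2</div><span class="text-muted"> - </span><div class="result result-win">6</div></div>
<span class="team-name text-uppercase text-right">ABBOTSFORD ATOM A2 HAWKS</span>
</div></td></tr>
</table>
"""

# Placeholder column names (an assumption), rename to whatever the upload expects.
FIELDNAMES = ["game_id", "date", "league", "status", "location",
              "team1", "score1", "team2", "score2"]


def parse_row(row):
    """Turn one <tr class="gamelist-row"> into a dict keyed by FIELDNAMES."""
    meta = row.find("div", class_="game-meta")
    # The first text node looks like "AL1602 · Nov 6".
    game_id, date = [part.strip() for part in next(meta.stripped_strings).split("·", 1)]
    teams = [t.get_text(strip=True) for t in row.find_all("span", class_="team-name")]
    # class_="result" matches the two score divs but not "game-result".
    scores = [s.get_text(strip=True) for s in row.find_all("div", class_="result")]
    return {
        "game_id": game_id,
        "date": date,
        "league": meta.find("a")["title"],
        "status": row.find("div", class_="game-time").get_text(strip=True),
        "location": row.find("div", class_="game-arena").get_text(" ", strip=True),
        "team1": teams[0],
        "score1": scores[0],
        "team2": teams[1],
        "score2": scores[1],
    }


def rows_to_csv(rows, out):
    """Write the parsed rows to any file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(parse_row(row))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
buffer = io.StringIO()
rows_to_csv(soup.find_all("tr", class_="gamelist-row"), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the real script you would pass the `tables` list from the question to `rows_to_csv` and replace the `StringIO` buffer with `open('file.csv', mode='w', newline='')`.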
