Write Headers Once in Python CSV Writer Loop

Question

Below is a scraper that loops through two websites, scrapes each team's roster information, puts the information into lists, and exports the lists to a CSV file. Everything works, except that the header row written by writerow is repeated in the CSV file each time the scraper moves on to the next website. Is it possible to adjust the CSV portion of the code so the headers appear only once when the scraper loops through multiple websites? Thanks in advance!

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')

    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()

    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta',property='og:site_name')['content']] * len(names)

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))

Have you tried moving f.writerow above for team in team_list?
– OneCricketeer, Commented Jul 15, 2018 at 23:30
Just write the header before the for loop. This means that the for loop should be wrapped within the with context manager.
– RoadRunner, Commented Jul 16, 2018 at 0:12

Harun ERGUL · Accepted Answer · 2018-07-15 23:42:08Z

3

Using a flag variable to track whether the header has been written may help. Once the header has been written, it will not be written a second time.

header_added = False
for team in team_list:
    # ... scrape the page and build names, ids, number, etc. (as in the question) ...

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        if not header_added:
            f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
            header_added = True
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
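
One caveat with the flag: because the file is opened in append mode ('a'), running the script a second time would append another header to the existing file. A minimal sketch of one way around that is to decide based on whether the file already has content. The helper name append_rows and the HEADERS constant are assumptions made for this illustration, not part of the original code.

```python
import csv
import os

HEADERS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty.

    Hypothetical helper for illustration; `rows` is any iterable of row tuples.
    """
    # Check before opening: opening in 'a' mode creates the file if it is missing.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as fp:
        writer = csv.writer(fp)
        if needs_header:
            writer.writerow(HEADERS)
        writer.writerows(rows)
```

Inside the loop this would be called as append_rows('MLB_Active_Roster.csv', zip(names, ids, number, handedness, height, weight, DOB, team)), and the header_added flag is no longer needed.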

archang31 · Accepted Answer · 2018-07-16 00:14:48Z

Another method would be to simply write the header before the for loop, so you do not have to check whether it has already been written.

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])

for team in team_list:
    # ... do your bs4 scraping and parsing here (as in the question) ...

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))

You can also open the file just once instead of three times:

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])

    for team in team_list:
        # ... do your bs4 scraping and parsing here (as in the question) ...

        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
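
As a variation on the same idea, csv.DictWriter has a writeheader() method that makes the "header once, rows many times" structure explicit. The sketch below is only an illustration: SAMPLE_ROSTERS holds made-up placeholder rows standing in for the scraped data, and write_rosters is a hypothetical function name.

```python
import csv

FIELDNAMES = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

# Placeholder data standing in for the scraped rosters (not real players).
SAMPLE_ROSTERS = [
    [{'Name': 'Player A', 'ID': '000001', 'Number': '1', 'Hand': 'R/R',
      'Height': '6-0', 'Weight': '200', 'DOB': '01/01/1990', 'Team': 'Team One'}],
    [{'Name': 'Player B', 'ID': '000002', 'Number': '2', 'Hand': 'L/L',
      'Height': '6-2', 'Weight': '210', 'DOB': '02/02/1991', 'Team': 'Team Two'}],
]


def write_rosters(path, rosters):
    """Write one header row, then the rows of every roster in turn."""
    with open(path, 'w', newline='') as fp:
        writer = csv.DictWriter(fp, fieldnames=FIELDNAMES)
        writer.writeheader()              # written exactly once, before the loop
        for team_rows in rosters:
            writer.writerows(team_rows)   # one batch of dict rows per team
```

Calling write_rosters('MLB_Active_Roster.csv', SAMPLE_ROSTERS) produces a file with a single header line followed by one line per player.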

RoadRunner · Accepted Answer · 2018-07-16 00:21:45Z

1

Just write the header before the loop, and have the loop within the with context manager:

import requests
import csv
from bs4 import BeautifulSoup

team_list = {'yankees', 'redsox'}

headers = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

# 1. wrap everything in context manager
with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
    f = csv.writer(fp)

    # 2. write headers before anything else
    f.writerow(headers)

    # 3. now process the loop
    for team in team_list:
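        # Do everything else (scrape and build the lists), then
        # write this team's rows at the end of the loop
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))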

You could also define your headers outside the loop, as is done with team_list, which leads to cleaner code.
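
To show where the row-writing call ends up, here is a rough, self-contained sketch of the whole structure. The function get_roster_rows is a hypothetical stand-in for the requests/BeautifulSoup code in the question and returns placeholder rows so the example runs without network access. It also assumes 'w' mode is wanted, so that re-running the script starts a fresh file rather than appending a second header.

```python
import csv

HEADERS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def get_roster_rows(team):
    """Hypothetical stand-in for the scraping code; returns placeholder rows."""
    return [('Player 1', '000001', '1', 'R/R', '6-0', '200', '01/01/1990', team)]


def export_rosters(teams, path):
    """Write the header once, then each team's rows inside the loop."""
    with open(path, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(HEADERS)
        # team_list in the question is a set, so its iteration order is not
        # guaranteed; sorting gives the same row order on every run.
        for team in sorted(teams):
            writer.writerows(get_roster_rows(team))
```

With this layout, export_rosters({'yankees', 'redsox'}, 'MLB_Active_Roster.csv') writes the header a single time, and the per-team writerows call sits at the end of the loop body.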

edited Jul 16, 2018 at 0:21

answered Jul 16, 2018 at 0:16

3 Comments

Nate Walker Over a year ago

Thanks for the advice RoadRunner! The code ran but unfortunately, it returned an empty CSV file. Do I have to include the writerow zip line at the end of the for loop?

RoadRunner Over a year ago

@NateWalker Yeah the writerow should be at the end of the loop.

RoadRunner Over a year ago

@NateWalker No worries.
