Python: Help parsing a website and extracting data into a CSV file

Question

This is my first question here, so feel free to tell me if I'm doing something wrong. I'm trying to extract the title and show times of each film from a movie website for a sociological study.

My Python code works, but it only takes the first element of my list named "horaire", whereas I would like to include all of them in my CSV file.

My issue is that I don't know in advance how many elements this list will contain.

Here is my script:

from urllib import urlopen
from bs4 import BeautifulSoup
import csv
import sys

url = "http://www.allocine.fr/seance/salle_gen_csalle=C0116.html"
html = urlopen(url).read()
soup = BeautifulSoup(html, "lxml")
reload(sys)
sys.setdefaultencoding('utf8')

with open('test2306.csv', 'wb') as csvfile:
    cinemaWriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)

    for films in soup.find_all('div',
                               {'class': 'card entity-card entity-card-list movie-card-theater cf hred'}):
        horaire = films.find_all('span',
                               {'class': 'showtimes-hour-item-value'})
        titres = films.find_all('a',
                               {'class': 'meta-title-link'})
        cinemaWriter.writerow([horaire[0:].text.strip(),
                                titres[0:].text.strip()])

Thank you for your help <3 !

Jack

Jonas · Accepted Answer · 2020-06-23 19:58:36Z

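[EDIT] To get all entries of horaire:

find_all returns a list of tags, so horaire[0:] is still a list, not a single tag, and it has no .text attribute. Join the text of every element into one string instead. You can try this: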

with open('test2306.csv', 'w') as csvfile:  # text mode 'w' instead of binary 'wb'
    cinemaWriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)

    for films in soup.find_all('div',
                               {'class': 'card entity-card entity-card-list movie-card-theater cf hred'}):
        horaire = films.find_all('span',
                                 {'class': 'showtimes-hour-item-value'})
        titres = films.find_all('a',
                                {'class': 'meta-title-link'})

        # Join the text of every showtime tag into one comma-separated string,
        # however many showtimes the film has.
        horaire = ','.join([i.text.strip() for i in horaire])

        # One row per film: all showtimes in the first cell, the title in the second.
        cinemaWriter.writerow([horaire, titres[0].text.strip()])
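
If one cell holding every showtime is awkward to analyse later, an alternative is to write one row per showtime. The sketch below compares both layouts. It is only an illustration under some assumptions: the `films` list is hypothetical sample data standing in for the scraped titles and showtimes, the helper names `write_wide` and `write_long` are made up for this example, and it writes to an in-memory buffer (Python 3) rather than to test2306.csv.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped values:
# each film has one title and a variable number of showtimes.
films = [
    ("Film A", ["14:00", "16:30", "21:00"]),
    ("Film B", ["18:15"]),
    ("Film C", []),  # a film with no showtime listed
]


def write_wide(rows, out):
    """Write one row per film, with all showtimes joined into a single cell."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for title, times in rows:
        writer.writerow([",".join(times), title])


def write_long(rows, out):
    """Write one row per (showtime, title) pair; films without showtimes are skipped."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for title, times in rows:
        for time in times:
            writer.writerow([time, title])


# In-memory buffers are used here so the example runs without touching disk.
wide = io.StringIO()
write_wide(films, wide)

long_ = io.StringIO()
write_long(films, long_)

print(wide.getvalue())
print(long_.getvalue())
```

The joined layout keeps one line per film, which matches the code above. The one-row-per-showtime layout is usually easier to filter or count in a spreadsheet, at the cost of repeating the title on every row.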

edited Jun 23, 2020 at 19:58

answered Jun 23, 2020 at 17:51


3 Comments

Luc Camisard Over a year ago

Hi! Thanks for your help! But this does not solve my issue. There are several values in my list horaire and I would like to write them all to my CSV file. This only writes the first value of the list.

Jonas Over a year ago

Happy to help, sorry that I didn't notice this the first time. Have a look at my updated code. It now writes all the entries of horaire to the CSV file.

Luc Camisard Over a year ago

Thank you for updating, I'll try this tomorrow!
