I'm trying to scrape some data from a website. I can retrieve it, but it ends up in two separate strings that look like this in my .csv:

aaa
bbb
ccc

and the other:

xxx
yyy
zzz

I'd like to write them following this format:

aaa | xxx
bbb | yyy
ccc | zzz

Here is the code I have written so far:

# import libraries
import urllib2
from bs4 import BeautifulSoup
import csv  
i =0

# specify the url 
quote_page = 'http://www.alertepollens.org/gardens/garden/1/state/'

# query the website and return the html to the variable 'page'
response = urllib2.urlopen(quote_page)

# parse the HTML using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(response, 'html.parser')
test = soup
with open('allergene.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)

    pollene = (("".join(soup.strings)[65:]).encode('utf-8')).replace(' ','').replace('\n',' ').replace('    ',' ').replace('    ',' ').replace(' ','\n')
    print pollene

    state = [img['alt'] for img in soup.find_all('img', alt=True)]
    print state
    polen = ''.join(pollene)
    for item in state:
        writer.writerow([item])
    for item2 in pollene:
        writer.writerow([item2])

One of the main problems is that the data contains French characters (é, ù, à, etc.), and using "strip()" doesn't display these characters correctly.

Do you have any idea how I can do that?

  • Please show the code that produces these CSV outputs. Otherwise, it's not obvious how to help. Commented Jan 9, 2017 at 1:40
  • @alecxe: Just added it :) Commented Jan 9, 2017 at 2:04

1 Answer

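import csv

# Read both input files and write the paired rows to out.csv.
# newline='' lets the csv module control line endings (Python 3).
with open('a.csv') as a, open('x.csv') as x, open('out.csv', 'w', newline='') as out:
    a_lines = [line.strip() for line in a]
    x_lines = [line.strip() for line in x]
    # zip pairs the n-th line of a.csv with the n-th line of x.csv
    rows = zip(a_lines, x_lines)
    writer = csv.writer(out, delimiter='|')
    writer.writerows(rows)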

Output:

aaa|xxx
bbb|yyy
ccc|zzz

Here, a.csv is your first CSV file, x.csv is your second CSV file, and out.csv is the output file.
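
If the two values are already held as newline-separated strings rather than as separate files, the same line-by-line pairing can be applied to them directly. The following is only a minimal sketch, assuming Python 3; the function names `pair_lines` and `write_paired_rows` and the sample strings are hypothetical and not taken from the question's code.

```python
import csv
import io


def pair_lines(first, second):
    """Split two newline-separated strings and pair their lines row by row."""
    first_lines = [line.strip() for line in first.splitlines() if line.strip()]
    second_lines = [line.strip() for line in second.splitlines() if line.strip()]
    # zip stops at the shorter list, so unmatched trailing lines are dropped
    return list(zip(first_lines, second_lines))


def write_paired_rows(out_file, first, second):
    """Write each pair as one CSV row: first value in column 1, second in column 2."""
    writer = csv.writer(out_file)
    writer.writerows(pair_lines(first, second))


# Hypothetical sample data standing in for the two scraped strings
rows = pair_lines("aaa\nbbb\nccc", "xxx\nyyy\nzzz")

# An in-memory buffer is used here; for a real file, something like
# open('out.csv', 'w', newline='', encoding='utf-8') could be passed instead,
# which should keep accented characters such as é, ù, à intact.
buffer = io.StringIO()
write_paired_rows(buffer, "aaa\nbbb\nccc", "xxx\nyyy\nzzz")
print(buffer.getvalue())
```

With the default comma delimiter, each pair lands in two separate columns when the file is opened in a spreadsheet. Because zip truncates to the shorter input, it may be worth checking that both strings contain the same number of lines.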

1 Comment

I think I wasn't as clear as I thought :/ The | means another cell/column. I have a file where one cell (and only one cell) contains aaa bbb, and another cell contains xxx yyy. Instead, I'd like to write it like this: (first row) aaa (second column) xxx (new row) bbb (second column) yyy. I don't know if that's clear; if not, I can edit my question :) My bad :/
