
I am writing a script that scrapes a table from a website and exports it to a CSV file. My question is: how can I add a new column and put the current date in every cell of that column (for every row that contains data)?

Here is my code:

import urllib2
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
header = ['Pénznem', 'Devizanév','Egység','Pénznem_Forintban', 'Dátum']
output.writerow(header)

def make_soup(url):
    thepage = urllib2.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]

soup=make_soup("https://www.mnb.hu/arfolyamok")

datatable=[]
for record in soup.findAll('tr'):
    for data in record.findAll('td'):
        datatable.append(data.text)
maindatatable = to_2d(datatable, 4)

output.writerows(maindatatable)

resultcsv.close()

print maindatatable

Where do I have to put my date code, and what do I have to change in it, to get the correct result?

now = time.strftime('%d-%m-%Y')
ido = to_2d(now, 5)
output.writerows(ido)
  • Do you intend to append the data to the existing file? Commented Jun 14, 2017 at 7:37
  • Probably yes. I have 4 columns that I am scraping from a website (it's a table), and I want to add a 5th column with the current date in every row. Commented Jun 14, 2017 at 7:42

1 Answer

Add this to the end of your code:

import pandas as pd
df = pd.DataFrame(maindatatable)
now = time.strftime('%d-%m-%Y')
df['date'] = now
df.columns = header
df.to_csv(filename, sep=';', encoding='utf-8', index=False)
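
The same result can also be reached without pandas by appending the date string to each scraped row before writing it. The following is only a minimal sketch under stated assumptions: the sample rows stand in for the scraped maindatatable, the helper name add_date_column is hypothetical, and it uses Python 3's built-in csv module rather than unicodecsv.

```python
import csv
import io
import time


def add_date_column(rows, date_string):
    """Return a copy of rows with date_string appended to every row."""
    return [list(row) + [date_string] for row in rows]


# Hypothetical sample data standing in for the scraped maindatatable.
maindatatable = [
    ["EUR", "euro", "1", "308,52"],
    ["USD", "dollar", "1", "275,10"],
]
header = ["Pénznem", "Devizanév", "Egység", "Pénznem_Forintban", "Dátum"]

now = time.strftime("%d-%m-%Y")
rows_with_date = add_date_column(maindatatable, now)

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open(filename, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quotechar='"',
                    quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(header)
writer.writerows(rows_with_date)
csv_text = buffer.getvalue()
```

Each row keeps its four scraped cells and gains the date as a fifth, so the rows line up with the five-column header.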

9 Comments

I get this error message:

Traceback (most recent call last):
  File "C:\Python27\prohardver.py", line 49, in <module>
    df.to_csv('output2.csv', sep=';', encoding='utf-8', index=False)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1403, in to_csv
    formatter.save()
  File "C:\Python27\lib\site-packages\pandas\io\formats\format.py", line 1586, in save
    self._save()
  File "C:\Python27\lib\site-packages\pandas\io\formats\format.py", line 1673, in _save
    self._save_header()
  File "C:\Python27\lib\site-packages\pandas\io\formats\format.py", line 1641, in _save_header
    writer.writerow(encoded_labels)
  File "C:\Python27\lib\site-packages\pandas\io\common.py", line 527, in writerow
    data = data.decode("utf-8")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
Hmm it works locally for me. Try changing encoding='utf-8' to encoding='ISO-8859-1'
It doesn't work :( Maybe you are working with 3.x? I am using 2.7.13.
I'm on 2.7.11. Please also try 'ISO 8859-16' or 'Windows-1250'
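
A possible explanation for the error, offered as an assumption rather than a confirmed diagnosis: byte 0xe9 is "é" in Latin-1, and it sits at position 1 of 'Pénznem'. If the script file is saved as Latin-1, the header strings are Latin-1 byte strings in Python 2, and decoding them as UTF-8 fails. The sketch below (written for Python 3, where bytes and text are separate types) reproduces that mismatch with a hypothetical byte string.

```python
# 'Pénznem' as it would be stored in a Latin-1 encoded source file
# (hypothetical reconstruction of the header bytes).
raw = b"P\xe9nznem"

# Decoding these bytes as UTF-8 fails: 0xe9 starts a multi-byte sequence
# but is followed by the plain ASCII byte 'n'.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("latin-1")

# Once decoded to text, the header can be encoded to UTF-8 safely.
utf8_bytes = text.encode("utf-8")
```

If that assumption holds, decoding the header with the file's real encoding (or declaring the header as unicode strings) before handing it to pandas would avoid the mismatch.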