The error is caused by 'Food, Beverage & Tobacco', which contains an extra comma that prevents pandas from reading the CSV file. It raises this error:
Error tokenizing data. C error: Expected 3 fields in line 29, saw 4
How can I elegantly remove the extra comma from the 'GICS industry group' column of the CSV file (including cases other than the one where the comma follows 'Food')?
Here is my code:
#!/usr/bin/env python2.7
print "hello from python 2"
import pandas as pd
from lxml import html
import requests
import urllib2
import os
url = 'http://www.asx.com.au/asx/research/ASXListedCompanies.csv'
response = urllib2.urlopen(url)
html = response.read()
#html = html.replace('"','')
with open('asxtest.csv', 'wb') as f:
    f.write(html)
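with open("asxtest.csv", 'r') as f:
    with open("asx.csv", 'w') as f1:
        f.next()  # skip header line
        f.next()  # skip 2nd line
        for line in f:
            if line.count(',') > 2:
                # Strings are immutable, so line[2] = ... raises a TypeError.
                # Split into three fields and drop the commas in the last one.
                parts = line.split(',', 2)
                parts[2] = parts[2].replace(',', '')
                line = ','.join(parts)
            f1.write(line)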
os.remove('asxtest.csv')
df_api = pd.read_csv('asx.csv')
df_api.rename(columns={'Company name': 'Company', 'ASX code': 'Stock','GICS industry group': 'Industry'}, inplace=True)
df_api = pd.read_csv(url, skiprows=1, names=['Company', 'Stock', 'Industry'])
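To make it clearer what I am after, here is a minimal sketch of the general repair, not tied to 'Food'. It assumes that any surplus comma always belongs to the last column ('GICS industry group') and that the fields are not quoted; the helper name `repair_rows` and the sample rows are made up for illustration and are not taken from the real ASX file.

```python
def repair_rows(lines, n_fields=3):
    """Split each line into at most n_fields columns.

    Assumption: surplus commas only ever occur in the last column,
    so everything after the (n_fields - 1)-th comma is kept together.
    """
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # ignore blank lines
        # maxsplit stops splitting once n_fields parts exist
        yield line.split(',', n_fields - 1)


# Hypothetical sample data (not real ASX rows)
sample = [
    "Company name,ASX code,GICS industry group\n",
    "EXAMPLE FOODS LTD,EXF,Food, Beverage & Tobacco\n",
    "EXAMPLE BANK LTD,EXB,Banks\n",
]

rows = list(repair_rows(sample))
for row in rows:
    print(row)
```

The resulting rows could then presumably be passed to `pd.DataFrame(rows[1:], columns=['Company', 'Stock', 'Industry'])` instead of writing a second CSV file. This would not help if a company name in the first column also contained a comma.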