I am trying to replace the strings in the Years column of the DataFrame below with just the digits they contain. For example, I would like to change ZC025YR to 025. My code is as follows:
import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat
my_url = 'http://www.bankofcanada.ca/stats/results/csv'
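# Encode the lookup form fields; passing data to Request makes this a POST
data = urllib.urlencode({"lookupPage": "lookup_yield_curve.php",
                         "startRange": "1986-01-01",
                         "searchRange": "all"})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
zipdata = result.read()
# Open the returned archive in memory and read its first file into a DataFrame
zipfile = ZipFile(StringIO(zipdata))
df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))
# Reshape wide -> long: every column header except Date becomes a row value
df = pd.melt(df, id_vars=['Date'])
df.rename(columns={'variable': 'Years'}, inplace=True)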
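For anyone reproducing this on Python 3, here is a minimal offline sketch of the same download-and-reshape steps. It assumes a tiny made-up CSV (the numbers are placeholders, not Bank of Canada data) and builds the zip in memory instead of fetching it. It also shows that melt fills the Years column from the column headers, not from the cells, so str.extract later operates on header names. Passing na_values=["na"] is an addition: lowercase "na" may not be among pandas' default missing-value markers.

```python
import io
import zipfile

import pandas as pd

# Hypothetical CSV shaped like the yield-curve file (placeholder numbers)
csv_text = (
    "Date,ZC025YR,ZC050YR\n"
    "1986-01-01,na,na\n"
    "1986-01-02,1.0,2.0\n"
)

# Build a zip archive in memory to stand in for the downloaded bytes
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("yield_curve.csv", csv_text)
zipdata = buffer.getvalue()

# Python 3 counterpart of ZipFile(StringIO(zipdata)): wrap the bytes in BytesIO
with zipfile.ZipFile(io.BytesIO(zipdata)) as archive:
    first_name = archive.namelist()[0]
    # List "na" explicitly so those cells are parsed as NaN
    df = pd.read_csv(archive.open(first_name), na_values=["na"])

# Every non-id column header becomes a value in the 'variable' column
df = pd.melt(df, id_vars=["Date"])
df = df.rename(columns={"variable": "Years"})
print(df)
```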
The dataframe I currently have looks like this:
         Date    Years         value
0  1986-01-01  ZC025YR            na
1  1986-01-02  ZC025YR  0.0948511020
2  1986-01-03  ZC025YR  0.0972953210
3  1986-01-06  ZC025YR  0.0965403640
...
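The printout only shows ZC025YR rows, but melt produces one Years label per column header in the CSV. A minimal sketch for listing every distinct label and finding any that contain no digits, assuming a small placeholder frame (SOME_LABEL is a hypothetical digit-free header, not one known to be in the real file):

```python
import pandas as pd

# Placeholder long-format frame; SOME_LABEL is a hypothetical digit-free header
df = pd.DataFrame({
    "Date": ["1986-01-02"] * 3,
    "Years": ["ZC025YR", "ZC050YR", "SOME_LABEL"],
    "value": [1.0, 2.0, 3.0],
})

# The head only shows the first label; list every distinct one instead
labels = pd.Series(df["Years"].unique())
print(labels.tolist())

# expand=False returns a Series; labels with no digit run become NaN
digits = labels.str.extract(r"(\d+)", expand=False)
unmatched = labels[digits.isna()].tolist()
print(unmatched)

# Reproduce the failure: astype(int) cannot represent NaN
try:
    digits.astype(int)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

On the real DataFrame, the equivalent check would be df.loc[df['Years'].str.extract(r'(\d+)', expand=False).isna(), 'Years'].unique().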
However, when I add the code below to restructure my DataFrame, I get ValueError: cannot convert float NaN to integer on the line df['Years'] = df['Years'].str.extract('(\d+)').astype(int). This is strange, because when I look at the Years data in the CSV file I don't see any NaN values.
# Convert each label to its digits, then to an int (e.g. ZC025YR -> 25)
df['Years'] = df['Years'].str.extract(r'(\d+)').astype(int)
# Divide by 100 (e.g. 25 -> 0.25)
df['Years'] = df['Years'] / 100
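One way to make the conversion tolerant of labels without digits, sketched on placeholder data (SOME_LABEL is a hypothetical header, and the values are not real yields). The sketch extracts with expand=False so a Series comes back; in recent pandas versions the default returns a DataFrame. It then drops rows whose label had no digits before calling astype(int), and finally divides by 100 as in the code above.

```python
import pandas as pd

# Placeholder data; SOME_LABEL is a hypothetical header with no digits
df = pd.DataFrame({
    "Date": ["1986-01-02"] * 4,
    "Years": ["ZC025YR", "ZC050YR", "ZC300YR", "SOME_LABEL"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Extract the digit run as a Series; non-matching labels become NaN
digits = df["Years"].str.extract(r"(\d+)", expand=False)
mask = digits.notna()

# Drop rows whose label had no digits, then convert and scale by 100
df = df.loc[mask].copy()
df["Years"] = digits[mask].astype(int) / 100
print(df)
```

If those rows should be kept instead, pd.to_numeric(digits) / 100 leaves them as NaN in a float column. Converting to int also drops leading zeros, so 025 becomes 25 (0.25 after the division). Skip astype(int) if the string '025' itself is what you want.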
Thank you.