I am trying to replace the strings in the Years column of the DataFrame below with just the digits they contain. For example, I would like to change ZC025YR to 025. My code is as follows:

import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat

my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.urlencode({"lookupPage": "lookup_yield_curve.php",
                         "startRange": "1986-01-01",
                         "searchRange": "all"})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
zipdata = result.read()
zipfile = ZipFile(StringIO(zipdata))

df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))

df = pd.melt(df, id_vars=['Date'])

df.rename(columns={'variable': 'Years'}, inplace=True)

The dataframe I currently have looks like this:

              Date     Years          value
0       1986-01-01   ZC025YR             na
1       1986-01-02   ZC025YR   0.0948511020
2       1986-01-03   ZC025YR   0.0972953210
3       1986-01-06   ZC025YR   0.0965403640
.....

However, when I add the code below to restructure my DataFrame, I get the error ValueError: cannot convert float NaN to integer on the line df['Years'] = df['Years'].str.extract('(\d+)').astype(int). This is strange, because when I look at the Years data in the CSV file I don't see any NaN values associated with it.

#Converting the strings in this column into just the number of Years
df['Years'] = df['Years'].str.extract('(\d+)').astype(int)
df['Years'] = df.Years/100

Thank You

1 Answer

Try writing a function that converts each string to an integer, and pass it to Series.apply as follows:

EDIT: Added logic to default to 0 when the string contains no digits (for example, an empty string). Use a different value if you want to handle empty strings in the Years column differently.

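import re

def getYear(s):
    # Find the first run of digits in the label, e.g. '025' in 'ZC025YR'
    x = re.search(r'(\d+)', s)
    # Fall back to 0 when no digits are found (or however you want to handle it)
    return int(x.group(1)) if x is not None else 0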

Then apply this function to the column:

df['Years'] = df['Years'].apply(getYear)
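
If it is unclear which labels end up with the default value, one way to check is to list the values that contain no digits at all. This is only a sketch: the `years` Series is a hypothetical stand-in for the melted Years column, not the real Bank of Canada data, and the names `no_digits` and `problem_labels` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for df['Years']; the real column may differ
years = pd.Series(['ZC025YR', 'ZC050YR', '', 'ZC3000YR'])

# True for rows that contain no digit; na=False treats missing values as "no match"
no_digits = ~years.str.contains(r'\d', na=False)

# Distinct labels that str.extract would turn into NaN
problem_labels = years[no_digits].unique().tolist()
print(problem_labels)
```

On the real data, running the same check against df['Years'] before converting should show which labels need special handling.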

9 Comments

Thanks. But I get the error AttributeError: 'NoneType' object has no attribute 'groups' in the line df['Years'] = df['Years'].apply(getYear)
Can you update the question with the end part of the CSV file, where you think the issue is?
I've been looking at the CSV file and can't work out where the issue is, as I don't see any NaNs associated with the Years column. To see what I mean, the CSV file is available here [bankofcanada.ca/stats/results/csv]
There are empty strings in the Years column.
instead of int try converting to float
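
The float suggestion can be sketched roughly as follows. A float column can hold NaN, so the conversion no longer raises when a label has no digits. The sample `years` Series is hypothetical, and `expand=False` assumes a pandas version recent enough for str.extract to accept that argument.

```python
import pandas as pd

# Hypothetical sample standing in for df['Years'], including an empty label
years = pd.Series(['ZC025YR', 'ZC050YR', ''])

# expand=False returns a Series; rows with no match become NaN
digits = years.str.extract(r'(\d+)', expand=False)

# float can represent NaN, unlike int, so this conversion succeeds
as_float = digits.astype(float) / 100
print(as_float.tolist())
```

The rows that had no digits stay as NaN, so they can be dropped or filled afterwards instead of silently becoming 0.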