ValueError when converting string to integer in Dataframe

Question

I am trying to replace the strings in the Years column of the Dataframe below with just the numbers in the string. For example, I would like to change ZC025YR to 025. My code is as follows:

import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat

my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.urlencode({"lookupPage": "lookup_yield_curve.php",
                         "startRange": "1986-01-01",
                         "searchRange": "all"})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
zipdata = result.read()
zipfile = ZipFile(StringIO(zipdata))

df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))

df = pd.melt(df, id_vars=['Date'])

df.rename(columns={'variable': 'Years'}, inplace=True)
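For context, `pd.melt` keeps `Date` as the identifier and turns every other column into rows. Each former column header goes into a column named `variable`, which is renamed to `Years` above, so the `Years` values are just the CSV's column headers. Here is a minimal sketch on a hypothetical wide frame. The `ZC050YR` column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide frame shaped like the yield-curve CSV (toy values)
wide = pd.DataFrame({
    'Date': ['1986-01-02', '1986-01-03'],
    'ZC025YR': [0.0948511020, 0.0972953210],
    'ZC050YR': [0.10, 0.20],
})

# Every non-id column becomes rows; its header lands in 'variable'
long_df = pd.melt(wide, id_vars=['Date'])
long_df = long_df.rename(columns={'variable': 'Years'})
print(long_df)
```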

The dataframe I currently have looks like this:

              Date     Years          value
0       1986-01-01   ZC025YR             na
1       1986-01-02   ZC025YR   0.0948511020
2       1986-01-03   ZC025YR   0.0972953210
3       1986-01-06   ZC025YR   0.0965403640
.....

However, if I add the code below to restructure my DataFrame, I get the error ValueError: cannot convert float NaN to integer on the line df['Years'] = df['Years'].str.extract('(\d+)').astype(int). This is strange, because when I look at the Years data in the CSV file I don't see any NaN values in it.

#Converting the strings in this column into just the number of Years
df['Years'] = df['Years'].str.extract('(\d+)').astype(int)
df['Years'] = df.Years/100
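The NaN does not have to be in the file itself. `str.extract` returns NaN for every value that contains no digits, and an integer cast cannot represent NaN. The sketch below assumes, as a comment further down suggests, that some `Years` values are empty strings. The toy Series is hypothetical. `expand=False` keeps the result a Series, because recent pandas versions return a DataFrame by default.

```python
import pandas as pd

# Hypothetical toy data: one empty string mixed in with valid labels
years = pd.Series(['ZC025YR', 'ZC050YR', ''])

# Values with no digits come back as NaN
extracted = years.str.extract(r'(\d+)', expand=False)
print(extracted.tolist())  # ['025', '050', nan]

# Show the original values that produced NaN; these break .astype(int)
print(years[extracted.isna()].tolist())  # ['']
```

Running the same `isna()` filter on the real `df['Years']` is one way to locate the offending rows.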

Thank You

Anand S Kumar · Accepted Answer · 2015-06-20 04:56:39Z


Try writing a function that extracts the digits from a string and converts them to an integer, then call it through the Series.apply method as follows:

EDIT: Added logic to default empty strings to 0. Use a different value if you want to handle empty strings in the Years column differently.

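import re

def getYear(s):
    # Find the first run of digits, e.g. 'ZC025YR' -> '025'
    x = re.search(r'(\d+)', s)
    # No digits (e.g. an empty string): default to 0, or however you want to handle it
    return int(x.groups()[0]) if x is not None else 0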

Then apply it to the column:

df['Years'] = df['Years'].apply(getYear)
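`re.search` only accepts strings. If the column also holds real NaN values (floats, which pandas uses for missing data), `getYear` would raise TypeError instead of returning the default. Below is a hedged variant that also tolerates non-string input. The name `get_year_or_default` and its `default` parameter are hypothetical, not part of the answer above.

```python
import re

# Hypothetical helper: like getYear, but also tolerates non-string values
def get_year_or_default(s, default=0):
    # NaN from pandas is a float, and re.search would reject it
    if not isinstance(s, str):
        return default
    match = re.search(r'\d+', s)
    # group(0) is the whole match, e.g. '025' -> 25
    return int(match.group(0)) if match else default

print([get_year_or_default(v) for v in ['ZC025YR', '', float('nan')]])
# [25, 0, 0]
```

It would be applied the same way: `df['Years'] = df['Years'].apply(get_year_or_default)`. Defaulting to 0 makes a missing maturity look like a real value. Passing `default=float('nan')` keeps the two distinguishable, at the cost of a float column.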

edited Jun 20, 2015 at 4:56

answered Jun 20, 2015 at 2:18

Anand S Kumar


9 Comments

user131983 Over a year ago

Thanks. But I get the error AttributeError: 'NoneType' object has no attribute 'groups' in the line df['Years'] = df['Years'].apply(getYear)

Anand S Kumar Over a year ago

Can you update the question with the end of the CSV file, where you think the issue is?

user131983 Over a year ago

I've been looking at the CSV file and can't tell where the issue is, as I don't see any NaNs in the Years column. To see what I mean, the CSV file is available here: [bankofcanada.ca/stats/results/csv]

HYRY Over a year ago

There are empty strings in the Years column.

Anand S Kumar Over a year ago

Instead of int, try converting to float (a float column can hold NaN; an int column cannot).
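A minimal sketch of the float suggestion, with dropping the digit-less rows shown as an alternative. It uses hypothetical data in which `''` stands in for the empty strings mentioned above. The division by 100 mirrors the question's second line.

```python
import pandas as pd

# Hypothetical sample; '' stands in for the empty strings in Years
df = pd.DataFrame({'Years': ['ZC025YR', 'ZC300YR', '']})
digits = df['Years'].str.extract(r'(\d+)', expand=False)

# Option 1: float can represent NaN, so this cast succeeds where int fails
as_float = digits.astype(float) / 100
print(as_float.tolist())  # [0.25, 3.0, nan]

# Option 2: drop the rows with no digits first; then an int cast is safe
as_int = digits.dropna().astype(int)
print(as_int.tolist())  # [25, 300]
```

Option 1 keeps every row but leaves NaN where no maturity was found. Option 2 gives clean integers but discards those rows. `pd.to_numeric(digits, errors='coerce')` is another way to get the float result.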
