1

I have a CSV file containing address data, mostly in Finnish. I need to read that file and get geocode information for these addresses. But it doesn't work for the Finnish alphabet and says it can't read those! Can anybody please help me out with this?

import urllib,urllib2,time

addr_file = 'address.csv'
out_file = 'addresses_geocoded.csv'
out_file_failed = 'failed.csv'
sleep_time = 2
root_url = "http://maps.google.com/maps/geo?"
gkey = "asfasdfasdfasdf"       # not an actual value
return_codes = {'200':'SUCCESS',
                '400':'BAD REQUEST',
                '500':'SERVER ERROR',
                '601':'MISSING QUERY',
                '602':'UNKNOWN ADDRESS',
                '603':'UNAVAILABLE ADDRESS',
                '604':'UNKNOWN DIRECTIONS',
                '610':'BAD KEY',
                '620':'TOO MANY QUERIES'}
def geocode_for_musiquitous(addr_file,out_fmt='csv'):
        #encode our dictionary of url parameters
        values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
        data = urllib.urlencode(values)
        #set up our request
        url = root_url+data
        req = urllib2.Request(url)
        #make request and read response
        response = urllib2.urlopen(req)
        geodat = response.read().split(',')
        response.close()

        # this section just handles the data returned from Google
        code = return_codes[geodat[0]]
        if code == 'SUCCESS':
                code,precision,lat,lng = geodat
                return {'code':code,'precision':precision,'lat':lat,'lng':lng}
        else:
                return {'code':code}

def main():
        # open I/O files
        outf = open(out_file,'w')
        outf_failed = open(out_file_failed,'w')
        inf = open(addr_file,'r')
        for address in inf:
                # get latitude and longitude of address
                data = geocode_for_musiquitous(address)

                # output results and log to file
                if len(data)>1:
                        print "Latitude and Longitude of "+address+":"
                        print "\tLatitude:",data['lat']
                        print "\tLongitude:",data['lng']
                        # comma between the address and the coordinates
                        outf.write(address.strip()+','+data['lat']+','+data['lng']+'\n')
                        outf.flush()
                else:
                        # report the failing address, not the input file name
                        print "Geocoding of '"+address.strip()+"' failed with error code "+data['code']
                        outf_failed.write(address)
                        outf_failed.flush()

                time.sleep(sleep_time)

        # clean up
        inf.close()
        outf.close()
        outf_failed.close()

if __name__ == "__main__":
        main()
  • preview exists for a reason, fix your formatting! Commented Feb 9, 2010 at 12:29
  • @rahman: formatting was fixed, please don't break it again. Commented Feb 9, 2010 at 12:31
  • sorry..I was confused when editing! Commented Feb 9, 2010 at 12:36
  • "It says it cant read those." That is, I assure you, not what it says. It is much easier to debug if you can tell us exactly what python says. That is, paste in the error message and stack trace. That will tell us exactly what line the problem is on, so we don't have to wade through your entire program to find it. Commented Feb 9, 2010 at 12:41

4 Answers

1

The values passed to urllib.urlencode should be UTF-8 encoded beforehand:

addr_file = addr_file.encode("utf-8")
values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
data = urllib.urlencode(values)

And make sure you open the CSV file with the correct encoding (might be "windows-1252" or "iso-8859-1"):

import codecs
inf = codecs.open(addr_file, 'r', 'iso-8859-1')
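
Putting the two steps together, a minimal sketch might look like the following. It is written for Python 3, where urlencode lives in urllib.parse; that is an assumption, since the question uses Python 2. The helper name build_query, the placeholder key, and the sample address are made up for illustration, and ISO-8859-1 is only a guess at the file's real encoding.

```python
import io
import os
import tempfile
from urllib.parse import urlencode

def build_query(address, out_fmt="csv", key="not-a-real-key"):
    """Return a URL query string with the address percent-encoded as UTF-8.

    build_query is a hypothetical helper, not part of the original script.
    """
    values = {"q": address.strip(), "output": out_fmt, "key": key}
    # Ask urlencode to turn non-ASCII characters into UTF-8 percent escapes.
    return urlencode(values, encoding="utf-8")

# Create a small ISO-8859-1 sample file to stand in for address.csv.
sample_path = os.path.join(tempfile.mkdtemp(), "address_sample.csv")
with io.open(sample_path, "w", encoding="iso-8859-1") as f:
    f.write(u"Hämeentie 1, Helsinki\n")

# Decode the file with the encoding it was actually saved in.
with io.open(sample_path, "r", encoding="iso-8859-1") as inf:
    queries = [build_query(line) for line in inf]

print(queries[0])
```

The point of the sketch is the order of operations: decode the bytes from the file into text first, then encode that text as UTF-8 only when building the URL.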

0

I don't know Python, but I'm pretty sure this is an encoding issue.

Make sure your address file is UTF-8 encoded and that the urlencode() function you use can deal with UTF-8 characters (the latter shouldn't be a problem though, as Python can handle UTF-8 natively as far as I know).
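
One way to check whether the file really is UTF-8 is to try decoding its raw bytes and see if that fails. A rough sketch, assuming Python 3; the function name is_utf8 and the sample address are hypothetical:

```python
def is_utf8(raw_bytes):
    """Return True if raw_bytes decodes cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

address = u"Mäkelänkatu 5, Helsinki"
print(is_utf8(address.encode("utf-8")))       # bytes as a UTF-8 file would hold them
print(is_utf8(address.encode("iso-8859-1")))  # bytes as an ISO-8859-1 file would hold them
```

For a real file, the bytes would come from reading it in binary mode, for example open('address.csv', 'rb').read(). A clean decode does not prove the file was meant to be UTF-8, but a failure does show it is something else.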


0

Use the codecs module.

codecs.open():

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.

You can use wrapped file objects to read encoded files into unicode strings.
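
As an illustration, the sketch below writes a one-line sample file and reads it back through codecs.open(). The temporary file, the sample address, and the choice of ISO-8859-1 are assumptions made for the example, not details from the question. On Python 3 the built-in open() with an encoding argument does the same job.

```python
import codecs
import os
import tempfile

# Write a sample file as raw ISO-8859-1 bytes (a stand-in for address.csv).
sample_path = os.path.join(tempfile.mkdtemp(), "address_sample.csv")
with open(sample_path, "wb") as f:
    f.write(u"Töölönkatu 3, Helsinki\n".encode("iso-8859-1"))

# codecs.open decodes each line to a unicode string while reading.
inf = codecs.open(sample_path, "r", "iso-8859-1")
try:
    addresses = [line.strip() for line in inf]
finally:
    inf.close()

print(repr(addresses[0]))
```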


0

You need to open the file with the correct encoding, using the codecs module. The correct encoding for Finnish is probably ISO-8859-1:

inf = codecs.open(addr_file,'r', 'iso-8859-1')

If this is not the correct encoding for your file, you need to find out what the correct encoding is, then check whether a codec for it is available, like below:

import codecs
codec = codecs.lookup("iso-8859-1")
print codec.name

If codecs.lookup() returns a codec object for the encoding you are looking for, then it is available and can be used in codecs.open().
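
When the name is not known, codecs.lookup() raises LookupError instead of returning a codec object, so the check can be wrapped in a small helper. A sketch, assuming Python 3; the name codec_available and the list of candidates are illustrative only:

```python
import codecs

def codec_available(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# Try a few candidate encodings; the last one is deliberately bogus.
for candidate in ("iso-8859-1", "windows-1252", "utf-8", "no-such-encoding"):
    print(candidate, codec_available(candidate))
```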
