2

I am trying to combine Google's geocode API and the GitHub API to parse users' locations and build a list from them.

The array (list) I want to create is like this:

location, lat, lon, count
San Francisco, x, y, 4
Mumbai, x1, y1, 5

Where location, lat and lon are parsed from Google geocode, and count is the number of occurrences of that location. Every time a new location is added: if it already exists in the list, its count is incremented; otherwise it is appended to the array (list) with location, lat, lon and a count of 1.

Another example:

location, lat, lon, count
Miami, x2, y2, 1 #first occurrence
San Francisco, x, y, 4 #occurred 4 times already
Mumbai, x1, y1, 5 #occurred 5 times already
Cairo, x3, y3, 1 #first occurrence

I can already get the user's location from GitHub and the geocoded data from Google. I just need to create this array in Python, which I'm struggling with.

Can anyone help me? Thanks.

3
  • I'd suggest using a dictionary (dict) instead. Commented Apr 23, 2013 at 14:42
  • If you want a list for printing with the csv module, check this answer for a way to do that with a dict. Commented Apr 23, 2013 at 14:50
  • 1
    Is lat/long directly correlated to the location, e.g., will all San Francisco locations have the same lat/long? If not, you're going to be requiring additional structures to keep that data intact, as well. Commented Apr 23, 2013 at 15:02

5 Answers 5

4

With collections.Counter, you could do:

from collections import Counter

# initial values
c = Counter({("Mumbai", 1, 2): 5, ("San Francisco", 3, 4): 4})

# adding entries
c.update([("Mumbai", 1, 2)])
print(c)  # Counter({('Mumbai', 1, 2): 6, ('San Francisco', 3, 4): 4})

c.update([("Mumbai", 1, 2), ("San Diego", 5, 6)])
print(c)  # Counter({('Mumbai', 1, 2): 7, ('San Francisco', 3, 4): 4, ('San Diego', 5, 6): 1})
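If the end result has to be the list of rows from the question (location, lat, lon, count), the Counter can be flattened afterwards. A minimal sketch, assuming the same placeholder coordinates as above; the `rows` name and the Miami entry are illustrative only:

```python
from collections import Counter

# Placeholder coordinates, as in the snippet above.
c = Counter({("Mumbai", 1, 2): 5, ("San Francisco", 3, 4): 4})
c["Miami", 5, 6] += 1  # first occurrence: a missing key starts at 0

# Flatten each (key, count) pair into a [location, lat, lon, count] row,
# most frequent location first.
rows = [[name, lat, lon, count]
        for (name, lat, lon), count in c.most_common()]

for row in rows:
    print(row)
```

Here `most_common()` is used only to get a stable, count-ordered listing; iterating over `c.items()` would work as well if order does not matter.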

2 Comments

For a single entry .update() is rather verbose. You can also just add directly to the counter: c['Mumbai', 1, 2] += 1.
Thanks Martijn, this is much cleaner!
2

This would be better stored as a dictionary, indexed by city name. You could store it as two dictionaries, one dictionary of tuples for latitude/longitude (since lat/long never changes):

lat_long_dict = {}
lat_long_dict["San Francisco"] = (x, y)
lat_long_dict["Mumbai"] = (x1, y1)

And a collections.defaultdict for the count, so that it always starts at 0:

import collections
city_counts = collections.defaultdict(int)

city_counts["San Francisco"] += 1
city_counts["Mumbai"] += 1
city_counts["San Francisco"] += 1
# city counts would be
# defaultdict(<type 'int'>, {'San Francisco': 2, 'Mumbai': 1})
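To keep the two dictionaries in step, one option is to update both from a single function. This is only a sketch: `add_location` is a hypothetical helper, not part of the answer, and the coordinates are placeholders. It assumes a city name always maps to the same lat/long.

```python
import collections

lat_long_dict = {}
city_counts = collections.defaultdict(int)

def add_location(city, lat, lon):
    """Record one occurrence of `city` (hypothetical helper)."""
    # Keep the coordinates seen first; later calls leave them untouched.
    lat_long_dict.setdefault(city, (lat, lon))
    city_counts[city] += 1

add_location("San Francisco", "x", "y")
add_location("Mumbai", "x1", "y1")
add_location("San Francisco", "x", "y")

# Combine both dictionaries into (location, lat, lon, count) rows.
rows = [(city,) + lat_long_dict[city] + (city_counts[city],)
        for city in lat_long_dict]
print(rows)
```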

6 Comments

And how would I add lat and lon to this dict?
Perhaps I'm doing something wrong. My output is test: {u'San Francisco, CA, USA': '-122.4194155, 37.7749295'} - defaultdict(<type 'int'>, {u'San Francisco, CA, USA': 1}); #count should be 4.
It is better to include lat and long in the key (to differentiate Paris, France and Paris, Texas...), so one should rather use a tuple (city, lat, long) as the key.
A collections.Counter would be better than a defaultdict, it's specifically designed for... counting! :-) Also, what @ThierryLathuille said.
@ThierryLathuille I like what you are saying but I just don't know how to do it in Python. Could you direct me to a URL or update your answer? Thanks.
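A rough sketch of the tuple-as-key idea from the comments, using a collections.Counter. The coordinate strings are placeholders standing in for real geocoder values:

```python
from collections import Counter

counts = Counter()

# Two different places that share a name, told apart by their coordinates.
paris_france = ("Paris", "x_fr", "y_fr")
paris_texas = ("Paris", "x_tx", "y_tx")

for key in [paris_france, paris_texas, paris_france]:
    counts[key] += 1  # a tuple is hashable, so it works as a dict key

for (city, lat, lon), n in counts.items():
    print(city, lat, lon, n)
```

Because the whole tuple is the key, the two cities named Paris are counted separately.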
1

Python has a ready-made class specifically for counting occurrences of things: it's called collections.Counter. If you can generate an iterator that yields successive tuples (city, lat, lon) from your input data (perhaps with a generator expression), simply passing that into Counter will directly give you what you're looking for, e.g.:

>>> from collections import Counter
>>> locations = [('Miami', 1, 1), ('San Francisco', 2, 2), ('Mumbai', 3, 3), ('Miami', 1, 1), ('Miami', 1, 1)]
>>> Counter(locations)
Counter({('Miami', 1, 1): 3, ('San Francisco', 2, 2): 1, ('Mumbai', 3, 3): 1})

If you need to be able to add more locations as the program runs instead of batching them, pass the relevant tuples to that Counter's update method.
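As a sketch of both steps, suppose the geocoded results are already available as a list of dicts. The `geocoded` name and its `name`/`lat`/`lon` keys are assumptions made for illustration, not the actual shape of Google's response:

```python
from collections import Counter

# Hypothetical, already-geocoded results: one dict per GitHub user.
geocoded = [
    {"name": "Miami", "lat": 1, "lon": 1},
    {"name": "San Francisco", "lat": 2, "lon": 2},
    {"name": "Miami", "lat": 1, "lon": 1},
]

# A generator expression yields (city, lat, lon) tuples straight into Counter.
counter = Counter((g["name"], g["lat"], g["lon"]) for g in geocoded)

# Later arrivals go through update(), which takes an iterable of keys.
counter.update([("Mumbai", 3, 3), ("Miami", 1, 1)])

print(counter)
```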


1

This is sort of an amalgamation of all the other recommended ideas:

from collections import defaultdict

inputdata = [('Miami', 'x2', 'y2'),
             ('San Francisco', 'x', 'y'),
             ('San Francisco', 'x4', 'y4'),
             ('Mumbai', 'x1', 'y1'),
             ('Cairo', 'x3', 'y3')]

counts, coords = defaultdict(int), defaultdict(list)

for location, lat, lon in inputdata:
    coords[location].append((lat,lon))
    counts[location] += 1

print(counts)
print(coords)

This uses defaultdict, which, as you can see, makes it easy to both:

  1. count the number of occurrences by city
  2. keep lat/lon pairs intact

RETURNS:

defaultdict(<type 'int'>, {'Miami': 1, 'San Francisco': 2, 'Cairo': 1, 'Mumbai': 1}) 
defaultdict(<type 'list'>, {'Miami': [('x2', 'y2')], 'San Francisco': [('x', 'y'), ('x4', 'y4')], 'Cairo': [('x3', 'y3')], 'Mumbai': [('x1', 'y1')]})

This answer assumes (without verification) that your lat/lon pairs are fine-grained enough that they are unlikely to repeat exactly, and that you are really only interested in counts by city.


0

How about using a Python dict? You can read about them here:

http://docs.python.org/2/tutorial/datastructures.html#dictionaries

Here is a sample implementation:

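# Create an empty dictionary.
dat = {}

# `location` is the name returned by the geocoder (example value).
location = "San Francisco"

# Increment the count if the location is already known, otherwise start at 1.
# (dict.has_key() was removed in Python 3; use the `in` operator instead.)
if location in dat:
    dat[location] = dat[location] + 1
else:
    dat[location] = 1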
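The sample above only tracks the count. One possible way to keep lat and lon alongside it in the same plain dict is to store a small list per location. This is a sketch under that assumption; the `record` helper and the placeholder coordinates are hypothetical:

```python
# Maps location -> [lat, lon, count].
dat = {}

def record(location, lat, lon):
    """Add one occurrence of `location` (hypothetical helper)."""
    if location in dat:
        dat[location][2] += 1          # bump the existing count
    else:
        dat[location] = [lat, lon, 1]  # first occurrence

record("San Francisco", "x", "y")
record("Mumbai", "x1", "y1")
record("San Francisco", "x", "y")

# Turn the dict into the [location, lat, lon, count] rows from the question.
rows = [[location] + values for location, values in dat.items()]
print(rows)
```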
