I still don't fully understand how map/reduce works, so I thought I'd give an example of a problem I need to solve, and hopefully the answer will help me understand the concept.
I'm tracking page views using a document structure similar to this:
{
"timestamp" : 1299990045,
"visitor" : {
"region" : {
"country_code" : "US"
},
"browser" : {
"name" : "IE",
"version" : "8.0"
}
},
"referer" : {
"host" : "www.google.com",
"path" : "/",
"query" : "q=map%2Freduce"
}
}
I store a single document for each page view. Because I get about 15 million page views a day, I'd like to aggregate these results each night, save the aggregate results for that day, and then drop the collection to begin storing page views again. I want the output of the map/reduce to look like this:
{
"day" : "Sun Mar 13 2011 00:00:00 GMT-0400 (EDT)",
"regions" : {
"US" : 235,
"CA" : 212,
"JP" : 121
},
"browsers" : {
"IE" : 145,
"Firefox" : 245,
"Chrome" : 95,
"Other" : 120
},
"referers" : {
"www.google.com" : 24,
"yahoo.com" : 56
}
}
I really don't know where to begin doing this kind of thing. Any help would be appreciated.
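As a possible starting point, here is a minimal sketch of the general shape such a job might take, written as plain JavaScript so it can be tried outside the database. It rests on several assumptions: days are cut at midnight UTC, the day is emitted as a millisecond timestamp rather than a formatted string, the names mapPageView and reducePageViews are made up for illustration, and runMapReduce is a hypothetical in-memory stand-in for the database's map/reduce driver, not a MongoDB API.

```javascript
// Map: runs once per page-view document, bound as `this` (as in MongoDB).
// Emits one partial aggregate keyed by the start of that document's day.
function mapPageView() {
  // timestamp is in seconds; truncate to midnight UTC (assumption).
  var day = new Date(this.timestamp * 1000);
  day.setUTCHours(0, 0, 0, 0);

  var value = { regions: {}, browsers: {}, referers: {} };
  value.regions[this.visitor.region.country_code] = 1;
  value.browsers[this.visitor.browser.name] = 1;
  value.referers[this.referer.host] = 1;
  emit(day.getTime(), value);
}

// Reduce: merges partial aggregates for one day. The result has the same
// shape as the emitted values, so it can safely be reduced again.
function reducePageViews(key, values) {
  var result = { regions: {}, browsers: {}, referers: {} };
  values.forEach(function (value) {
    ["regions", "browsers", "referers"].forEach(function (field) {
      Object.keys(value[field]).forEach(function (name) {
        result[field][name] = (result[field][name] || 0) + value[field][name];
      });
    });
  });
  return result;
}

// Hypothetical in-memory driver, only for trying the two functions locally.
function runMapReduce(docs, map, reduce) {
  var groups = {};
  // Map functions expect a global emit(key, value).
  globalThis.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  return Object.keys(groups).map(function (key) {
    return { _id: Number(key), value: reduce(Number(key), groups[key]) };
  });
}

// Two made-up page views on the same day, in the structure shown above.
var sample = [
  { timestamp: 1299990045,
    visitor: { region: { country_code: "US" },
               browser: { name: "IE", version: "8.0" } },
    referer: { host: "www.google.com", path: "/", query: "q=map%2Freduce" } },
  { timestamp: 1299990100,
    visitor: { region: { country_code: "CA" },
               browser: { name: "Firefox", version: "3.6" } },
    referer: { host: "www.google.com", path: "/", query: "" } }
];

var results = runMapReduce(sample, mapPageView, reducePageViews);
console.log(JSON.stringify(results[0].value));
```

Inside MongoDB the same two functions would presumably be passed to the collection's mapReduce call along with an out option naming the collection that keeps the daily results; the collection names are left out here because they are not given above. Grouping rarely seen browsers under "Other" is not handled in this sketch. Host names contain dots, which may not be accepted as field names depending on the MongoDB version, so the referers keys might need escaping.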