I am trying to parse large log files stored in JSON format, tabulate the data, and output the result as another JSON file. The log files I am parsing have the following format:
```json
{
  "timestamp": "2012-10-01T01:00:00.000",
  "id": "[email protected]",
  "action": "Some_Action",
  "responsecode": "1000"
}
```
The action is what a user performed, and the response code is the result of that action.
The timestamp and id are irrelevant for my tabulation; I am only interested in the action and responsecode fields. A single log file may contain tens of thousands of these entries, and I want to keep track of every distinct action, the response codes it produced, and how many times each occurred.
Below is a sample of the output I am looking to generate:
```json
{"actionName": "Some_User_Action",
 "responses": [{"code": "1000", "count": "36"},
               {"code": "1001", "count": "6"},
               {"code": "1002", "count": "3"},
               {"code": "1003", "count": "36"},
               {"code": "1004", "count": "2"}],
 "totalActionCount": "83"}
```
So, for each action, I want to keep track of all the different response codes it generates and the number of times each occurred, as well as the total number of responses for that action.
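A minimal sketch of the tabulation itself, using only the JDK (Java 8+ for `computeIfAbsent`/`merge`). The class name `ActionTally` and its methods are hypothetical. It keeps a nested map of action name to (response code to count), so memory grows with the number of distinct action/code pairs rather than with the number of log entries, and a varying number of codes per action needs no special handling. The total for an action is just the sum of its per-code counts (36 + 6 + 3 + 36 + 2 = 83 in the sample above).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts (action, responsecode) pairs as entries are read.
public class ActionTally {
    // action name -> (response code -> number of occurrences); TreeMap keeps keys sorted
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Call once per log entry; only the two fields of interest are kept.
    public void record(String action, String responseCode) {
        counts.computeIfAbsent(action, k -> new TreeMap<>())
              .merge(responseCode, 1, Integer::sum);
    }

    public int count(String action, String responseCode) {
        Map<String, Integer> byCode = counts.get(action);
        return byCode == null ? 0 : byCode.getOrDefault(responseCode, 0);
    }

    // Total for one action = sum of its per-code counts.
    public int total(String action) {
        Map<String, Integer> byCode = counts.get(action);
        if (byCode == null) {
            return 0;
        }
        int sum = 0;
        for (int c : byCode.values()) {
            sum += c;
        }
        return sum;
    }

    public Map<String, Map<String, Integer>> asMap() {
        return counts;
    }

    public static void main(String[] args) {
        ActionTally t = new ActionTally();
        t.record("Some_Action", "1000");
        t.record("Some_Action", "1000");
        t.record("Some_Action", "1001");
        t.record("Other_Action", "1000");
        System.out.println(t.asMap());               // {Other_Action={1000=1}, Some_Action={1000=2, 1001=1}}
        System.out.println(t.total("Some_Action"));  // 3
    }
}
```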
Currently, I have created a Java class for the output object in which I plan to store the output data. I am also unsure what format I should use to store the array of responses and their respective counts, especially since the number of distinct response codes varies from action to action.
Based on my research, it seems that I will need to use a streaming JSON API, mainly because a non-streaming (tree-model) API would load the whole file into memory, which is likely not feasible given the size of these log files. I am currently considering Jackson or Gson, but I have not been able to find any concrete examples or tutorials to get me started. Does anyone know of a good example I could study, or have any hints on how to approach this problem? Thank you!
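A minimal sketch of the streaming approach with Jackson's low-level `JsonParser`, assuming Jackson 2.x (`jackson-core`, package `com.fasterxml.jackson.core`) is on the classpath and Java 8+. It also assumes the log file is either one top-level JSON array of entries or a sequence of entry objects written one after another; the class name `StreamingTally` is hypothetical. Only the entry currently being read is held in memory, and nested values are skipped with `skipChildren()`.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class; assumes entries are flat objects at the top level
// (either wrapped in one array or concatenated one after another).
public class StreamingTally {

    // Returns action -> (response code -> count).
    public static Map<String, Map<String, Integer>> tally(Reader in) throws IOException {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        JsonFactory factory = new JsonFactory();
        try (JsonParser p = factory.createParser(in)) {
            JsonToken t;
            while ((t = p.nextToken()) != null) {
                if (t != JsonToken.START_OBJECT) {
                    continue; // e.g. START_ARRAY / END_ARRAY of a wrapping array
                }
                String action = null;
                String code = null;
                // Walk the fields of one entry until its END_OBJECT.
                while (p.nextToken() == JsonToken.FIELD_NAME) {
                    String field = p.getCurrentName();
                    JsonToken value = p.nextToken();
                    if (value == JsonToken.START_OBJECT || value == JsonToken.START_ARRAY) {
                        p.skipChildren(); // ignore nested values we don't need
                    } else if ("action".equals(field)) {
                        action = p.getText();
                    } else if ("responsecode".equals(field)) {
                        code = p.getText();
                    }
                    // timestamp, id and any other scalar fields are simply ignored
                }
                if (action != null && code != null) {
                    counts.computeIfAbsent(action, k -> new TreeMap<>())
                          .merge(code, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java StreamingTally path/to/log.json
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(tally(in));
        }
    }
}
```

Gson offers a comparable pull-style reader (`com.google.gson.stream.JsonReader`); the same loop structure should carry over, though the sketch above only covers Jackson.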
EDIT: My class definition.
```java
import java.util.ArrayList;
import java.util.List;

public class Action {
    public static class Response {
        private int _resultCode;
        private int _count = 0;

        public Response() {}

        public int getResultCode() { return _resultCode; }
        public int getCount() { return _count; }
        public void setResultCode(int rc) { _resultCode = rc; }
        public void setCount(int c) { _count = c; }
    }

    private List<Response> responses = new ArrayList<Response>();
    private String _name;
    // I've left out the getters/setters and helper functions that I will add in after.
}
```
If I am using Jackson and eventually want to serialize this object back into JSON easily, do you have any suggestions on how I should define this class? At the moment I am creating another ArrayList of this Action type in my main() method using: `List<Action> actions = new ArrayList<Action>();` Would a HashMap or some other alternative be a better option? And would it still let me serialize the result to JSON easily with Jackson afterwards?
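One possible shape for the output class, sketched under the assumption that Jackson 2.x `jackson-databind` (which brings in `jackson-annotations`) is on the classpath; the class name `ActionSummary` and method `addResponse` are hypothetical. Jackson serializes public getters by default, so naming them `getCode()`/`getCount()` yields the `code`/`count` keys from the sample output, `@JsonPropertyOrder` fixes the key order, and `totalActionCount` is derived from the responses so it cannot drift out of sync. A map such as the one in the counting sketch is convenient while parsing, but serializing it directly would produce `{"1000":36}`-style objects rather than the `responses` array, so converting to list-based objects at the end is one way to match the desired format.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.ArrayList;
import java.util.List;

// Hypothetical output class; Jackson picks up the public getters as properties.
@JsonPropertyOrder({"actionName", "responses", "totalActionCount"})
public class ActionSummary {

    @JsonPropertyOrder({"code", "count"})
    public static class Response {
        private final String code;
        private final int count;

        public Response(String code, int count) {
            this.code = code;
            this.count = count;
        }

        public String getCode() { return code; }
        public int getCount() { return count; }
    }

    private final String actionName;
    private final List<Response> responses = new ArrayList<>();

    public ActionSummary(String actionName) {
        this.actionName = actionName;
    }

    public String getActionName() { return actionName; }
    public List<Response> getResponses() { return responses; }

    // Derived on the fly from the per-code counts.
    public int getTotalActionCount() {
        int total = 0;
        for (Response r : responses) {
            total += r.getCount();
        }
        return total;
    }

    public void addResponse(String code, int count) {
        responses.add(new Response(code, count));
    }

    public static void main(String[] args) throws JsonProcessingException {
        ActionSummary a = new ActionSummary("Some_User_Action");
        a.addResponse("1000", 36);
        a.addResponse("1001", 6);
        a.addResponse("1002", 3);
        a.addResponse("1003", 36);
        a.addResponse("1004", 2);
        System.out.println(new ObjectMapper().writeValueAsString(a));
    }
}
```

With the sample counts this should print `{"actionName":"Some_User_Action","responses":[{"code":"1000","count":36},{"code":"1001","count":6},{"code":"1002","count":3},{"code":"1003","count":36},{"code":"1004","count":2}],"totalActionCount":83}`. Note that counts come out as JSON numbers rather than the quoted strings shown in the sample output; a list of these objects (`List<ActionSummary>`) can be written the same way with `ObjectMapper.writeValue(File, Object)`.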