I have an assignment that consists of a set of questions and a large JSON file. The file contains around 5 million objects and is 303 MB in size.
The file can be downloaded here.
A small preview of its contents:
{ Reviewer:1, Movie:1535440, Grade:4, Date:'2005-08-18'},
{ Reviewer:1, Movie:1426604, Grade:4, Date:'2005-09-01'},
{ Reviewer:1, Movie:1815755, Grade:5, Date:'2004-07-20'},
{ Reviewer:2, Movie:2059652, Grade:4, Date:'2005-09-05'},
{ Reviewer:2, Movie:1666394, Grade:3, Date:'2005-04-19'},
{ Reviewer:2, Movie:1759415, Grade:4, Date:'2005-04-22'},
Each row represents one review: the reviewer's id, the movie's id, the grade the reviewer gave the movie, and the date (as a string).
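In C# terms, I picture each row as something like the type below. This is only a sketch: the name `Review` and the choice of `int` and `DateTime` for the fields are my assumptions based on the preview, not something given by the assignment.

```csharp
using System;

// Hypothetical shape for one row of the file; property names mirror the JSON keys.
public sealed record Review(int Reviewer, int Movie, int Grade, DateTime Date);

public static class ReviewDemo
{
    public static void Main()
    {
        // First row of the preview, written out by hand (no deserialization here).
        var first = new Review(1, 1535440, 4, new DateTime(2005, 8, 18));
        Console.WriteLine($"{first.Reviewer} {first.Movie} {first.Grade} {first.Date:yyyy-MM-dd}");
    }
}
```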
I need to load this file into my .NET console app and deserialize it into objects, so that I can work with them (build lists of objects, write methods over them, and so on).
Example questions:
- Given a parameter N, how many reviews did reviewer N write?
(This should be a method that takes the reviewer's id as a parameter; one reviewer can review multiple different movies.)
- Which reviewer(s) wrote the most reviews?
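To make the two example questions concrete, here is a minimal sketch of the kind of methods I am after, assuming the reviews are already in memory. `Review`, `CountReviewsBy` and `MostActiveReviewers` are illustrative names of my own, and the sample data is just the six preview rows. It shows only the queries, not the deserialization step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; property names mirror the JSON keys.
public sealed record Review(int Reviewer, int Movie, int Grade, DateTime Date);

public static class ReviewQueries
{
    // Number of reviews written by the given reviewer.
    public static int CountReviewsBy(IEnumerable<Review> reviews, int reviewerId) =>
        reviews.Count(r => r.Reviewer == reviewerId);

    // Ids of the reviewer(s) with the most reviews; a tie returns every tied id, sorted.
    public static List<int> MostActiveReviewers(IEnumerable<Review> reviews)
    {
        var counts = new Dictionary<int, int>();
        foreach (var r in reviews)
        {
            counts.TryGetValue(r.Reviewer, out var seen); // seen is 0 when the key is absent
            counts[r.Reviewer] = seen + 1;
        }
        if (counts.Count == 0) return new List<int>();

        int max = counts.Values.Max();
        return counts.Where(kv => kv.Value == max)
                     .Select(kv => kv.Key)
                     .OrderBy(id => id)
                     .ToList();
    }

    public static void Main()
    {
        // The six rows from the preview, written out by hand.
        var sample = new List<Review>
        {
            new(1, 1535440, 4, new DateTime(2005, 8, 18)),
            new(1, 1426604, 4, new DateTime(2005, 9, 1)),
            new(1, 1815755, 5, new DateTime(2004, 7, 20)),
            new(2, 2059652, 4, new DateTime(2005, 9, 5)),
            new(2, 1666394, 3, new DateTime(2005, 4, 19)),
            new(2, 1759415, 4, new DateTime(2005, 4, 22)),
        };

        Console.WriteLine(CountReviewsBy(sample, 1));                     // 3
        Console.WriteLine(string.Join(",", MostActiveReviewers(sample))); // 1,2
    }
}
```

On the preview rows both reviewers have three reviews each, so the second method returns both ids.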
The problem is that deserialization alone takes around 10 seconds every time I read the file, while the requirement is that each method takes at most 4 seconds to run. Even if I deserialize only a single field from the file, it still takes too long.
Do you know of any efficient approaches or NuGet packages that could deserialize this data in under 4 seconds? So far I have only tried Newtonsoft.Json.
I found one interesting article, but I could not get its code to work because the snippets are incomplete and I was not able to fill in the gaps. Here is the link to that article.
I would be thankful for any ideas and help.