I have a large JSON file (~5 GB), but instead of containing a single JSON document it has several concatenated together:
{"created_at":"Mon Jan 13 20:01:57 +0000 2014","id":422820833807970304,"id_str":"422820833807970304"}
{"created_at":"Mon Jan 13 20:01:57 +0000 2014","id":422820837545500672,"id_str":"422820837545500672"}.....
There is no newline between the closing and opening curly brackets: }{.
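Because the objects sit back to back, json.loads rejects the whole string with an "Extra data" error as soon as it reaches the second object. The standard library's json.JSONDecoder.raw_decode instead parses one object from the start of a string and returns it together with the index where that object ended, which is enough to step through concatenated objects. A small illustration, using two shortened records modeled on the sample above (the values are hypothetical):

```python
import json

# Two short records modeled on the sample above (values shortened, hypothetical).
sample = '{"id": 1, "id_str": "1"}{"id": 2, "id_str": "2"}'

# json.loads expects exactly one document, so it fails at the second object.
loads_error = None
try:
    json.loads(sample)
except json.JSONDecodeError as err:
    loads_error = err.msg  # "Extra data"
print("json.loads failed:", loads_error)

# raw_decode parses one object from the start of the string and returns
# (object, index just past that object), so the remainder can be decoded next.
decoder = json.JSONDecoder()
objects = []
rest = sample
while rest:
    obj, end = decoder.raw_decode(rest)
    objects.append(obj)
    rest = rest[end:]

print(objects)
```

Slicing the remaining string like this is fine for a short sample, but on a 5 GB input it would copy the rest of the data for every object, so a real reader should track an offset instead.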
I tried using sed to insert a newline between the curly brackets, and then reading the file with:
import json

data = []
with open(filename, 'r') as f:
    for line in f:  # readline() would return a single string, so the loop would walk its characters
        data.append(json.loads(line))
But this doesn't work.
How can I read this file relatively quickly?
Any help is greatly appreciated.
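One way to skip the sed step entirely is to let the standard json module find the object boundaries itself. The sketch below is a minimal version under stated assumptions: every top-level value is an object or array (as in the sample), the file is valid JSON, and iter_concatenated_json and chunk_size are hypothetical names, not part of the json module. It reads the file in fixed-size chunks and calls json.JSONDecoder.raw_decode with a start offset; that second argument is accepted by CPython's implementation but is not listed in the official documentation, so treat it as an implementation detail.

```python
import json
import re
from typing import Any, Iterator, TextIO

# JSON's own insignificant-whitespace characters (space, tab, LF, CR).
_JSON_WS = re.compile(r"[ \t\n\r]*")


def iter_concatenated_json(fp: TextIO, chunk_size: int = 1 << 20) -> Iterator[Any]:
    """Yield each top-level JSON value from a text stream of concatenated values.

    Hypothetical helper: reads chunk_size characters at a time, so memory use is
    bounded by the chunk size plus the largest single object, not the file size.
    Assumes top-level values are objects or arrays, so a value cut off at a chunk
    boundary always fails to parse instead of parsing as a shorter value.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    pos = 0  # index in buffer where the next value starts
    eof = False
    while True:
        pos = _JSON_WS.match(buffer, pos).end()  # raw_decode does not skip whitespace
        if pos < len(buffer):
            try:
                # CPython's raw_decode accepts a start index as its second argument.
                obj, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more input: the data really is malformed
                # Otherwise the value is probably incomplete; fall through and read more.
            else:
                yield obj
                pos = end
                continue
        elif eof:
            return
        # Discard the consumed text, then append the next chunk.
        buffer = buffer[pos:]
        pos = 0
        chunk = fp.read(chunk_size)
        if not chunk:
            eof = True
        buffer += chunk
```

Each object is yielded as soon as it is complete, so looping over iter_concatenated_json(f), with f opened via open(filename, encoding="utf-8"), processes the tweets one at a time. Letting the decoder find the boundaries also avoids a weakness of the sed approach: a literal }{ inside a string value would be split incorrectly by a plain text substitution.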
With data.append(json.loads(line)), you are loading the entire 5 GB of data into RAM.
sed 's/}{/}\n{/g'
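To act on the point about RAM: once the file has one object per line (for example after the sed substitution), the records can be processed as a stream instead of being appended to a list. This is a minimal sketch; iter_json_lines and count_and_max_id are hypothetical names, and counting records while tracking the largest id only stands in for whatever per-tweet work is actually needed.

```python
import json
from typing import Iterator, Optional, TextIO, Tuple


def iter_json_lines(fp: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of newline-delimited JSON."""
    for line in fp:  # iterating a file object reads one line at a time
        line = line.strip()
        if line:
            yield json.loads(line)


def count_and_max_id(fp: TextIO) -> Tuple[int, Optional[int]]:
    """Hypothetical aggregate: keeps only a counter and the largest "id",
    so memory stays flat no matter how many records the file holds."""
    count = 0
    max_id: Optional[int] = None
    for tweet in iter_json_lines(fp):
        count += 1
        tweet_id = tweet["id"]  # assumes every record has an "id" field, as in the sample
        if max_id is None or tweet_id > max_id:
            max_id = tweet_id
    return count, max_id
```

Because iter_json_lines is a generator, only the current line and the current parsed object are alive at any moment; the list in the original loop is what forces all 5 GB into memory.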