I'm trying to convert JSON files to CSV, but I'm running into a memory error. Is there an efficient way to tune this code so it can handle large JSON files in Python? Here is what I have:
import csv
import json
import sys


def change(row, pastkeys=()):
    # Recursively flatten nested dicts/lists into one dict keyed by tuples of path parts.
    result = {}
    for key in row:
        newkey = pastkeys + (key,)
        print(key)  # debug output
        val = row[key]
        if isinstance(val, dict):
            result.update(change(val, newkey))
        elif isinstance(val, list):
            # Treat a list like a dict keyed by index so it flattens the same way.
            result.update(change(dict(enumerate(val)), newkey))
        else:
            result[newkey] = val
    return result

a = open(sys.argv[1], 'r')
lines = list(a)   # reads the whole file into memory
print(lines)      # debug output

try:
    # Try to parse the file as a single JSON document (object or array).
    data = json.loads(''.join(lines))
    if isinstance(data, dict):
        data = [data]
except ValueError:
    # Fall back to one JSON object per line.
    data = [json.loads(line) for line in lines]

result = []
fields = set()
for row in data:
    flat = change(row)
    fields |= set(flat.keys())
    result.append(flat)

out1 = open(sys.argv[2], 'w')
fields = sorted(fields)
out = csv.writer(out1, lineterminator='\n')
out.writerow(['-'.join(str(f) for f in field) for field in fields])
for row in result:
    out.writerow([row.get(field, '') for field in fields])
out1.close()
a.close()
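
For context, this is the kind of two-pass, line-by-line rewrite I was considering. It is only a rough sketch, and it assumes the input is newline-delimited JSON (one object per line); for a single huge JSON document I would presumably still need an incremental parser such as ijson.

import csv
import json
import sys


def flatten(row, pastkeys=()):
    # Same flattening as change() above, minus the debug print.
    result = {}
    for key in row:
        newkey = pastkeys + (key,)
        val = row[key]
        if isinstance(val, dict):
            result.update(flatten(val, newkey))
        elif isinstance(val, list):
            result.update(flatten(dict(enumerate(val)), newkey))
        else:
            result[newkey] = val
    return result


# First pass: stream the file to collect the full set of column names
# without keeping any flattened rows around.
fields = set()
with open(sys.argv[1]) as src:
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        fields |= set(flatten(json.loads(line)).keys())

# Sort by the string form of each path so int list indices and str keys compare cleanly.
fields = sorted(fields, key=lambda f: tuple(str(p) for p in f))

# Second pass: re-read the file and write one CSV row at a time.
with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:
    out = csv.writer(dst, lineterminator='\n')
    out.writerow(['-'.join(str(f) for f in field) for field in fields])
    for line in src:
        if not line.strip():
            continue
        flat = flatten(json.loads(line))
        out.writerow([flat.get(field, '') for field in fields])

Is this the right direction, or is there a better way to avoid loading everything into memory?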