I am trying to parse a large XML file and print the tag values to an output file. I am using minidom, and my code works fine for files up to about 30 MB, but larger ones raise a MemoryError. So I switched to buffered reading of the file, but now I am unable to get the desired output.
XML File
> <File> <TV>Sony</TV> <FOOD>Burger</FOOD> <PHONE>Apple</PHONE> </File>
> <File> <TV>Samsung</TV> <FOOD>Pizza</FOOD> <PHONE>HTC</PHONE> </File>
> <File> <TV>Bravia</TV> <FOOD>Pasta</FOOD> <PHONE>BlackBerry</PHONE> </File>
Desired Output
Sony, Burger, Apple
Samsung, Pizza, HTC
Bravia, Pasta, BlackBerry
When reading with a buffer, I get this output instead:
Sony, Burger, Apple
Samsung,Piz
Bravia, Pasta, BlackBerry
Code
while 1:
    content = File.read(2048)
    if not len(content):
        break
    else:
        for lines in StringIO(content):
            lines = lines.lstrip(' ')
            if lines.startswith("<TV>"):
                TV = lines.strip("<TV>")
                tvVal = TV.split("</TV>")[0]
                #print tvVal
                w2.writelines(str(tvVal)+",")
            elif lines.startswith("<FOOD>"):
                FOOD = lines.strip("<FOOD>")
                foodVal = FOOD.split("</FOOD>")[0]
                #print foodVal
                w2.writelines(str(foodVal)+",")
............................
...........................
I also tried using seek(), but I still could not get the desired output.
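The truncated "Samsung,Piz" row is consistent with a 2048-character read ending in the middle of a line: the tail of that line arrives in the next chunk, where it no longer starts with a tag and is skipped. A minimal sketch of one way to keep fixed-size reads while carrying the incomplete last line over to the next chunk is shown below. It assumes each tag sits on its own line (as the `startswith` checks suggest) and that each record ends with a `</File>` line. The helper names `iter_lines_chunked`, `tag_value`, and `extract_rows` are hypothetical, not part of any library. It also slices off the opening tag instead of calling `strip("<TV>")`, because `str.strip` removes any of the characters `<`, `T`, `V`, `>` from both ends, so a value beginning with `T` or `V` would lose letters.

```python
import io


def iter_lines_chunked(stream, chunk_size=2048):
    """Yield complete lines from `stream`, reading `chunk_size` characters at a time."""
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = (leftover + chunk).split("\n")
        # The last piece may be an incomplete line; hold it for the next chunk.
        leftover = pieces.pop()
        for line in pieces:
            yield line
    if leftover:
        # Final line without a trailing newline.
        yield leftover


def tag_value(line, tag):
    """Return the text between <tag> and </tag> if the line starts with <tag>, else None."""
    line = line.strip()
    open_tag, close_tag = "<%s>" % tag, "</%s>" % tag
    if line.startswith(open_tag):
        # Slice off the opening tag rather than using str.strip().
        return line[len(open_tag):].split(close_tag)[0]
    return None


def extract_rows(stream, tags=("TV", "FOOD", "PHONE"), chunk_size=2048):
    """Collect one comma-separated row per <File> record (assumed layout)."""
    rows, current = [], []
    for line in iter_lines_chunked(stream, chunk_size):
        for tag in tags:
            value = tag_value(line, tag)
            if value is not None:
                current.append(value)
        if line.strip() == "</File>":
            rows.append(", ".join(current))
            current = []
    return rows
```

With a real file this could be called as `extract_rows(open(path))`, writing each returned row to the output file. If the file is line-oriented, iterating over the file object directly (`for line in f:`) also reads incrementally and would avoid the manual chunking altogether.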