I need help improving my script's execution time.
It does what it's supposed to do:
- Reads a file line by line
- Matches each line against the keys of a JSON file
- Writes each matching line, together with the corresponding information from the JSON file, into a new txt file
The problem is the execution time: the text file has more than 500,000 lines, and the JSON file contains many more entries.
How can I optimize this script?
import json
import time

start = time.time()
print(start)

# Load the whole JSON file into a dict once
JsonFile = open('categories.json')
data = json.load(JsonFile)

Annotated_Data = {}

# Read every file name into a list, stripping trailing newlines
FileList = [line.rstrip('\n') for line in open("FilesNamesID.txt")]

# For each name, scan every (key, value) pair in the JSON data
for File in FileList:
    for key, value in data.items():
        if File == key:
            Annotated_Data[key] = value

with open('Annotated_Files.txt', 'w') as outfile:
    json.dump(Annotated_Data, outfile, indent=4)

end = time.time()
print(end - start)
FileList = [line.rstrip('\n') for line in open("FilesNamesID.txt")]

I would iterate over the file directly instead: for File in open("FilesNamesID.txt"). This avoids building a list of more than 500,000 lines that has to be held in memory, so only the current line is loaded at a time.
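A minimal sketch of that change, keeping the question's variable names and leaving the rest of the logic untouched:

import json

JsonFile = open('categories.json')
data = json.load(JsonFile)

Annotated_Data = {}

# Iterating over the file object reads it lazily, one line at a time,
# instead of materialising all 500,000+ lines as a list first
for File in open("FilesNamesID.txt"):
    File = File.rstrip('\n')  # still strip the trailing newline so the comparison works
    for key, value in data.items():
        if File == key:
            Annotated_Data[key] = value

with open('Annotated_Files.txt', 'w') as outfile:
    json.dump(Annotated_Data, outfile, indent=4)

As an aside that goes beyond the quoted advice: since data is already a dict, replacing the inner loop with a membership test (if File in data: Annotated_Data[File] = data[File]) avoids scanning every item for every line, which is where most of the execution time goes.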