
I have a huge number of JSON files (4000) and I need to check every single one of them for a specific object. My code looks like the following:

import os
import json

files = sorted(os.listdir("my files path"))
for f in files:
    if f.endswith(".json"): 
        myFile = open("my files path\\" + f)
        myJson = json.load(myFile)
        if myJson["something"]["something"]["what im looking for"] == "ACTION":
            pass  # do stuff
        myFile.close()

As you can imagine, this is taking a lot of execution time, and I was wondering if there is a quicker way?

  • Have you considered multithreading or multiprocessing, if your '#do stuff' code is CPU-intensive? Commented Feb 24, 2022 at 16:27
  • Tbh, I have never used multithreading or multiprocessing in Python, so it didn't cross my mind. The #do stuff is not intensive, just a few operations; it's just a very long loop of <<open file, check file, close file>>. Commented Feb 24, 2022 at 16:35

1 Answer


Here's a multithreaded approach that may help you:

from glob import glob
import json
from concurrent.futures import ThreadPoolExecutor
import os

BASEDIR = 'myDirectory'  # the directory containing the json files

def process(filename):
    # Load one JSON file and check it for the target value
    with open(filename) as infile:
        data = json.load(infile)
        if data.get('foo', '') == 'ACTION':
            pass  # do stuff

def main():
    # Map every .json file in BASEDIR across a thread pool;
    # the threads overlap the file I/O, which is where the time goes
    with ThreadPoolExecutor() as executor:
        executor.map(process, glob(os.path.join(BASEDIR, '*.json')))

if __name__ == '__main__':
    main()
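
Since the work here is dominated by opening and reading files, threads are a good fit. If the "do stuff" part ever became CPU-intensive, as raised in the comments above, swapping ThreadPoolExecutor for ProcessPoolExecutor would let the work run in parallel across processes, bypassing the GIL. Here's a minimal sketch of that variant, keeping the same placeholder BASEDIR and the hypothetical flat 'foo' key from above:

from glob import glob
from concurrent.futures import ProcessPoolExecutor
import json
import os

BASEDIR = 'myDirectory'  # placeholder: the directory containing the json files

def process(filename):
    # Same check as the threaded version; this runs in a worker process
    with open(filename) as infile:
        data = json.load(infile)
    if data.get('foo', '') == 'ACTION':
        pass  # do CPU-heavy stuff here

def main():
    # Worker processes run truly in parallel even for CPU-bound work,
    # at the cost of process startup and pickling the arguments
    with ProcessPoolExecutor() as executor:
        executor.map(process, glob(os.path.join(BASEDIR, '*.json')))

if __name__ == '__main__':
    main()

Note that the question's JSON is nested, so to match it the flat lookup data.get('foo', '') would become something like data.get('something', {}).get('something', {}).get('what im looking for', '').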