I'm not sure what I'm missing here. I have 2 zip files that contain JSON files, and I'm trying to combine the data I extract from them into one DataFrame, but my loop keeps giving me separate records. Here is what I have before constructing the DataFrame. I tried pd.concat, but I think my issue has more to do with the way I'm reading the files in the first place.

import glob
import json
import zipfile

data = []
for FileZips in glob.glob('*.zip'):
    with zipfile.ZipFile(FileZips, 'r') as myzip:
        for logfile in myzip.namelist():
            with myzip.open(logfile, 'r') as f:
                # Parse the second-to-last line of each log as JSON
                contents = f.readlines()[-2]
                jfile = json.loads(contents)
                print(len(jfile))

This prints:

40935
40935

2 Answers

You can use pd.read_json (assuming each extracted line is valid JSON).
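A minimal sketch of the read_json route for a single extracted line, assuming (as the question's pd.DataFrame(json.loads(...)) call suggests) that the line holds a JSON array of records; the sample line is made up. The decoded text is wrapped in io.StringIO because newer pandas versions deprecate passing a literal JSON string to read_json.

```python
import io
import json

import pandas as pd

# Hypothetical sample line (bytes, as returned by ZipFile.open(...).readlines()):
# assumed to be a JSON array of records.
line = b'[{"id": 1, "value": 10}, {"id": 2, "value": 20}]\n'

# The question's approach: parse with the json module, then build the frame.
via_json = pd.DataFrame(json.loads(line))

# The read_json approach: decode the bytes and give pandas a file-like object.
via_read_json = pd.read_json(io.StringIO(line.decode('utf-8')))
```

Both frames hold the same two rows; read_json just folds the parsing and DataFrame construction into one call.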

I would also break this up into more functions for readability:

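import glob
import io
import zipfile

import pandas as pd

def zip_to_df(zip_file):
    with zipfile.ZipFile(zip_file, 'r') as myzip:
        # pd.concat consumes the generator while the archive is still open
        return pd.concat((log_as_df(logfile, myzip)
                          for logfile in myzip.namelist()),
                         ignore_index=True)

def log_as_df(logfile, myzip):
    with myzip.open(logfile, 'r') as f:
        contents = f.readlines()[-2]  # same second-to-last line as in the question
        # Decode the bytes and wrap in StringIO; newer pandas deprecates literal JSON strings
        return pd.read_json(io.StringIO(contents.decode('utf-8')))

df = pd.concat(map(zip_to_df, glob.glob('*.zip')), ignore_index=True)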

Note: this calls concat more than once (once per zip file, then once to combine the zips). I think that's worth it for readability, but you could do it with a single concat.
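A sketch of the single-concat variant, assuming the same layout as the question (a JSON array of records on the second-to-last line of each log, UTF-8 text). The names zip_frames and combine_zips are hypothetical; the per-archive generators are flattened with itertools.chain so pd.concat is called exactly once.

```python
import io
import itertools
import zipfile

import pandas as pd


def log_as_df(logfile, myzip):
    """Parse the second-to-last line of one log file as JSON."""
    with myzip.open(logfile, 'r') as f:
        contents = f.readlines()[-2]
    return pd.read_json(io.StringIO(contents.decode('utf-8')))


def zip_frames(zip_file):
    """Yield one DataFrame per log; the archive stays open while the generator runs."""
    with zipfile.ZipFile(zip_file, 'r') as myzip:
        for logfile in myzip.namelist():
            yield log_as_df(logfile, myzip)


def combine_zips(zip_files):
    """Flatten the per-archive generators so pd.concat runs only once."""
    frames = itertools.chain.from_iterable(map(zip_frames, zip_files))
    return pd.concat(frames, ignore_index=True)
```

Usage would be df = combine_zips(glob.glob('*.zip')). Note that pd.concat raises a ValueError when it receives no frames, for example when no zip files match the pattern.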


I was able to get what I needed with a small adjustment to my indentation!

import glob
import json
import zipfile

import pandas as pd

dfs = []
for FileZips in glob.glob('*.zip'):
    with zipfile.ZipFile(FileZips, 'r') as myzip:
        for logfile in myzip.namelist():
            with myzip.open(logfile, 'r') as f:
                contents = f.readlines()[-2]
                jfile = json.loads(contents)
                # Keep one DataFrame per log file instead of discarding each jfile
                dfs.append(pd.DataFrame(jfile))
                df = pd.concat(dfs, ignore_index=True)
print(len(df))

Comment:

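I think you can move the concat out of all the indentation and call it once after the loops finish; as written, it re-concatenates the growing list after every file. My feeling is that this should also be broken into separate functions for readability.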
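A sketch of the comment's suggestion applied to the loop above: pd.concat is called once, after every log has been read, instead of once per file. It is wrapped in a hypothetical load_all_logs function so it can take any list of paths; the second-to-last-line JSON assumption is unchanged from the question.

```python
import json
import zipfile

import pandas as pd


def load_all_logs(zip_files):
    """Read every log in every archive, then concatenate once at the end."""
    dfs = []
    for FileZips in zip_files:
        with zipfile.ZipFile(FileZips, 'r') as myzip:
            for logfile in myzip.namelist():
                with myzip.open(logfile, 'r') as f:
                    contents = f.readlines()[-2]
                jfile = json.loads(contents)
                dfs.append(pd.DataFrame(jfile))
    # Single concat, outside all the loops
    return pd.concat(dfs, ignore_index=True)
```

Usage: df = load_all_logs(glob.glob('*.zip')), then print(len(df)). Collecting frames in a list and concatenating once avoids copying the accumulated data on every iteration.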
