1

I'm new to Python and I'm trying to find all the files in a folder that are over a certain size and export the data (file path and size) to a .json file.

What I have so far:

import os       
import json
import sys
import io

testPath = str(sys.argv[1])
testSize = int(sys.argv[2])

try:
    to_unicode = unicode
except NameError:
    to_unicode = str

filesList = []
x = 1
j = "1"
data = {}

for path, subdirs, files in os.walk(testPath):
    for name in files:
        filesList.append(os.path.join(path, name))

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)


with io.open('Files.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(',', ': '), ensure_ascii=False)
    outfile.write(to_unicode(str_))

The problem is that the output is:

{
    "path1": "C:\\Folder\\diager.xml",
    "path2": "C:\\Folder\\diag.xml",
    "path3": "C:\\Folder\\setup.log",
    "path4": "C:\\Folder\\ESD\\log.txt",
    "size1": "1908",
    "size2": "4071",
    "size3": "5822",
    "size4": "788",
    "unit": "B"
}

But it needs to be something like this:

{
"unit": "B",
"files": [{"path": "C:\\Folder\\file1.txt", "size": "10"}, {"path": "C:\\Folder\\file2.bin", "size": "400"}]
}

I added the j variable because without it each file would just overwrite the previous one, and I would end up with something like this:

{
    "path": "C:\\Folder\\diager.xml",
    "size": "1908",
    "unit": "B"
}

I have no idea how to proceed... Help?

2 Answers

2

You can do something like this:

files = []
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        files.append({'path': str(i), 'size': fileSize})

data['unit'] = 'B'
data['files'] = files

This way, you build a list with one dict per file (its path and size) and add that list to the data dict afterwards.
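
Put together, the approach might look like the minimal sketch below. The names collect_large_files and to_json are made up for illustration and are not from the question; the sketch also assumes sizes should be written as strings, as in the desired output.

```python
import json
import os


def collect_large_files(root, min_size):
    """Return a dict listing every file under root that is at least min_size bytes."""
    files = []
    for path, _subdirs, names in os.walk(root):
        for name in names:
            full_path = os.path.join(path, name)
            size = os.path.getsize(full_path)
            if size >= min_size:
                # One dict per file, so entries never overwrite each other.
                files.append({"path": full_path, "size": str(size)})
    return {"unit": "B", "files": files}


def to_json(data):
    # sort_keys is left at its default (False) so "unit" stays before "files".
    return json.dumps(data, indent=4, ensure_ascii=False)
```

Calling to_json(collect_large_files(testPath, testSize)) and writing the result to Files.json should then give the nested layout asked for in the question.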

0

Initialize your data dictionary with:

data = {"unit": "B", "files": []}

You can then replace your main loop:

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)

with

for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['files'].append({"path": str(i), "size": str(fileSize)})

Note that you no longer need your x and j variables.

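Edit: To control the order of the fields, you can see this question. In particular, according to this nice answer, if you are using Python 3.6 or later, you can import OrderedDict (from collections import OrderedDict) and replace data = {"unit": "B", "files": []} with data = OrderedDict(unit="B", files=[]). Note that sort_keys=True in json.dumps sorts the keys alphabetically regardless of insertion order, so it has to be False (the default) for this to have any effect.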
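
As a small illustration of how key order and sort_keys interact, here is a hedged sketch. The sample path is the placeholder from the question, and the OrderedDict is built from a list of pairs so the order does not depend on the Python version.

```python
import json
from collections import OrderedDict

# Build the dict from (key, value) pairs so "unit" is inserted before "files".
data = OrderedDict([("unit", "B"), ("files", [])])
data["files"].append({"path": "C:\\Folder\\file1.txt", "size": "10"})

as_inserted = json.dumps(data)                # keeps insertion order: unit, files
as_sorted = json.dumps(data, sort_keys=True)  # alphabetical: files, unit

print(as_inserted)
print(as_sorted)
```

With sort_keys=True the "files" key is emitted before "unit", which is why the original script printed its keys alphabetically.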

2 Comments

In order to control the order of the fields, you can see this question
Works like a charm! Also, I just set sort_keys to False instead of True and now it's not printing alphabetically. Thanks!
