I have a folder that contains 10,000 images and 3 subfolders; each subfolder contains a different number of images. I want to import only a small portion of these images for training, with a size limit that I set manually each time I pick a portion of the data. I already have this Python code:

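train_dir = 'folder/train/'  # This folder contains 10,000 images and 3 subfolders; each subfolder contains a different number of images
imageSize = 64  # Assumed placeholder value; the original snippet does not show how imageSize is defined

import os

import cv2
import numpy as np
import skimage.transform
from tqdm import tqdm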
def get_data(folder):
    """
    Load the data and labels from the given folder.
    """
    X = []
    y = []
    for folderName in os.listdir(folder):
        if not folderName.startswith('.'):
            if folderName in   ['Name1']:
                label = 0
            elif folderName in ['Name2']:
                label = 1
            elif folderName in ['Name3']:
                label = 2
            else:
                label = 4
            for image_filename in tqdm(os.listdir(folder + folderName)):
                img_file = cv2.imread(folder + folderName + '/' + image_filename)
                if img_file is not None:
                    img_file = skimage.transform.resize(img_file, (imageSize, imageSize, 1))
                    img_arr = np.asarray(img_file)
                    X.append(img_arr)
                    y.append(label)
    X = np.asarray(X) # Keras only accepts data as numpy arrays 
    y = np.asarray(y)
    return X,y


from sklearn.model_selection import train_test_split

X, y = get_data(train_dir)  # load the full data set before splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

I want to add a size parameter so that I can choose the number of images to import. The number of images imported from each subfolder should be equal.

  • It seems like what you need is the Keras ImageDataGenerator class with flow_from_directory: keras.io/preprocessing/image/#imagedatagenerator-class Commented Feb 28, 2019 at 13:15
  • Is it possible to specify the number of images imported from a folder using ImageDataGenerator? If so, how? Commented Mar 3, 2019 at 14:08

1 Answer

You can read the file paths from each folder into a separate list and then select an equal number of paths from each list.

import os

folder1_files = []
for root, dirs, files in os.walk('path/folder1', topdown=False):
    for name in files:
        # join with root so files in nested subdirectories get the correct path
        folder1_files.append(os.path.join(root, name))

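To select (folder2_files and folder3_files are built the same way as folder1_files):

train = folder1_files[:n] + folder2_files[:n] + folder3_files[:n]

where n is the number of images to take from each folder.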
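The same idea can be folded into a loop shaped like the question's get_data, so that one size argument limits every subfolder. The sketch below is only an illustration under a few assumptions: the function name pick_balanced_paths and the parameters size, label_map, default_label and seed are hypothetical, the label values mirror the ones in the question, and files are picked at random rather than taking the first n. It returns paths and labels only, so the image reading and resizing from the question still has to be applied to the result.

```python
import os
import random


def pick_balanced_paths(folder, size, label_map, default_label=4, seed=None):
    """Return (paths, labels) with at most `size` files per subfolder.

    Hypothetical helper: `label_map` maps subfolder names to integer labels,
    and unknown subfolders get `default_label` (4 in the question's code).
    """
    rng = random.Random(seed)  # seeded so a selection can be reproduced
    paths, labels = [], []
    for folder_name in sorted(os.listdir(folder)):
        subfolder = os.path.join(folder, folder_name)
        # Skip hidden entries and loose files sitting next to the subfolders.
        if folder_name.startswith('.') or not os.path.isdir(subfolder):
            continue
        label = label_map.get(folder_name, default_label)
        filenames = sorted(
            name for name in os.listdir(subfolder)
            if os.path.isfile(os.path.join(subfolder, name))
        )
        # Random subset without replacement; a folder with fewer than
        # `size` files contributes everything it has.
        chosen = rng.sample(filenames, min(size, len(filenames)))
        for name in chosen:
            paths.append(os.path.join(subfolder, name))
            labels.append(label)
    return paths, labels
```

A call such as pick_balanced_paths(train_dir, 500, {'Name1': 0, 'Name2': 1, 'Name3': 2}) would then give the paths to feed into the cv2.imread and resize steps. Note that the counts are only equal when every subfolder holds at least size files; passing the size of the smallest subfolder is one way to guarantee that.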

1 Comment

Thanks, but I want to load the images with the code structure I mentioned above (there are encodings and many other things). Is it possible to do that?
