Datasets
You have to subclass torch.utils.data.Dataset to define it.
Here is how you can do it in plain PyTorch (I'm using Pillow to load the images and torchvision to transform them into torch.Tensor objects):
import torch
import torchvision
from PIL import Image


class MyDataset(torch.utils.data.Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        # Load the image from the "Path" column and return it as a tensor
        # together with its "Score" label.
        row = self.dataframe.iloc[index]
        return (
            torchvision.transforms.functional.to_tensor(Image.open(row["Path"])),
            row["Score"],
        )


dataset = MyDataset(dataframe)
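You can sanity-check a single sample directly (assuming, as above, that your dataframe has "Path" and "Score" columns):

image, score = dataset[0]
print(image.shape, score)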
Alternatively, you can use torchdata (disclaimer: shameless self-promotion, as I'm the author...), which allows you to decouple Path and Score like this:
import torchvision
from PIL import Image

import torchdata


class ImageDataset(torchdata.datasets.FilesDataset):
    def __getitem__(self, index):
        # self.files is provided by FilesDataset
        return Image.open(self.files[index])


class Labels(torchdata.Dataset):
    def __init__(self, scores):
        super().__init__()
        self.scores = scores

    def __len__(self):
        return len(self.scores)

    def __getitem__(self, index):
        return self.scores[index]


# to_numpy for convenience
# I assume all your images are in /folder and have the *.jpg extension
dataset = ImageDataset.from_folder("/folder", regex="*.jpg").map(
    torchvision.transforms.ToTensor()
) | Labels(dataframe["Score"].to_numpy())
(Or you could implement it just like in regular PyTorch, but inheriting from torchdata.Dataset and calling super().__init__() in the constructor.)
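For example, a minimal sketch of that variant (the class name is my own; the body mirrors MyDataset above):

class MyTorchdataDataset(torchdata.Dataset):
    def __init__(self, dataframe):
        super().__init__()  # required so torchdata's .map/.cache machinery works
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        row = self.dataframe.iloc[index]
        return (
            torchvision.transforms.functional.to_tensor(Image.open(row["Path"])),
            row["Score"],
        )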
torchdata also allows you to cache your images easily or apply other transformations via .map, as shown above; check the GitHub repository for more info or ask in the comments.
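For example (a quick sketch; see the repository for the exact cachers API, and note the cache directory below is just a placeholder):

import pathlib

# Cache samples in RAM after they are first loaded:
dataset_in_memory = dataset.cache()

# Or cache them to disk instead:
dataset_on_disk = dataset.cache(torchdata.cachers.Pickle(pathlib.Path("./cache")))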
DataLoader
Whichever way you choose, you should wrap your dataset in torch.utils.data.DataLoader to create batches and iterate over them, like this:
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for images, scores in dataloader:
    # Rest of your code to train the neural network or something
    ...

Inside the loop, do whatever you want with those images and scores.
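If it helps, here is a minimal sketch of what a training step might look like; the model, loss, and optimizer are placeholders of mine (the question doesn't specify them), and it assumes all images share the same size so the default collate can batch them:

# Placeholder regression model: flatten each image and predict one score
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.LazyLinear(1),
)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for images, scores in dataloader:
    optimizer.zero_grad()
    predictions = model(images).squeeze(dim=1)
    loss = criterion(predictions, scores.float())
    loss.backward()
    optimizer.step()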