Training
- One folder for training images
- Images must have a unique name or ID
    - 0001.tif --> name/ID: 0001; img_5.png --> name/ID: img5, ...
- One folder for segmentation masks
- Corresponding masks must comprise the image name or ID plus a mask suffix
    - 0001 -> 0001_mask.png (mask_suffix = "_mask.png")
    - 0001 -> 0001.png (mask_suffix = ".png")
    - The mask suffix is inferred automatically
Exemplary structure:
- [folder] images
    - [file] 0001.tif
    - [file] 0002.tif
- [folder] masks
    - [file] 0001_mask.png
    - [file] 0002_mask.png
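The automatic suffix inference described above can be sketched as follows. This is a minimal illustration, not deepflash2's actual implementation; `infer_mask_suffix` is a hypothetical helper name:

```python
def infer_mask_suffix(image_ids, mask_filenames):
    """Infer the mask suffix by stripping each image's name/ID from the
    start of its matching mask filename; all pairs must agree on the suffix."""
    suffixes = set()
    for image_id in image_ids:
        for mask in mask_filenames:
            if mask.startswith(image_id):
                suffixes.add(mask[len(image_id):])
    if len(suffixes) != 1:
        raise ValueError(f"Ambiguous or missing mask suffix: {suffixes}")
    return suffixes.pop()

print(infer_mask_suffix(["0001", "0002"], ["0001_mask.png", "0002_mask.png"]))
# -> _mask.png
```

For the second convention above (`0001 -> 0001.png`), the same function would infer `".png"`.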
Prediction
- One folder for prediction images
- Images must have a unique name or ID
    - 0001.tif --> name/ID: 0001; img_5.png --> name/ID: img5, ...
- One folder containing trained models (ensemble)
- Ensemble folder and models will be created during Training
- Do not change the naming of the models
- If you want to train different ensembles, simply rename the ensemble folder
Exemplary structure:
- [folder] images
    - [file] 0001.tif
    - [file] 0002.tif
- [folder] ensemble
    - [file] unext50_deepflash2_model-1.pth
    - [file] unext50_deepflash2_model-2.pth
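Collecting the checkpoint paths for prediction can be sketched like this. It is an illustrative helper (not part of the package's API), assuming the `.pth` naming produced during training is kept:

```python
from pathlib import Path

def list_ensemble_models(ensemble_dir):
    """Collect trained model checkpoints (.pth) from an ensemble folder,
    sorted so model-1, model-2, ... are loaded in a stable order."""
    paths = sorted(Path(ensemble_dir).glob("*.pth"))
    if not paths:
        raise FileNotFoundError(f"No .pth models found in {ensemble_dir}")
    return paths
```

Renaming the ensemble folder itself (e.g. to train a second ensemble) does not affect this lookup, as long as the model file names inside stay unchanged.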
Train-validation-split
The train-validation-split is defined as _k-fold cross validation_ with `n_splits`
- `n_splits` is the minimum of the number of files in the dataset and `max_splits` (default: 5)
- By default, the number of models per ensemble is limited to `n_splits`
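The rule for `n_splits` reduces to a one-line computation (function name is illustrative):

```python
def get_n_splits(n_files, max_splits=5):
    """n_splits is the minimum of the dataset size and max_splits."""
    return min(n_files, max_splits)

print(get_n_splits(15))  # -> 5 (limited by max_splits)
print(get_n_splits(2))   # -> 2 (limited by dataset size)
```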
Example for a dataset containing 15 images
- `model_1` is trained on 12 images (3 validation images)
- `model_2` is trained on 12 images (3 different validation images)
- ...
- `model_5` is trained on 12 images (3 different validation images)
Example for a dataset containing 2 images
- `model_1` is trained on 1 image (1 validation image)
- `model_2` is trained on 1 image (1 different validation image)
- Only two models per ensemble
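The splits in both examples can be reproduced with a plain k-fold partition. The sketch below is a standard-library illustration of the scheme, not deepflash2's actual splitter:

```python
def kfold_splits(n_items, max_splits=5):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation;
    each fold serves as the validation set exactly once."""
    n_splits = min(n_items, max_splits)
    indices = list(range(n_items))
    # Distribute items as evenly as possible across folds
    fold_sizes = [n_items // n_splits + (1 if i < n_items % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for i, (train, val) in enumerate(kfold_splits(15), start=1):
    print(f"model_{i}: {len(train)} training / {len(val)} validation images")
```

With 15 images this yields five 12/3 splits; with 2 images it yields two 1/1 splits, matching the examples above.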
Training Epochs and Iterations
To streamline the training process and allow an easier comparison across differently sized datasets, we decided to use the number of training iterations instead of epochs to define the length of a training cycle.
Some useful definitions (adapted from Stack Overflow):
- Epoch: one training pass (one forward pass and one backward pass) over all the training examples
- Batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
- Iteration: One forward pass and one backward pass using [batch size] number of examples.
Example: Your dataset comprises 20 images and you want to train for 1000 iterations given a batch size of 4. The algorithm calculates the minimum number of epochs needed to reach 1000 iterations:
$\text{epochs} = \frac{\text{iterations}}{\#\text{images} / \text{batch size}} = \frac{1000}{20/4} = 200$
The number of epochs is rounded up (ceiled) to the next integer.
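This calculation can be written directly (the helper name is hypothetical):

```python
import math

def epochs_for_iterations(iterations, n_images, batch_size):
    """Minimum number of epochs needed to reach the target iteration
    count, rounded up to the next integer."""
    iterations_per_epoch = n_images / batch_size
    return math.ceil(iterations / iterations_per_epoch)

print(epochs_for_iterations(1000, 20, 4))  # -> 200
```

For dataset sizes that do not divide evenly, the ceiling matters: 15 images at batch size 4 gives 3.75 iterations per epoch, so 1000 iterations require 267 epochs rather than 266.67.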