All configuration options

To make it easy to associate a model configuration with a set of results, zamba accepts a yaml file to define all of the relevant parameters for training or prediction. You can then store the configuration you used with the results in order to easily reproduce it in the future.

In general, we've tried to pick reasonable defaults, but it is worth familiarizing yourself with the available options.

The primary configurations you may want to set are:

  • VideoLoaderConfig: Defines all possible parameters for how videos are loaded
  • PredictConfig: Defines all possible parameters for model inference on videos
  • TrainConfig: Defines all possible parameters for model training on videos
  • ImageClassificationPredictConfig: Defines all possible parameters for model inference on images
  • ImageClassificationTrainingConfig: Defines all possible parameters for model training on images

Here's a helpful diagram which shows how everything is related for the video workflows:

Video loading arguments

The VideoLoaderConfig class defines all of the optional parameters that can be specified for how videos are loaded before either inference or training. This includes selecting which frames to use from each video.

All video loading arguments can be specified either in a YAML file or when instantiating the VideoLoaderConfig class in Python. Some can also be specified directly in the command line.

Each model comes with a default video loading configuration. If no user-specified video loading configuration is passed - either through a YAML file or the Python VideoLoaderConfig class - all video loading arguments will be set based on the defaults for the given model.

In a YAML file:

video_loader_config:
    model_input_height: 240
    model_input_width: 426
    total_frames: 16
    # ... other parameters

In Python:

from zamba.data.video import VideoLoaderConfig
from zamba.models.config import PredictConfig
from zamba.models.model_manager import predict_model

predict_config = PredictConfig(data_dir="example_vids/")
video_loader_config = VideoLoaderConfig(
    model_input_height=240,
    model_input_width=426,
    total_frames=16,
    # ... other parameters
)
predict_model(
    predict_config=predict_config, video_loader_config=video_loader_config
)

Let's look at the class documentation in Python.

>> from zamba.data.video import VideoLoaderConfig
>> help(VideoLoaderConfig)

class VideoLoaderConfig(pydantic.main.BaseModel)
 |  VideoLoaderConfig(*,
 crop_bottom_pixels: int = None,
 i_frames: bool = False,
 scene_threshold: float = None,
 megadetector_lite_config: zamba.models.megadetector_lite_yolox.MegadetectorLiteYoloXConfig = None,
 frame_selection_height: int = None,
 frame_selection_width: int = None,
 total_frames: int = None,
 ensure_total_frames: bool = True,
 fps: float = None,
 early_bias: bool = False,
 frame_indices: List[int] = None,
 evenly_sample_total_frames: bool = False,
 pix_fmt: str = 'rgb24',
 model_input_height: int = None,
 model_input_width: int = None,
 cache_dir: pathlib.Path = None,
 cleanup_cache: bool = False) -> None


crop_bottom_pixels (int, optional)

Number of pixels to crop from the bottom of the video (prior to resizing to frame_selection_height). This can sometimes be useful if your videos have a persistent timestamp/camera brand logo at the bottom. Defaults to None

i_frames (bool, optional)

Only load the I-frames. I-frames are highly dependent on the encoding of the video, so it is not recommended to use them unless you have verified that the I-frames of your videos are useful. Defaults to False

scene_threshold (float, optional)

Only load frames that correspond to scene changes, which are detected when scene_threshold percent of pixels are different. This can be useful for selecting frames efficiently if in general you have large animals and stable backgrounds. Defaults to None

megadetector_lite_config (MegadetectorLiteYoloXConfig, optional)

The megadetector_lite_config is used to specify any parameters that should be passed to the MegadetectorLite model for frame selection. For all possible options, see the MegadetectorLiteYoloXConfig class. If megadetector_lite_config is None (the default), the MegadetectorLite model will not be used to select frames.

frame_selection_height (int, optional), frame_selection_width (int, optional)

Resize the video to this height and width in pixels, prior to frame selection. If None, the full size video will be used for frame selection. Using full size videos (setting to None) is recommended for MegadetectorLite, especially if your species of interest are smaller. Defaults to None

total_frames (int, optional)

Number of frames that should ultimately be returned. Defaults to None

ensure_total_frames (bool)

Some frame selection methods may yield varying numbers of frames depending on timestamps of the video frames. If True, ensure the requested number of frames is returned by either clipping or duplicating the final frame. If no frames are selected, returns an array of the desired shape with all zeros. Otherwise, return the array unchanged. Defaults to True
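
The clip, duplicate, and zero-fill behavior described above can be sketched in plain Python. This is a minimal illustration and not zamba's implementation: pad_or_clip_frames is a hypothetical helper, frames are stand-in labels rather than image arrays, and blank_frame stands in for an all-zero frame.

```python
def pad_or_clip_frames(frames, total_frames, blank_frame=0):
    """Return exactly `total_frames` items from a list of selected frames.

    Hypothetical helper for illustration only; zamba works on arrays.
    """
    if not frames:
        # No frames were selected: return placeholders of the desired length.
        return [blank_frame] * total_frames
    if len(frames) >= total_frames:
        # Too many frames: clip to the requested number.
        return frames[:total_frames]
    # Too few frames: duplicate the final frame until the count matches.
    return frames + [frames[-1]] * (total_frames - len(frames))


print(pad_or_clip_frames(["f0", "f1", "f2"], 5))  # padded with copies of "f2"
print(pad_or_clip_frames(["f0", "f1", "f2"], 2))  # clipped to two frames
print(pad_or_clip_frames([], 3))                  # all placeholders
```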

fps (float, optional)

Resample the video evenly from the entire duration to a specific number of frames per second. Use values less than 1 for rates lower than a single frame per second (e.g., fps=0.5 will result in 1 frame every 2 seconds). Defaults to None
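
To make the arithmetic concrete, here is a small sketch of which timestamps a constant sampling rate would pick. It assumes the first sampled frame sits at t=0 and that only whole frames are counted; sample_times is a hypothetical helper, and the actual resampling in zamba is done by the video decoder.

```python
def sample_times(duration_seconds, fps):
    """Timestamps (in seconds) of frames sampled at a constant rate.

    Assumption for illustration: sampling starts at t=0.
    """
    interval = 1.0 / fps                 # seconds between sampled frames
    count = int(duration_seconds * fps)  # whole frames that fit in the clip
    return [i * interval for i in range(count)]


# fps=0.5 on a 10 second clip: one frame every 2 seconds.
print(sample_times(10, 0.5))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```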

early_bias (bool, optional)

Resamples to 24 fps and selects 16 frames biased toward the beginning of the video. This strategy was used by the Pri-matrix Factorization machine learning competition winner. Defaults to False

frame_indices (list(int), optional)

Select specific frame numbers. Note: frame selection is done after any resampling. Defaults to None

evenly_sample_total_frames (bool, optional)

Reach the total number of frames specified by evenly sampling from the duration of the video. Defaults to False
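
One way to picture even sampling is as equally spaced frame indices across the video. The sketch below assumes the first and last frames are included and that fractional positions are truncated; evenly_spaced_indices is a hypothetical helper, and zamba's own index selection may differ in those details.

```python
def evenly_spaced_indices(num_frames, total_frames):
    """Pick `total_frames` indices spread evenly over `num_frames` frames.

    Assumption for illustration: endpoints are included.
    """
    if total_frames == 1:
        return [0]
    step = (num_frames - 1) / (total_frames - 1)
    return [int(i * step) for i in range(total_frames)]


print(evenly_spaced_indices(101, 5))  # [0, 25, 50, 75, 100]
```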

pix_fmt (str, optional)

FFmpeg pixel format, defaults to rgb24 for RGB channels; can be changed to bgr24 for BGR.

model_input_height (int, optional), model_input_width (int, optional)

After frame selection, resize the video to this height and width in pixels. This controls the height and width of the video frames returned by load_video_frames. Defaults to None

cache_dir (Path, optional)

Cache directory where preprocessed videos will be saved upon first load. Alternatively, can be set with VIDEO_CACHE_DIR environment variable. Provided there is enough space on your machine, it is highly encouraged to cache videos for training as this will speed up all subsequent epochs after the first. If you are predicting on the same videos with the same video loader configuration, this will save time on future runs. Defaults to None, which means videos will not be cached.
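
The two ways of setting the cache location can be sketched as a simple lookup. The precedence shown here (an explicit cache_dir wins over VIDEO_CACHE_DIR) is an assumption made for illustration, and resolve_cache_dir is a hypothetical helper rather than a zamba function.

```python
import os
from pathlib import Path


def resolve_cache_dir(cache_dir=None, env=None):
    """Return the cache directory to use, or None if caching is off.

    Assumed precedence (not confirmed by the docs): argument, then
    the VIDEO_CACHE_DIR environment variable, then no caching.
    """
    env = os.environ if env is None else env
    if cache_dir is not None:
        return Path(cache_dir)
    env_value = env.get("VIDEO_CACHE_DIR")
    return Path(env_value) if env_value else None


print(resolve_cache_dir("my_cache", env={}))
print(resolve_cache_dir(env={"VIDEO_CACHE_DIR": "env_cache"}))
print(resolve_cache_dir(env={}))  # None: videos are not cached
```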

cleanup_cache (bool, optional)

Whether to delete the cache directory after training or predicting ends. Defaults to False

Video prediction arguments

All possible model inference parameters for videos are defined by the PredictConfig class. Let's see the class documentation in Python:

>> from zamba.models.config import PredictConfig
>> help(PredictConfig)

class PredictConfig(ZambaBaseModel)
 |  PredictConfig(*,
 data_dir: DirectoryPath = Path.cwd(),
 filepaths: FilePath = None,
 checkpoint: FilePath = None,
 model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
 gpus: int = 0,
 num_workers: int = 3,
 batch_size: int = 2,
 save: bool = True,
 save_dir: Optional[Path] = None,
 overwrite: bool = False,
 dry_run: bool = False,
 proba_threshold: float = None,
 output_class_names: bool = False,
 weight_download_region: zamba.models.utils.RegionEnum = 'us',
 skip_load_validation: bool = False,
 model_cache_dir: pathlib.Path = None) -> None


PredictConfig needs to know which videos to classify, so specify either data_dir or filepaths. If neither is specified, the current working directory is used as the default data_dir.

data_dir (DirectoryPath, optional)

Path to the directory containing videos for inference. Defaults to the current working directory.

filepaths (FilePath, optional)

Path to a csv containing a filepath column with paths to the videos that should be classified.
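
A CSV of this shape can be produced with the standard library. The sketch below only writes the single filepath column described above; the video names are made up, and write_filepaths_csv is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path


def write_filepaths_csv(video_paths, csv_path):
    """Write a one-column CSV with the header `filepath`."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath"])
        for path in video_paths:
            writer.writerow([str(path)])
    return csv_path


# Hypothetical video names, written to a temporary directory.
videos = ["example_vids/video_a.mp4", "example_vids/video_b.mp4"]
csv_path = write_filepaths_csv(videos, Path(tempfile.mkdtemp()) / "filepaths.csv")
print(csv_path.read_text())
```

The resulting path is what you would pass as filepaths.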

checkpoint (Path or str, optional)

Path to a model checkpoint to load and use for inference. If you train your own custom models, this is how you can pass those models to zamba when you want to predict on new videos. The default is None, which will load the pretrained checkpoint for the model specified by model_name.

model_name (time_distributed|slowfast|european|blank_nonblank, optional)

Name of the model to use for inference. The model options that ship with zamba are blank_nonblank, time_distributed, slowfast, and european. See the Available Models page for details. Defaults to time_distributed

gpus (int, optional)

The number of GPUs to use during inference. By default, all of the available GPUs found on the machine will be used. An error will be raised if the number of GPUs specified is more than the number that are available on the machine.

num_workers (int, optional)

The number of CPUs to use during inference. The maximum value for num_workers is the number of CPUs available on the machine. If you are using MegadetectorLite for frame selection, it is not recommended to use the total number of CPUs available. Defaults to 3

batch_size (int, optional)

The batch size to use for inference. Defaults to 2

save (bool)

Whether to save out predictions. If False, predictions are not saved. Defaults to True.

save_dir (Path, optional)

An optional directory in which to save the model predictions and configuration yaml. If no save_dir is specified and save is True, outputs will be written to the current working directory. Defaults to None

overwrite (bool)

If True, will overwrite zamba_predictions.csv and predict_configuration.yaml in save_dir if they exist. Defaults to False.

dry_run (bool, optional)

Specifying True is useful for ensuring a model implementation or configuration works properly by running only a single batch of inference. Defaults to False

proba_threshold (float between 0 and 1, optional)

For advanced uses, you may want the algorithm to be more or less sensitive to whether a species is present. This parameter is a float (e.g., 0.6) giving the probability threshold at or above which an animal is considered to be present in the video being analyzed.

By default no threshold is passed, proba_threshold=None. This will return a probability from 0-1 for each species that could occur in each video. If a threshold is passed, then the final prediction value returned for each class is probability >= proba_threshold, so that all class values become 0 (False, the species does not appear) or 1 (True, the species does appear).
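
The rule probability >= proba_threshold can be sketched on a single video's scores as follows. The species names and probabilities are made up, and apply_threshold is a hypothetical helper rather than part of zamba.

```python
def apply_threshold(probabilities, proba_threshold=None):
    """Turn per-species probabilities into 0/1 flags when a threshold is set."""
    if proba_threshold is None:
        # No threshold: keep the raw probabilities.
        return dict(probabilities)
    return {
        species: int(p >= proba_threshold)
        for species, p in probabilities.items()
    }


# Made-up scores for one video.
probs = {"elephant": 0.72, "leopard": 0.6, "blank": 0.05}
print(apply_threshold(probs))
print(apply_threshold(probs, proba_threshold=0.6))
# {'elephant': 1, 'leopard': 1, 'blank': 0}
```

Note that a probability exactly equal to the threshold counts as present.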

output_class_names (bool, optional)

Setting this option to True yields the most concise output zamba is capable of. The highest species probability in a video is taken to be the only species in that video, and the output returned is simply the video name and the name of the species with the highest class probability, or blank if the most likely classification is no animal. Defaults to False
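
In other words, the output is the single highest-scoring class per video. Here is a minimal sketch with made-up scores; top_class is a hypothetical helper.

```python
def top_class(probabilities):
    """Return the class name with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Made-up scores for two videos.
print(top_class({"elephant": 0.72, "leopard": 0.20, "blank": 0.08}))  # elephant
print(top_class({"elephant": 0.10, "leopard": 0.05, "blank": 0.85}))  # blank
```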

weight_download_region [us|eu|asia]

Because zamba needs to download pretrained weights for the neural network architecture, we make these weights available in different regions. us is the default, but if you are not in the US you should use either eu for the European Union or asia for Asia Pacific to make sure that these download as quickly as possible for you.

skip_load_validation (bool, optional)

By default, before kicking off inference zamba will iterate through all of the videos in the data and verify that each can be loaded. Setting skip_load_validation to True skips this step. Validation can be very time intensive depending on the number of videos. It is recommended to run validation once, but not on future inference runs if the videos have not changed. Defaults to False

model_cache_dir (Path, optional)

Cache directory where downloaded model weights will be saved. If None and the MODEL_CACHE_DIR environment variable is not set, will use your default cache directory (e.g. ~/.cache). Defaults to None

Video training arguments

All possible model training parameters for videos are defined by the TrainConfig class. Let's see the class documentation in Python:

>> from zamba.models.config import TrainConfig
>> help(TrainConfig)

class TrainConfig(ZambaBaseModel)
 |  TrainConfig(*,
 labels: Union[FilePath, pandas.DataFrame],
 data_dir: DirectoryPath = # your current working directory ,
 checkpoint: FilePath = None,
 scheduler_config: Union[str, zamba.models.config.SchedulerConfig, NoneType] = 'default',
 model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
 dry_run: Union[bool, int] = False,
 batch_size: int = 2,
 auto_lr_find: bool = False,
 backbone_finetune_config: zamba.models.config.BackboneFinetuneConfig =
            BackboneFinetuneConfig(unfreeze_backbone_at_epoch=5,
            backbone_initial_ratio_lr=0.01, multiplier=1,
            pre_train_bn=False, train_bn=False, verbose=True),
 gpus: int = 0,
 num_workers: int = 3,
 max_epochs: int = None,
 early_stopping_config: zamba.models.config.EarlyStoppingConfig =
            EarlyStoppingConfig(monitor='val_macro_f1', patience=5,
            verbose=True, mode='max'),
 weight_download_region: zamba.models.utils.RegionEnum = 'us',
 split_proportions: Dict[str, int] = {'train': 3, 'val': 1, 'holdout': 1},
 save_dir: pathlib.Path = # your current working directory ,
 overwrite: bool = False,
 skip_load_validation: bool = False,
 from_scratch: bool = False,
 use_default_model_labels: bool = True,
 model_cache_dir: pathlib.Path = None) -> None


labels (FilePath or pd.DataFrame, required)

Either the path to a CSV file with labels for training, or a dataframe of the training labels. There must be columns for filename and label. labels must be specified to instantiate TrainConfig.

data_dir (DirectoryPath, optional)

Path to the directory containing training videos. Defaults to the current working directory.

checkpoint (Path or str, optional)

Path to a model checkpoint to load and resume training from. The default is None, which automatically loads the pretrained checkpoint for the model specified by model_name. Since the default model_name is time_distributed the default checkpoint is zamba_time_distributed.ckpt

scheduler_config (zamba.models.config.SchedulerConfig, optional)

A PyTorch learning rate schedule to adjust the learning rate based on the number of epochs. Scheduler can either be default (the default), None, or a torch.optim.lr_scheduler.

model_name (time_distributed|slowfast|european|blank_nonblank, optional)

Name of the model to use for training. The model options that ship with zamba are blank_nonblank, time_distributed, slowfast, and european. See the Available Models page for details. Defaults to time_distributed

dry_run (bool, optional)

Specifying True is useful for trying out model implementations more quickly by running only a single batch of train and validation. Defaults to False

batch_size (int, optional)

The batch size to use for training. Defaults to 2

auto_lr_find (bool, optional)

Whether to run a learning rate finder algorithm when calling pytorch_lightning.trainer.tune() to try to find an optimal initial learning rate. The learning rate finder is not guaranteed to find a good learning rate; depending on the dataset, it can select a learning rate that leads to poor model training. Use with caution. See the PyTorch Lightning docs for more details. Defaults to False

backbone_finetune_config (zamba.models.config.BackboneFinetuneConfig, optional)

Set parameters to finetune a backbone model to align with the current learning rate. Derived from Pytorch Lightning's built-in BackboneFinetuning. The default values are specified in the BackboneFinetuneConfig class: BackboneFinetuneConfig(unfreeze_backbone_at_epoch=5, backbone_initial_ratio_lr=0.01, multiplier=1, pre_train_bn=False, train_bn=False, verbose=True)

gpus (int, optional)

The number of GPUs to use during training. By default, all of the available GPUs found on the machine will be used. An error will be raised if the number of GPUs specified is more than the number that are available on the machine.

num_workers (int, optional)

The number of CPUs to use during training. The maximum value for num_workers is the number of CPUs available in the system. If you are using the Megadetector, it is not recommended to use the total number of CPUs available. Defaults to 3

max_epochs (int, optional)

The maximum number of epochs to run during training. Defaults to None

early_stopping_config (zamba.models.config.EarlyStoppingConfig, optional)

Parameters to pass to Pytorch lightning's EarlyStopping to monitor a metric during model training and stop training when the metric stops improving. The default values are specified in the EarlyStoppingConfig class: EarlyStoppingConfig(monitor='val_macro_f1', patience=5, verbose=True, mode='max')
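
To show what patience=5 with mode='max' means in practice, here is a simplified sketch of the stopping rule. It is not PyTorch Lightning's implementation (it ignores options such as a minimum improvement delta); stopping_epoch is a hypothetical helper and the metric values are made up.

```python
def stopping_epoch(metric_by_epoch, patience=5):
    """Return the epoch index at which training would stop, or None.

    Simplified rule for a metric that should increase (mode='max').
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_by_epoch):
        if value > best:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation scores: the best value is reached at epoch 2.
scores = [0.50, 0.55, 0.60, 0.59, 0.58, 0.60, 0.57, 0.56]
print(stopping_epoch(scores, patience=5))  # 7
```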

weight_download_region [us|eu|asia]

Because zamba needs to download pretrained weights for the neural network architecture, we make these weights available in different regions. us is the default, but if you are not in the US you should use either eu for the European Union or asia for Asia Pacific to make sure that these download as quickly as possible for you.

split_proportions (dict(str, int), optional)

The proportion of data to use during training, validation, and as a holdout set. Defaults to {"train": 3, "val": 1, "holdout": 1}
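
The integers can be read as relative weights. Under that assumption, the sketch below converts them into the fraction of data each split would receive; split_fractions is a hypothetical helper.

```python
def split_fractions(split_proportions):
    """Convert relative split weights into fractions that sum to 1."""
    total = sum(split_proportions.values())
    return {name: weight / total for name, weight in split_proportions.items()}


print(split_fractions({"train": 3, "val": 1, "holdout": 1}))
# {'train': 0.6, 'val': 0.2, 'holdout': 0.2}
```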

save_dir (Path, optional)

Directory in which to save model checkpoint and configuration file. If not specified, will save to a version_n folder in your current working directory.

overwrite (bool, optional)

If True, will save outputs in save_dir and overwrite the directory if it exists. If False, will create an auto-incremented version_n folder within save_dir with model outputs. Defaults to False
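
The auto-incremented version_n naming can be sketched as below. Starting the numbering at 0 is an assumption, and next_version_dir is a hypothetical helper, not a zamba function.

```python
import tempfile
from pathlib import Path


def next_version_dir(save_dir):
    """Return the next unused `version_n` folder inside `save_dir`.

    Assumption for illustration: numbering starts at version_0.
    """
    save_dir = Path(save_dir)
    existing = []
    for child in save_dir.glob("version_*"):
        suffix = child.name.split("_", 1)[1]
        if child.is_dir() and suffix.isdigit():
            existing.append(int(suffix))
    next_n = max(existing) + 1 if existing else 0
    return save_dir / f"version_{next_n}"


root = Path(tempfile.mkdtemp())
first = next_version_dir(root)
first.mkdir()
second = next_version_dir(root)
print(first.name, second.name)  # version_0 version_1
```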

skip_load_validation (bool, optional)

By default, before kicking off training zamba will iterate through all of the videos in the training data and verify that each can be loaded. Setting skip_load_validation to True skips this step. Validation can be very time intensive depending on the number of videos. It is recommended to run validation once, but not on future training runs if the videos have not changed. Defaults to False

from_scratch (bool, optional)

Whether to instantiate the model with base weights. This means starting from the ImageNet weights for image-based models and the Kinetics weights for video models. Only used if labels is not None. Defaults to False

use_default_model_labels (bool, optional)

Whether the species outputted by the model should be the default model classes (e.g. all 32 species classes for the time_distributed model). If you want the model classes to only be the species in your labels file (e.g. just gorillas and elephants), set to False. If either use_default_model_labels is False or the labels contain species that are not in the model, the model head will be replaced for finetuning. Defaults to True

model_cache_dir (Path, optional)

Cache directory where downloaded model weights will be saved. If None and the MODEL_CACHE_DIR environment variable is not set, will use your default cache directory, which is often an automatic temp directory at ~/.cache/zamba. Defaults to None

Image prediction arguments

All possible model inference parameters for images are defined by the ImageClassificationPredictConfig class.

>> from zamba.images.config import ImageClassificationPredictConfig
>> help(ImageClassificationPredictConfig)

Here's a description of all the parameters:

checkpoint (FilePath, optional)

Path to a custom checkpoint file (.ckpt) generated by zamba that can be used to generate predictions. If None, defaults to a pretrained model. Defaults to None.

model_name (str, optional)

Name of the model to use for inference. Options are: Defaults to

filepaths (FilePath, optional)

Path to a CSV containing images for inference, with one row per image in the data_dir. There must be a column called 'filepath' (absolute or relative to the data_dir). If None, uses all files in data_dir. Defaults to None.

data_dir (DirectoryPath, optional)

Path to a directory containing images for inference. Defaults to the working directory.

save (bool, optional)

Whether to save out predictions. If False, predictions are not saved. Defaults to True.

save_dir (Path, optional)

An optional directory in which to save the model predictions and configuration yaml. If no save_dir is specified and save=True, outputs will be written to the current working directory. Defaults to None.

overwrite (bool, optional)

If True, overwrite outputs in save_dir if they exist. Defaults to False.

crop_images (bool, optional)

Preprocess images using Megadetector or bounding box from labels file. Default is True.

detections_threshold (float, optional)

Threshold for Megadetector. Default value is 0.2.

gpus (int, optional)

Number of GPUs to use for inference. Defaults to all of the available GPUs found on the machine.

num_workers (int, optional)

Number of workers for parallel processing. Default is 3.

image_size (int, optional)

Image size for the input of the classification model. Default is 224.

results_file_format (ResultsFormat, optional)

The format in which to output the predictions. Currently 'csv' and 'megadetector' JSON formats are supported. Default is 'csv'.

results_file_name (Path, optional)

The filename for the output predictions in the save directory. Default is "zamba_predictions.csv".

model_cache_dir (Path, optional)

Cache directory where downloaded model weights will be saved. If None and no environment variable is set, will use your default cache directory. Defaults to None.

weight_download_region (str, optional)

S3 region to download pretrained weights from. Options are "us" (United States), "eu" (Europe), or "asia" (Asia Pacific). Defaults to "us".

Image training arguments

All possible model training parameters for images are defined by the ImageClassificationTrainingConfig class.

>> from zamba.images.config import ImageClassificationTrainingConfig
>> help(ImageClassificationTrainingConfig)

Here's a description of all the parameters:

data_dir (Path, required)

Where to find the files listed in filepaths (or where to look if filepaths is not provided).

labels (pd.DataFrame or FilePath, required)

Labels dataframe or path to CSV file containing labels.

labels_format (BboxFormat, optional)

Format for bounding box annotations. Defaults to BboxFormat.COCO.

checkpoint (FilePath, optional)

Path to a custom checkpoint file (.ckpt) generated by zamba that can be used to resume training. If None, defaults to a pretrained model. Defaults to None.

model_name (str, optional)

Base model name that will be loaded by timm lib (e.g. resnet50). Default is

name (str, optional)

Classification experiment name (MLFlow). Default value is 'image-classification'.

max_epochs (int, optional)

Max training epochs. Default value is 100.

lr (float, optional)

Learning rate value. Default value is 1e-5. If None, will find a good learning rate.

image_size (int, optional)

Image desired size. Default value is 224.

batch_size (int, optional)

Batch size. Default value is 16. This is the physical batch size; use accumulated_batch_size to set the virtual batch size.

accumulated_batch_size (int, optional)

Accumulated batch size; will accumulate gradients to this virtual batch size. Useful to match batch size / learning rate from published papers. If not specified, will use batch_size.
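
The relationship between the physical and virtual batch sizes comes down to how many batches of gradients are accumulated before each optimizer step. The sketch below assumes the virtual batch size is a whole multiple of batch_size; accumulation_steps is a hypothetical helper, and how zamba handles non-multiples is not covered here.

```python
def accumulation_steps(batch_size, accumulated_batch_size=None):
    """Number of physical batches per optimizer step.

    Assumption for illustration: the virtual batch size is a
    whole multiple of the physical batch size.
    """
    if accumulated_batch_size is None:
        # Not specified: the virtual batch size equals batch_size.
        return 1
    if accumulated_batch_size % batch_size != 0:
        raise ValueError("accumulated_batch_size should be a multiple of batch_size")
    return accumulated_batch_size // batch_size


print(accumulation_steps(16, 64))  # 4 batches of 16 per optimizer step
print(accumulation_steps(16))      # 1
```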

early_stopping_patience (int, optional)

Number of epochs with no improvement after which training will be stopped. Defaults to 3.

extra_train_augmentations (bool, optional)

If false, uses simple transforms for camera trap imagery (random perspective shift, random horizontal flip, random rotation); if true, also uses more complex transforms (random perspective shift, random horizontal flip, random rotation, random grayscale, random equalize, random autocontrast, random adjust sharpness).

num_workers (int, optional)

Number of workers to use for data loading. If None, default value is 8 (or 2/3 of available cores).

accelerator (str, optional)

Accelerator type. Default is "gpu" if CUDA is available, otherwise "cpu".

devices (Any, optional)

Devices to use for training. Default is "auto".

crop_images (bool, optional)

Preprocess images using Megadetector or bbox from labels file. Default is True.

detections_threshold (float, optional)

Threshold for Megadetector. Applied only if the bbox is not specified in the labels. Default value is 0.2.

checkpoint_path (Path, optional)

Directory for where to save the output files; defaults to current working directory.

weighted_loss (bool, optional)

Use weighted loss during training. Default value is False.

mlflow_tracking_uri (str, optional)

MLFlow tracking URI. Default is "./mlruns".

from_scratch (bool, optional)

Instantiate the model with base weights. Default is False.

use_default_model_labels (bool, optional)

By default, output the full set of default model labels rather than just the species in the labels file. Only applies if the provided labels are a subset of the default model labels. If set to False, will replace the model head for finetuning and output only the species in the provided labels file.

scheduler_config (SchedulerConfig or str, optional)

Config for setting up the learning rate scheduler on the model. If "default", uses scheduler that was used for training. If None, will not use a scheduler. Defaults to "default".

split_proportions (Dict, optional)

Split proportions (train, val, test). Default is {"train": 3, "val": 1, "test": 1}.

model_cache_dir (Path, optional)

Cache directory where downloaded model weights will be saved. Default is None.

cache_dir (Path, optional)

Path to the folder where clipped images will be saved. Applies only to training with images cropping (e.g. with bbox from coco format). Default is None.

weight_download_region (str, optional)

S3 region to download pretrained weights from. Options are "us" (United States), "eu" (Europe), or "asia" (Asia Pacific). Defaults to "us".

species_in_label_order (list, optional)

Optional list of species in the desired order. Default is None.