At the end of 2017, data scientists from more than 90 countries around the world drew on more than 300,000 video clips in a competition to build the best machine learning models for identifying wildlife from camera trap footage.
Following the competition, the top-performing submission was packaged into an open source software tool and made available for general use by researchers and conservationists.
Zamba means forest in Lingala. For centuries, researchers have argued that a thorough understanding of wildlife ecology in African zambas could reveal critical insights into our origins as humans.
But understanding requires observation, and observation is often neither easy nor safe. Camera traps have become powerful tools in research and conservation because they let ecologists, biologists, and other researchers study valuable footage of wildlife while minimizing disruption to the natural dynamics of these habitats and sparing people the long hours of waiting between sightings.
Still, camera traps can't yet automatically label the species they observe. Labeling this data takes the valuable time of experts or thousands of citizen scientists. Sometimes camera traps are triggered by animals of interest, and other times by falling branches or passing winds. Advances in artificial intelligence and computer vision hold enormous promise for taking on this intensive video processing work, freeing humans to focus on interpreting the content and using the results.
As part of the Pan African Programme: The Cultured Chimpanzee, over 8,000 hours of camera trap footage have been collected across chimpanzee habitats in 15 African countries. Labeling the species in this footage is no small task: it takes a great deal of time just to determine whether any animals are present in a clip, and if so, which ones. This is where machine learning can help.
To date, thousands of citizen scientists have manually labeled video data through the Chimp&See Zooniverse project. In partnership with experts at the Max Planck Institute for Evolutionary Anthropology (MPI-EVA), this effort produced a well-labeled dataset of nearly 2,000 hours of camera trap footage from Chimp&See's database.
Using this dataset, DrivenData and MPI-EVA ran a machine learning challenge in which hundreds of data scientists competed to build the best algorithms for automated species detection. The three submissions best able to predict the presence and type of wildlife in new videos won the challenge and received €20,000 in prize money.
We believe that the impact of a data competition shouldn't end when submissions close. The code behind the top submissions from this and other DrivenData challenges has been released under an open source license for anyone to use and learn from. In addition, the machine learning model with the best overall performance has been packaged into an open source software tool that lets researchers and conservationists interact with its predictions and feed in new video clips. The goal of this tool is to make it easier for researchers to use these state-of-the-art approaches in their work.
Ultimately, the winning techniques developed from this challenge provide a starting point for production-level automated species tagging for use in camera trap systems around the world. By decreasing the time that experts spend watching empty footage, we can improve their ability to focus on the outcomes that matter most.
Learn how to install and get started with Zamba by visiting the documentation page. Zamba is easy to use as a command-line tool or as a Python library!
Thanks to all the participants in the Pri-Matrix Factorization Challenge! Special thanks to Dmytro Poplovskiy (@dmytro), developer of the top-performing solution adapted for Project Zamba; to the project team at the Max Planck Institute for Evolutionary Anthropology for organizing the competition and preparing its data; and to the ARCUS Foundation for generously funding this project.