This data set contains approximately 2.65M sequences of camera trap images, totaling 7.1M images, from seasons one through eleven of the Snapshot Serengeti project, the flagship project of the Snapshot Safari network. Using the same camera trapping protocols at every site, Snapshot Safari members are collecting standardized data from many protected areas in Africa, which allows for cross-site comparisons to assess the efficacy of conservation and restoration programs. Serengeti National Park in Tanzania is best known for the massive annual migrations of wildebeest and zebra that drive the cycling of its dynamic ecosystem.
Labels are provided for 61 categories, primarily at the species level (for example, the most common labels are wildebeest, zebra, and Thomson’s gazelle). Approximately 76% of images are labeled as empty. A full list of species and associated image counts is available here. We have also added approximately 150,000 bounding box annotations to approximately 78,000 of those images.
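The label statistics above can be recomputed from the metadata. The sketch below assumes an already-loaded COCO Camera Traps dict with the usual `images`, `annotations` and `categories` keys. Two further details are assumptions: that empty images carry a category literally named `"empty"`, and that bounding boxes appear as optional `bbox` keys on annotations in the dict you pass in. The boxes may ship in a separate file, in which case you pass that file's dict instead. The function name `summarize_labels` is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_labels(cct: dict) -> dict:
    """Hypothetical helper: per-category image counts, empty fraction, box counts.

    Assumes COCO Camera Traps keys ("images", "annotations", "categories"),
    a category named "empty" (assumption), and optional "bbox" keys on
    annotations (assumption about where boxes live).
    """
    id_to_name = {c["id"]: c["name"] for c in cct["categories"]}
    image_labels = defaultdict(set)  # image id -> set of category names
    boxed_images = set()
    n_boxes = 0
    for ann in cct["annotations"]:
        image_labels[ann["image_id"]].add(id_to_name[ann["category_id"]])
        if "bbox" in ann:
            boxed_images.add(ann["image_id"])
            n_boxes += 1
    # Count each image at most once per category, even with several annotations
    per_category = Counter(name for names in image_labels.values() for name in names)
    n_images = len(cct["images"])
    n_empty = sum(1 for names in image_labels.values() if "empty" in names)
    return {
        "images_per_category": per_category,
        "empty_fraction": n_empty / n_images if n_images else 0.0,
        "n_boxes": n_boxes,
        "n_boxed_images": len(boxed_images),
    }
```

On a full-season file, `summarize_labels(json.load(f))` should give numbers comparable to those quoted above, although the exact figures depend on which file you load.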
Citation, license, and contact information
The images and species-level labels are described in more detail in the associated manuscript:
Swanson AB, Kosmala M, Lintott CJ, Simpson RJ, Smith A, Packer C (2015) Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2: 150026. (DOI) (bibtex)
Please cite this manuscript if you use this data set.
For questions about this data set, contact Sarah Huebner at the University of Minnesota.
This data set is released under the Community Data License Agreement (permissive variant).
The original Snapshot Serengeti data set included a “human” class label; for privacy reasons, we have removed those images from this version of the data set. Those labels are still present in the metadata. If those images are important to your work, contact us; in some cases it will be possible to release those images under an alternative license.
Annotations are provided in COCO Camera Traps .json format: there is one .json file per season, plus a single .json file covering all seasons combined. The combined metadata is also provided in .csv format. Note that annotations are attached to individual images, but are only reliable at the sequence level. For example, in rare sequences two of three images contain a lion and the third is empty (lions, it turns out, sometimes walk away), yet all three images are annotated as “lion”.
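Because labels are only reliable per sequence, it is often safer to pool them by sequence before training or evaluating. The following is a minimal sketch. It assumes each image record carries a `seq_id` field, as in the COCO Camera Traps format, and it uses the hypothetical function name `sequence_labels`.

```python
from collections import defaultdict


def sequence_labels(cct: dict) -> dict:
    """Hypothetical helper: map each seq_id to the category names on any of its images.

    Assumes COCO Camera Traps image records with a "seq_id" field; pooling
    reflects the fact that labels are only reliable at the sequence level.
    """
    id_to_name = {c["id"]: c["name"] for c in cct["categories"]}
    image_to_seq = {im["id"]: im["seq_id"] for im in cct["images"]}
    labels = defaultdict(set)
    for ann in cct["annotations"]:
        labels[image_to_seq[ann["image_id"]]].add(id_to_name[ann["category_id"]])
    return dict(labels)
```

In the lion example above, all three frames would map to one sequence labeled `{"lion"}`. That outcome matches the annotation, even though one frame shows no animal.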
Additional metadata related to the aggregation of human labels into consensus labels is available in an addendum.
We have also divided locations (i.e., cameras) into training and validation splits to allow for consistent benchmarking on this data set.
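Splitting by camera location rather than by image keeps frames from the same camera out of both splits at once. The following is a minimal sketch. It assumes each image record has a `location` field and that you have already read the validation locations from the recommended splits file into a set. The layout of that file is not described here and is an assumption, as is the function name `split_images`.

```python
def split_images(cct: dict, val_locations: set) -> tuple:
    """Hypothetical helper: partition image records into (train, val) by location.

    Assumes each COCO Camera Traps image record has a "location" field and
    that val_locations holds the validation camera IDs from the splits file.
    """
    train, val = [], []
    for im in cct["images"]:
        # Every image from a validation camera goes to val; all others to train
        (val if im["location"] in val_locations else train).append(im)
    return train, val
```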
Downloading the data
Links to zipfiles are provided below, but whether you want the whole data set, a specific folder, or a subset of the data (e.g. images for one species), we recommend checking out our guidelines for accessing images without using giant zipfiles.
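To fetch a subset such as one species, you first need the list of image paths to request. The sketch below selects the `file_name` of every image labeled with a given species. It skips images that also carry a `"human"` label, because those image files are not distributed in this release. The function name is hypothetical. The category name strings are assumptions about the metadata. The base URL to prepend to each path is not given here, so take it from the access guidelines.

```python
from collections import defaultdict


def files_for_species(cct: dict, species: str, exclude=frozenset({"human"})) -> list:
    """Hypothetical helper: sorted file_name values for images labeled `species`.

    Images that also carry any label in `exclude` are skipped (by default
    "human", whose images were removed from this release).
    """
    id_to_name = {c["id"]: c["name"] for c in cct["categories"]}
    labels = defaultdict(set)  # image id -> set of category names
    for ann in cct["annotations"]:
        labels[ann["image_id"]].add(id_to_name[ann["category_id"]])
    id_to_file = {im["id"]: im["file_name"] for im in cct["images"]}
    return sorted(
        id_to_file[image_id]
        for image_id, names in labels.items()
        if species in names and not (names & exclude)
    )
```

The resulting relative paths can then be passed to whichever download tool the guidelines recommend.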
Data download links:
Season 1 (242GB) (metadata)
Season 2 (382GB) (metadata)
Season 3 (251GB) (metadata)
Season 4 (368GB) (metadata)
Season 5 (596GB) (metadata)
Season 6 (361GB) (metadata)
Season 7 (636GB) (metadata)
Season 8 (part 1) (450GB) (metadata)
Season 8 (part 2) (414GB)
Season 9 (part 1) (432GB) (metadata)
Season 9 (part 2) (432GB)
Season 10 (part 1) (500GB) (metadata)
Season 10 (part 2) (166GB)
Season 11 (479GB) (metadata)
All metadata (.csv)
All metadata (.json)
Recommended train/val splits
Having trouble downloading? Check out our FAQ.
Posted by Dan Morris.