This data set contains approximately 2.5M sequences of camera trap images, totaling 6.7M images, from seasons one through ten of the Snapshot Serengeti project, the flagship project of the Snapshot Safari network. Using the same camera trapping protocols at every site, Snapshot Safari members are collecting standardized data from many protected areas in Africa, which allows for cross-site comparisons to assess the efficacy of conservation and restoration programs. Serengeti National Park in Tanzania is best known for the massive annual migrations of wildebeest and zebra that drive the cycling of its dynamic ecosystem.
Labels are provided for 55 animal categories, primarily at the species level (for example, the most common labels are wildebeest, zebra, and Thomson’s gazelle). Approximately 75% of images are labeled as empty. We have also added approximately 150,000 bounding box annotations to approximately 78,000 of the images. A full list of species and associated image counts is available here.
The images and species-level labels are described in more detail in the associated manuscript:
Swanson AB, Kosmala M, Lintott CJ, Simpson RJ, Smith A, Packer C (2015) Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2: 150026. (DOI) (bibtex)
Please cite this manuscript if you use this data set.
Annotations are provided in COCO Camera Traps .json format. A .json file is provided for each season, along with a single .json file covering all seasons combined; the combined metadata is also provided in .csv format. Note that annotations are attached to individual images but are only reliable at the sequence level. For example, in rare sequences two of three images contain a lion while the third is empty (lions, it turns out, sometimes walk away), yet all three images are annotated as “lion”.
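Because labels are only trustworthy per sequence, it can help to group images and labels by sequence before training or evaluating. The following is a minimal sketch, assuming the usual COCO Camera Traps layout in which each image record has `id` and `seq_id`, each annotation has `image_id` and `category_id`, and each category has `id` and `name`. The helper names and the small example data (IDs such as `seqA`, `img1`) are hypothetical.

```python
import json
from collections import defaultdict


def load_coco_camera_traps(path):
    """Read one of the per-season (or combined) .json metadata files."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sequence_labels(coco):
    """Map each seq_id to the set of category names annotated anywhere in it.

    Assumes images carry 'id' and 'seq_id', annotations carry 'image_id'
    and 'category_id', and categories carry 'id' and 'name'.
    """
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    image_to_seq = {im["id"]: im["seq_id"] for im in coco["images"]}
    labels = defaultdict(set)
    for ann in coco["annotations"]:
        labels[image_to_seq[ann["image_id"]]].add(cat_names[ann["category_id"]])
    return dict(labels)


def images_by_sequence(coco):
    """Group image IDs by seq_id so each sequence can be treated as one sample."""
    groups = defaultdict(list)
    for im in coco["images"]:
        groups[im["seq_id"]].append(im["id"])
    return dict(groups)


# Hypothetical example mirroring the lion case: every frame in seqA carries
# the sequence-level "lion" label, even if one frame is actually empty.
example = {
    "categories": [{"id": 0, "name": "empty"}, {"id": 1, "name": "lion"}],
    "images": [
        {"id": "img1", "seq_id": "seqA"},
        {"id": "img2", "seq_id": "seqA"},
        {"id": "img3", "seq_id": "seqA"},
        {"id": "img4", "seq_id": "seqB"},
    ],
    "annotations": [
        {"id": "a1", "image_id": "img1", "category_id": 1},
        {"id": "a2", "image_id": "img2", "category_id": 1},
        {"id": "a3", "image_id": "img3", "category_id": 1},
        {"id": "a4", "image_id": "img4", "category_id": 0},
    ],
}
print(sequence_labels(example))  # {'seqA': {'lion'}, 'seqB': {'empty'}}
```

Scoring a classifier per sequence (for example, by aggregating its predictions over all frames in a group) sidesteps the occasional empty frame inside a labeled sequence.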
We have also divided locations (i.e., cameras) into training and validation splits to allow for consistent benchmarking on this data set.
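Since the splits are defined over camera locations rather than individual images, every image from a given camera should land on the same side of the split. A minimal sketch, assuming each image record has a `location` field and that the train and validation location lists have already been read from the recommended splits file (that file's exact layout is not described here and is an assumption). The function name and location codes such as `B03` are hypothetical.

```python
def split_by_location(images, train_locations, val_locations):
    """Assign image IDs to train/val according to their camera location.

    Images whose location appears in neither list are returned separately
    rather than silently dropped.
    """
    train_set, val_set = set(train_locations), set(val_locations)
    overlap = train_set & val_set
    if overlap:
        # A location in both splits would leak near-duplicate frames.
        raise ValueError(f"locations in both splits: {sorted(overlap)}")
    train, val, unassigned = [], [], []
    for im in images:
        loc = im.get("location")
        if loc in train_set:
            train.append(im["id"])
        elif loc in val_set:
            val.append(im["id"])
        else:
            unassigned.append(im["id"])
    return train, val, unassigned


# Hypothetical image records and location codes.
images = [
    {"id": "img1", "location": "B03"},
    {"id": "img2", "location": "B03"},
    {"id": "img3", "location": "C05"},
    {"id": "img4", "location": "D10"},
]
train, val, unassigned = split_by_location(images, ["B03"], ["C05"])
print(train, val, unassigned)  # ['img1', 'img2'] ['img3'] ['img4']
```

Splitting by camera rather than at random keeps visually similar frames from the same site out of both splits at once, which is what makes the benchmark numbers comparable.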
For questions about this data set, contact Sarah Huebner at the University of Minnesota.
This data set is released under the Community Data License Agreement (permissive variant).
The original Snapshot Serengeti data set included a “human” class label; for privacy reasons, we have removed images with that label from this version of the data set. Those labels are still present in the metadata. If those images are important to your work, contact us; in some cases it will be possible to release those images under an alternative license.
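Because the human labels remain in the metadata while the corresponding images were removed, code that pairs metadata with downloaded files may need to skip those entries. The following is a minimal sketch, assuming the COCO Camera Traps structure and a category whose `name` is literally `"human"`. The function name and example data are hypothetical.

```python
def drop_human_entries(coco, human_name="human"):
    """Return a copy of the metadata without images labeled as human.

    An image carrying any human annotation is dropped entirely, together
    with all of its annotations, because the image file itself is withheld.
    """
    human_ids = {c["id"] for c in coco["categories"] if c["name"] == human_name}
    withheld = {
        a["image_id"] for a in coco["annotations"] if a["category_id"] in human_ids
    }
    return {
        **coco,  # keep categories and any other top-level keys unchanged
        "images": [im for im in coco["images"] if im["id"] not in withheld],
        "annotations": [
            a for a in coco["annotations"] if a["image_id"] not in withheld
        ],
    }


# Hypothetical example with one lion image and one withheld human image.
example = {
    "categories": [{"id": 1, "name": "lion"}, {"id": 2, "name": "human"}],
    "images": [{"id": "img1"}, {"id": "img2"}],
    "annotations": [
        {"id": "a1", "image_id": "img1", "category_id": 1},
        {"id": "a2", "image_id": "img2", "category_id": 2},
    ],
}
cleaned = drop_human_entries(example)
print([im["id"] for im in cleaned["images"]])  # ['img1']
```

When working from a local copy, checking whether each image's file exists on disk is a more direct alternative that also catches any other missing files.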
Additional metadata related to the aggregation of human labels into consensus labels is available in an addendum.
Data download links:
Season 1 (242GB) (metadata)
Season 2 (382GB) (metadata)
Season 3 (251GB) (metadata)
Season 4 (368GB) (metadata)
Season 5 (596GB) (metadata)
Season 6 (361GB) (metadata)
Season 7 (636GB) (metadata)
Season 8 (part 1) (450GB) (metadata)
Season 8 (part 2) (414GB)
Season 9 (part 1) (432GB) (metadata)
Season 9 (part 2) (432GB)
Season 10 (part 1) (500GB) (metadata)
Season 10 (part 2) (166GB)
All metadata (.csv)
All metadata (.json)
Recommended train/val splits
Having trouble downloading? Check out our FAQ.