Snapshot Serengeti

Overview

This data set contains approximately 2.65M sequences of camera trap images, totaling 7.1M images, from seasons one through eleven of the Snapshot Serengeti project, the flagship project of the Snapshot Safari network. Using the same camera trapping protocols at every site, Snapshot Safari members are collecting standardized data from many protected areas in Africa, which allows for cross-site comparisons to assess the efficacy of conservation and restoration programs. Serengeti National Park in Tanzania is best known for the massive annual migrations of wildebeest and zebra that drive the cycling of its dynamic ecosystem.

Labels are provided for 61 categories, primarily at the species level (for example, the most common labels are wildebeest, zebra, and Thomson’s gazelle). Approximately 76% of images are labeled as empty. A full list of species and associated image counts is available here. We have also added approximately 150,000 bounding box annotations to approximately 78,000 of those images.

Citation, license, and contact information

The images and species-level labels are described in more detail in the associated manuscript:

Swanson AB, Kosmala M, Lintott CJ, Simpson RJ, Smith A, Packer C (2015) Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2: 150026. (DOI) (bibtex)

Please cite this manuscript if you use this data set.

For questions about this data set, contact Sarah Huebner at the University of Minnesota.

This data set is released under the Community Data License Agreement (permissive variant).

The original Snapshot Serengeti data set included a “human” class label; for privacy reasons, we have removed those images from this version of the data set. Those labels are still present in the metadata. If those images are important to your work, contact us; in some cases it will be possible to release those images under an alternative license.

Data format

Annotations are provided in COCO Camera Traps .json format. One .json file is provided for each season, along with a single .json file covering all seasons combined. Note that annotations are tied to images but are reliable only at the sequence level. For example, in rare sequences two of three images contain a lion and the third is empty (lions, it turns out, walk away sometimes), yet all three images are annotated as “lion”.
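
Because labels are only reliable at the sequence level, it can help to aggregate them by sequence before using them. The following is a minimal sketch, not an official loader. It assumes the field names commonly used in COCO Camera Traps files (`images[].id`, `images[].seq_id`, `annotations[].image_id`, `annotations[].category_id`, `categories[].id`, `categories[].name`); check them against the file you download. The sample records and the helper names `load_metadata` and `sequence_labels` are made up for illustration.

```python
import json
from collections import defaultdict


def load_metadata(path):
    """Read a COCO Camera Traps .json file from disk (path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def sequence_labels(metadata):
    """Map each seq_id to the set of category names found on any image in that sequence.

    Assumes the COCO Camera Traps field names listed in the text above.
    """
    category_names = {c["id"]: c["name"] for c in metadata["categories"]}
    image_to_seq = {im["id"]: im["seq_id"] for im in metadata["images"]}
    labels = defaultdict(set)
    for ann in metadata["annotations"]:
        seq_id = image_to_seq[ann["image_id"]]
        labels[seq_id].add(category_names[ann["category_id"]])
    return dict(labels)


# Tiny in-memory stand-in for a season file; every value here is hypothetical.
sample_metadata = {
    "images": [
        {"id": "img_a", "seq_id": "seq_1"},
        {"id": "img_b", "seq_id": "seq_1"},
        {"id": "img_c", "seq_id": "seq_1"},  # may look empty, but carries the sequence label
        {"id": "img_d", "seq_id": "seq_2"},
    ],
    "annotations": [
        {"id": "ann_1", "image_id": "img_a", "category_id": 1},
        {"id": "ann_2", "image_id": "img_b", "category_id": 1},
        {"id": "ann_3", "image_id": "img_c", "category_id": 1},
        {"id": "ann_4", "image_id": "img_d", "category_id": 0},
    ],
    "categories": [{"id": 0, "name": "empty"}, {"id": 1, "name": "lion"}],
}

labels_by_sequence = sequence_labels(sample_metadata)
```

Using a set per sequence keeps sequences with more than one species intact. For a real file, replace `sample_metadata` with the result of `load_metadata(...)`.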

Annotations are also provided in a (non-standard) .csv format. These files are intended to allow replication of the original data set paper, but they have not been maintained as diligently as the .json files and their format has not been documented. Unless you have a strong reason to use the .csv files, we recommend using the .json files.

Additional metadata related to the aggregation of human labels into consensus labels is available in an addendum.

We have also divided locations (i.e., cameras) into training and validation splits to allow for consistent benchmarking on this data set.
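
Splitting by location rather than by image keeps every image from a given camera in the same split, so a model is not evaluated on backgrounds it saw during training. The sketch below shows one way to apply such a split. The layout of the splits mapping (split name to a list of location IDs) and the `location` field on each image record are assumptions; the actual splits file may be structured differently. The function name and sample values are hypothetical.

```python
def split_images_by_location(images, splits):
    """Assign image IDs to splits based on the camera location of each image.

    images: list of dicts with "id" and "location" keys (assumed field names).
    splits: dict mapping a split name to a list of location IDs (assumed layout).
    Images whose location appears in no split are skipped.
    """
    location_to_split = {
        location: split_name
        for split_name, locations in splits.items()
        for location in locations
    }
    assigned = {split_name: [] for split_name in splits}
    for image in images:
        split_name = location_to_split.get(image["location"])
        if split_name is not None:
            assigned[split_name].append(image["id"])
    return assigned


# Hypothetical sample data for illustration only.
sample_images = [
    {"id": "img_a", "location": "X01"},
    {"id": "img_b", "location": "X02"},
    {"id": "img_c", "location": "X01"},
    {"id": "img_d", "location": "X99"},  # location not listed in any split
]
sample_splits = {"train": ["X01"], "val": ["X02"]}

assigned = split_images_by_location(sample_images, sample_splits)
```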

For information about mapping this dataset’s categories to a common taxonomy, see this page.

Downloading the data

Images are available in the following cloud storage folders:

  • gs://public-datasets-lila/snapshotserengeti-unzipped (GCP)
  • s3://us-west-2.opendata.source.coop/agentmorris/lila-wildlife/snapshotserengeti-unzipped (AWS)
  • https://lilawildlife.blob.core.windows.net/lila-wildlife/snapshotserengeti-unzipped (Azure)
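
Individual images can be addressed by appending a relative path to one of the folders above. The sketch below builds HTTPS URLs against the Azure folder. It assumes that the `file_name` field in the metadata is a path relative to that folder, which should be verified against the metadata you download. The example path and the helper name `image_url` are hypothetical.

```python
from urllib.parse import quote

# Base folder taken from the list above (Azure).
AZURE_BASE = "https://lilawildlife.blob.core.windows.net/lila-wildlife/snapshotserengeti-unzipped"


def image_url(file_name, base=AZURE_BASE):
    """Join a base folder URL and a relative image path into a single URL.

    Assumes file_name is relative to the base folder. Slashes are preserved;
    other unsafe characters (such as spaces) are percent-encoded.
    """
    return base.rstrip("/") + "/" + quote(file_name.lstrip("/"))


# Hypothetical relative path, for illustration only.
example_url = image_url("S1/B04/B04_R1/S1_B04_R1_PICT0001.JPG")
```

The resulting URLs can be passed to any HTTP client, which makes it possible to fetch only the images you need instead of a full season zipfile.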

A link to a zipfile per season is also provided below. However, whether you want the whole data set, a specific folder, or a subset of the data (e.g. images for one species), we recommend checking out our guidelines for accessing images without using giant zipfiles.

Data download links:

Season 1 (242GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 2 (382GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 3 (25`GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 4 (368GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 5 (596GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 6 (361GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 7 (636GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 8 (part 1) (450GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 8 (part 2) (414GB) (images, GCP) (images, Azure) (images, AWS)
Season 9 (part 1) (432GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 9 (part 2) (432GB) (images, GCP) (images, Azure) (images, AWS)
Season 10 (part 1) (500GB) (images, GCP) (images, Azure) (images, AWS) (metadata)
Season 10 (part 2) (166GB) (images, GCP) (images, Azure) (images, AWS)
Season 11 (479GB) (images, GCP) (images, Azure) (images, AWS) (metadata)

All metadata (.csv)
All metadata (.json)
Bounding boxes
Recommended train/val splits

Having trouble downloading? Check out our FAQ.

Other useful links

MegaDetector results for all camera trap datasets on LILA are available here.

Information about mapping camera trap datasets to a common taxonomy is available here.

Posted by Dan Morris.