Caltech Camera Traps

This data set contains 243,100 images from 140 camera locations in the Southwestern United States. Images are labeled with 21 animal categories (plus empty), primarily at the species level; the most common labels are opossum, raccoon, and coyote. The data set also includes approximately 66,000 bounding box annotations. Approximately 70% of images are labeled as empty.

More information about this data set is available here.

If you use this data set, please cite the associated manuscript:

Sara Beery, Grant Van Horn, Pietro Perona. Recognition in Terra Incognita. Proceedings of the 15th European Conference on Computer Vision (ECCV 2018). (bibtex)

Annotations are provided in COCO Camera Traps .json format.
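
As a rough illustration of working with this kind of file, the sketch below counts how many images carry each label. It assumes the usual COCO-style layout: top-level "images", "annotations", and "categories" lists, with each annotation carrying "image_id" and "category_id". The tiny in-memory dictionary and its file names are made up for the example and only stand in for the real annotation file.

```python
import json
from collections import Counter


def count_images_per_category(data):
    """Count distinct images per category name in a COCO-style annotation dict."""
    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    seen = set()  # (image_id, category_id) pairs already counted
    counts = Counter()
    for ann in data["annotations"]:
        key = (ann["image_id"], ann["category_id"])
        if key in seen:
            continue  # several annotations of one class on the same image count once
        seen.add(key)
        counts[id_to_name[ann["category_id"]]] += 1
    return counts


# Hypothetical stand-in for the real file; field names are assumptions.
example = {
    "images": [
        {"id": "a", "file_name": "a.jpg"},
        {"id": "b", "file_name": "b.jpg"},
        {"id": "c", "file_name": "c.jpg"},
    ],
    "categories": [{"id": 0, "name": "empty"}, {"id": 1, "name": "opossum"}],
    "annotations": [
        {"id": "1", "image_id": "a", "category_id": 0},
        {"id": "2", "image_id": "b", "category_id": 1},
        {"id": "3", "image_id": "b", "category_id": 1},
        {"id": "4", "image_id": "c", "category_id": 0},
    ],
}

counts = count_images_per_category(example)
print(dict(counts))
```

To run this on a downloaded annotation file, replace the example dictionary with the result of json.load() on that file.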

We have also divided locations (i.e., cameras) into training and validation splits to allow for consistent benchmarking on this data set. The file describing this split specifies a train/val split for all locations in the data set, and also provides the train/val split used in the ECCV paper listed above. The “eccv_train” split here corresponds to the “train” locations and all “cis” locations in the ECCV paper; the “eccv_val” split here corresponds to all “trans” locations in the ECCV paper.
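
A minimal sketch of applying a location-based split follows. The layout of the split file is not described here, so the code assumes a hypothetical mapping from split name to a list of location identifiers, and assumes that each image record has "location" and "file_name" fields. Adjust the keys to match the actual files.

```python
def partition_by_location(images, splits):
    """Group image file names by split, based on each image's camera location.

    images: list of dicts with "file_name" and "location" (assumed field names).
    splits: dict mapping split name -> iterable of location IDs (assumed layout).
    A location may appear in more than one split (e.g. "train" and "eccv_train").
    """
    partitioned = {}
    for split_name, locations in splits.items():
        location_set = set(locations)
        partitioned[split_name] = [
            im["file_name"] for im in images if im["location"] in location_set
        ]
    return partitioned


# Hypothetical data for illustration only.
images = [
    {"file_name": "a.jpg", "location": "38"},
    {"file_name": "b.jpg", "location": "38"},
    {"file_name": "c.jpg", "location": "43"},
]
splits = {"train": ["38"], "val": ["43"]}

by_split = partition_by_location(images, splits)
print(by_split)
```

Splitting by location rather than by image keeps all images from one camera on the same side of the split, which is the point of benchmarking on unseen locations.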

This data set is released under the Community Data License Agreement (permissive variant).

For questions about this data set, contact caltechcameratraps@gmail.com.

Download links:

Images (105GB)
Image-level annotations (9MB)

If you downloaded the image-level annotations prior to Sept 24, 2019, please use this updated version instead. The previous version had many images labeled with the class “unlabeled_animal”; these labels have now been fixed.

Bounding box annotations (each box has class label “animal” or “vehicle”) (35MB)
Bounding box annotations (single-species images only; each box inherits the image-level class label) (35MB)
Recommended train/val splits
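
When working with the bounding box annotation files listed above, it is often convenient to convert boxes to corner coordinates. The sketch below assumes each annotation stores its box under a "bbox" key as [x, y, width, height] in pixels, which is the usual COCO convention; check the downloaded files before relying on it.

```python
def bbox_to_corners(bbox):
    """Convert an [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Hypothetical annotation record for illustration only.
annotation = {"image_id": "a", "category_id": 1, "bbox": [10, 20, 30, 40]}
corners = bbox_to_corners(annotation["bbox"])
print(corners)
```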

Having trouble downloading? Check out our FAQ.