List of other conservation data sets

There are lots of labeled data sets relevant to conservation that are not on LILA, of course, and rather than copying them all to LILA, this page tracks other data sets we know about. This page complements the list of LILA datasets; the union of these two lists is every conservation-related labeled dataset we’re aware of.

A few boundaries we draw around this list:

  • This page is intended to track data sets that are nearly “machine-learning-ready”, i.e., an interesting set of labels that’s more or less attached to an interesting set of images/documents/etc. We are not tracking, for example, large repositories of unlabeled data.
  • We are also not tracking private data repositories; roughly, if access requires anything more than filling out a form that’s almost auto-approved, it doesn’t go on this list.
  • This page is not intended to track labeled geospatial data, even though many labeled geospatial datasets are critical for conservation. We do try to track other lists of labeled geospatial data in the “other lists of data sets” section at the end of this page.

If you know of data sets not on this list, or if you own one of these data sets but can no longer maintain it and would like to transfer it over to LILA, email us!

Table of contents

Image data sets (terrestrial wild animals) (ground-based sensors)
Image data sets (terrestrial wild animals) (aerial/drone)
Image data sets (domestic animals)
Image data sets (marine/freshwater)
     …where fish look fishy
     …where marine life is more nuanced
Image data sets (plants)
Image data sets (geospatial)
Image data sets (other)
Acoustic data sets
Competitions
Other lists of data sets

Terrestrial wild animal images (ground-based sensors)

iNaturalist competition data (animal photos)

The iNaturalist competition provides around 500k labeled handheld-camera photos of around 8k species, varying a bit from year to year. Data originate from iNaturalist, a citizen-science platform for wildlife observation.

NABirds (bird photos)

Around 48k images of 400 species of birds, with gender and age labels in many cases.

Caltech-UCSD birds (bird photos)

Caltech-UCSD Birds 200 (CUB-200) is an image dataset with photos of 200 bird species (mostly North American), including species labels, bounding boxes, and coarse segmentation masks.

Animals with attributes (animal photos)

37322 images of 50 animal classes with pre-extracted feature representations for each image.

Carrizo Camera Traps (camera traps)

100k camera trap images from California

Denison University Camera Trap Data (unlabeled camera traps)

~200 camera trap images from Denison University Biological Reserve (unlabeled)

MammalWeb OSF Data (camera traps)

~35k camera trap images from MammalWeb, with species labels

Penguin Counting in the Wild (camera trap photos with keypoints)

73,802 images taken by 15 different cameras from the Penguin Watch project, including keypoints indicating penguin locations within each image. Also available here in .mat and .json formats.

apic.ai bee poses (annotated bee images)

~200 images of bees with keypoint annotations

Arribada Human-Wildlife Conflict (annotated elephant images)

~76k thermal images of elephants, humans, and goats

Chimpanzee Faces in the Wild (individual ID)

Around 80 labeled examples each from around 25 individual chimps, with individual identifications.

Wildlife Image and Localization Dataset (species and bounding box labels)

Around 6k handheld-camera images and around 12k bounding boxes for 28 species.

Plittersdorf dataset (stereo camera trap images)

221 stereo camera trap videos of deer, with instance masks

PolarBearVidID (polar bear videos w/individual ID)

1431 video sequences of 13 individual captive polar bears

ANTS: ant detection and tracking (ants w/boxes)

~5k frames of ants with boxes

DSAIL-Porini: Annotated camera trap images … in Kenya (camera trap images)

8524 images of grazing animals in Kenya from custom camera traps

Karioi Predator Camera Trap (camera trap crops)

~6k 224×224 crops from camera trap images in New Zealand

Object detection of insects (boxes on insects)

~30k boxes on nine taxa of insects

BIOSCAN-1M (lab images of insects)

>1M lab images of insects with species labels

MEWC Case Study (camera trap crops)

50k cropped camera trap images from Tasmania

Marburg Camera Traps (camera trap images w/boxes)

~2100 images of European mammals and birds w/boxes

PanAf20K (ape videos w/behavior)

20k videos of apes with behavioral labels, 500 videos with boxes

Wild Animals Facing Extinction (boxes on handheld camera images)

7634 handheld-camera images of African mammals with boxes

Florida Wildlife Camera Trap Dataset

105k camera trap images from Florida with species-level labels

Terrestrial wild animal images (aerial/drone)

Also see this much more detailed list of datasets with annotated wildlife in drone/aerial images.

Improving the precision and accuracy of animal population estimates with aerial image object detection

Point and species annotations on aerial images of savanna

UAV-derived waterfowl thermal imagery dataset (thermal and RGB images)

Waterfowl annotated in thermal drone images

Drones count wildlife more accurately and precisely than humans (counts and drone images)

Counts of fake bird colonies in drone images

Counting animals in aerial images with a density map estimation model (penguins in aerial images)

Keypoint annotations on penguins in aerial images

The Aerial Elephant Dataset (aerial images)

>2k images containing >15k annotated elephants

Global Model of Bird Detection Dataset (birds in aerial images)

Images and around 250,000 keypoint annotations from 13 bird detection projects.

Aerial Photo Imagery from Fall Waterfowl Surveys (birds in aerial images)

~130k aerial images with keypoint annotations on birds.

A different packaging of the same dataset is hosted on LILA.

Drones and deep learning for seabird colonies (birds in drone images)

28 drone mosaics with ~40k annotations on penguins and albatrosses

Naemura Lab Cattle Detection

~2000 boxes on cattle in aerial imagery

Large-scale, Semi-Automatic Inference of Animal Behavior from Monocular Videos

Oblique aerial videos of zebras with 162931 bounding boxes and behavioral labels (standing, grazing, etc.)

Quantifying the movement, behaviour and environmental context of group-living animals using drones and computer vision

Drone images of ungulates and geladas with 40532 bounding boxes

Deep object detection for waterbird monitoring using aerial imagery

Drone images of seabirds with 23078 boxes

KABR: In-Situ Dataset for Kenyan Animal Behavior Recognition from Drone Videos

130366 drone videos, each following a single individual savanna ungulate, with behavior labels

WAID: Wildlife Aerial Images from Drone

14,366 drone images with boxes on ungulates and seals

Identification of free-ranging mugger crocodiles (crocodiles in drone images)

Individual ID annotations on crocodiles in drone images

Camera trap images of toads, lizards, and snakes

~6400 images from downward-facing cameras, containing toads/lizards/snakes

Domestic animals

Stanford Dogs (dog photos with bounding boxes)

Around 20k images of 120 dog breeds, with both class labels and bounding boxes. Conservation-related? Not exactly, but let’s face it, lots of us work on this kind of data because we like looking at pictures of animals.

Oxford Pets (pet photos with bounding boxes and masks)

Around 7500 images of pets in 37 classes, with class labels, bounding boxes, and segmentation masks. Again, maybe not squarely related to conservation or biology, but finding furry things with machine learning is finding furry things with machine learning, right?

Friesian Cattle 2015 (individual ID)

~350 images labeled as ~50 individual cows

Cattle Noseprints for Individual ID

~5000 images of cattle muzzles with individual IDs

Cows2021

~10k images and ~300 videos of in-barn cattle with boxes and individual IDs

OpenCows2020

~3700 images of in-barn cattle with boxes and individual IDs

CherryChèvre (goats with bounding boxes)

6160 images of domestic goats from handheld or security-style cameras, with boxes

Amsterdamse Waterleidingduinen pilots (camera traps)

~50k camera trap images from the Netherlands

Marine/freshwater images

This section is broken into datasets where marine life looks like what a little kid thinks a fish looks like, and datasets with a more diverse concept of marine life.

Marine/freshwater images (where fish look fishy)

Project Natick Underwater Video (marine species)

~1k images of fish w/bounding boxes

Application of a Deep Learning Image Classifier for Identification of Amazonian Fishes (segmented fish)

~3k images of out-of-water fish w/species labels and segmentation masks

Roboflow Fish Dataset (boxes on fish)

680 images of fish w/bounding boxes

Labeled Fishes in the Wild (boxes on fish)

~1k images of fish w/boxes, ~3k blanks

Fishnet.AI (images of fishing vessels)

~163k bounding boxes on ~35k images of fish and people on fishing vessels

Croatian Fish (cropped images of fish)

800 images of fish in 12 classes (description)

DeepFish (annotated fish images)

~40k images with a mix of classification, segmentation, and counting labels

The Brackish Dataset (annotated videos of fish)

~90 videos with bounding boxes on fish

Deep Vision Fish Dataset (segmented fish)

Segmented fish and associated empty backgrounds, intended for training data generation

BrackishMOT (annotated videos of fish)

98 videos of fish with tracking boxes (i.e., boxes with stable frame-to-frame IDs)

Visual Marine Animal Tracking (VMAT)

32 video sequences with bounding boxes on a variety of species

OzFish (BRUV images w/boxes)

80k cropped fish images with 45k bounding boxes

VIAME FishTrack (BRUV images w/boxes)

Several thousand BRUV images with bounding boxes on fish and bait

F4K Detection and Tracking (videos with tracking points)

Seventeen 10-minute videos with tracking points

FishCLEF-2015 (videos with boxes)

14k boxes on fish in 20k images

Brackish Underwater Dataset (images with boxes)

12.5k boxes on fish and other species in 15k images

WildFish (cropped images of fish from online sources)

54,459 images of fish in 1000 categories

Object detection of tropical freshwater fish in Australia (freshwater species)

~44k images of fish w/ ~83k bounding boxes

AFFiNe (images of fish)

~7k labeled images of freshwater fish, generally not in the water, cropped close

NOAA Puget Sound Nearshore Fish (fish w/boxes)

~68k boxes on fish and crabs. (I don’t generally include LILA datasets on this page, but I’m breaking my own rule just this once, because I use this section as a de facto list of public fish-y-fish datasets.)

Brook trout imagery for individual ID (fish w/ID)

435 images of brook trout with individual ID labels

AAU Zebrafish Re-Identification Dataset

~2200 images of zebrafish with individual IDs

3D-ZeF20

Eight long stereo video sequences of zebrafish with boxes and keypoints

Salmon Computer Vision

Boxes on 532,000 frames from 1,567 videos of salmon in two weirs

Marine/freshwater images (where marine life doesn’t exactly look fishy)

Sea turtles in drone imagery

Point annotations on sea turtles in drone images

FathomNet (annotated images of ocean life/structures)

~70k labeled images representing a variety of marine entities

NOAA Dolphin ID

1011 dolphin fin images with individual IDs

Whales from Space (exactly what it says)

633 boxes on whales in satellite imagery

SMarTar-ID (Standardised Marine Taxon Reference Image Database) (ocean imagery)

Database of ocean species images, particularly cnidarians and sponges

Eagle rays images (boxes on rays)

~500 aerial images w/boxes on rays

Caltech Fish Counting (freshwater sonar)

>500k annotations on fish in sonar video

SealID (individual seal ID images)

Images of ringed seals with individual ID labels and segmentation masks

Plants

Kahikatea dataset (segmented trees)

Aerial imagery from New Zealand in which Kahikatea trees have been masked

Tree species in Northern Australia (trees in UAV imagery)

2547 polygons on 36 Australian tree species

Urban Tree Detection Data (trees in aerial imagery)

Keypoints on ~40k trees in NAIP data.

The Auto Arborist Dataset (trees in street-level imagery)

>2M trees in street-level imagery annotated by genus

Oxford Flowers (flower photos)

Images of 102 flower species, with between 40 and 258 images of each.

Pl@ntNet-300K (images of plants)

~300k labeled images of plants

Healthy vs. Diseased Leaf Image Dataset (images of leaves)

~4k images labeled w/species and disease status

CanaTree100 (images of trees)

100 images with 920 trees w/segmentation masks

NeonTreeEvaluation (trees in aerial and lidar surveys)

~3k bounding box annotations on RGB, hyperspectral, and lidar survey data

Pasadena Urban Trees (trees in aerial and street view photos)

Around 30k trees, imaged from aerial and street views, with location and species information.

FOR-instance (segmented trees in lidar)

1130 trees (with classes) manually segmented in airborne lidar data

Avo-AirDB (segmented trees in drone imagery)

986 drone images of avocado plantations with segmented individual trees

Image data sets (geospatial)

BigEarthNet (land cover, satellite)

>500k Sentinel images with patch-level land cover labels

EuroSAT (land cover, satellite)

~27k Sentinel images with patch-level land cover labels

Image data sets (other)

TACO (trash)

Segmentation labels and taxonomic identifiers for garbage.

DL for meteorological by-catch (weather in camera traps)

Camera trap images labeled according to weather conditions.

Bioacoustic data sets

Fully-Annotated Soundscape Recordings from the Northeastern United States

285 hour-long recordings with 50,760 bounding boxes on 81 bird species

Fully-Annotated Soundscape Recordings from the Western United States

33 hour-long recordings with 20,147 bounding boxes on 56 bird species

Fully-Annotated Soundscape Recordings from the Southwestern Amazon Basin

21 hour-long recordings with 14,798 bounding boxes on 132 bird species

Fully-Annotated Soundscape Recordings from the Sierra Nevada Mountains

100 10-minute recordings with 10,296 bounding boxes on 21 bird species

Fully-Annotated Soundscape Recordings from Hawaii

635 recordings with 59,583 bounding boxes on 27 bird species

Fully-Annotated Soundscape Recordings from coffee farms in Colombia and Costa Rica

34 hour-long recordings with 6,952 bounding boxes on 89 bird species

An annotated set of audio recordings of Eastern NA birds

16,052 annotations on 48 species in 385 minutes

BirdVox

Several hundred thousand labeled audio clips of North American birds

xeno-canto

>1M recordings of >10k species (mostly birds)

Xeno-canto data on GBIF: birds, bats, insects, soundscapes

AnuraSet

93k recordings of frogs

Avian dawn chorus in CA, OR, and WA

~40k annotations on ~12 hours of audio for 118 sound types including 58 bird species

Watkins Marine Mammal Sounds (marine mammal recordings)

~15,000 high-quality excerpts from 32 marine mammal species, and additional lower-quality or unannotated data

Orcasound data (orca recordings)

Annotated orca recordings

British Library (bird sounds)

~50k recordings with species labels

EDANSA-2019 (soundscapes)

27 hours of arctic soundscapes annotated with 28 classes

A bunch of relevant competitions

Competitions are a great way to get started doing machine learning for environmental science, and each comes with a data set. Here are a few competitions that involve wildlife…

Competitions: terrestrial animal images

Conser-vision Practice Area (camera trap image classification)

Deep Chimpact (depth estimation for wildlife conservation)

Hakuna Ma-data (camera trap image classification)

Pri-matrix Factorization (individual chimp recognition)

iNaturalist computer vision competition (handheld photos of animals)

iWildCam (camera trap images)

Amur Tiger Re-identification in the Wild (individual ID for tigers from video)

NOAA Fisheries Steller Sea Lion Population Count (aerial images)

Snake Species ID Challenge (w/ ~100k images of ~4k snake species)

SnakeCLEF (snake images)

Competitions: marine/freshwater images

N+1 fish, N+2 fish (fish detection and classification in ship-deck photos)

Where’s Whale-do? (individual whale identification)

Great Barrier Reef Crown of Thorns Detection (marine video with object labels on starfish)

NOAA Right Whale Recognition (aerial images)

Humpback Whale Identification (individual ID from fluke photos)

Nature Conservancy Fisheries Monitoring (species ID from ship-deck photos)

ImageCLEF coral (coral localization and identification in images)

SeaCLEF (marine animal identification in images and video)

Sea Turtle Face Detection (bounding box annotations on turtles)

Turtle Recall (individual ID)

FathomNet 2023 (species ID)

Competitions: plant images

PlantCLEF (plant identification in images)

Competitions: geospatial images

Amazon Rainforest Challenge (deforestation monitoring from Landsat/Sentinel images)

Understanding the Amazon from Space (land cover from satellite images)

ICLR workshop challenge on crop detection from satellite imagery

Competitions: bioacoustics

Whale Detection Challenge (bioacoustics)

DCASE Bird Audio Detection challenge (bird detection from audio) (2018, 2021, 2022, 2023, 2024)

BirdCLEF (bird identification in audio) (2022, 2023, 2024)

Cornell Birdcall Identification (bird detection and identification from audio)

NIPS4B Multilabel Bird Species Classification (from audio)

Competitions: other

Random Walk of the Penguins (penguin population change prediction)

GeoLifeCLEF (species distribution estimation)

FungiCLEF (fungi images)

Other useful lists of conservation data sets

LILA datasets (just in case someone landed here and isn’t aware of the context… the page you’re looking at right now is a list of datasets that aren’t hosted on LILA)

Datasets with annotated wildlife in drone/aerial images

Wildlife Re-ID Datasets

Other useful lists of open data sets

…that are relevant to environmental science, though maybe not directly focused on conservation.

Source Cooperative (formerly Radiant ML Hub)

Esri Living Atlas of the World

Microsoft Planetary Computer Data Catalog

Earth on AWS

Google Earth Engine Data Catalog

OpenForest (a list of open-access forestry data)

Other queries one might run to find conservation data sets

GBIF Datasets

Kaggle dataset/competition search for “wildlife”

Kaggle dataset/competition search for “conservation”

Kaggle dataset/competition search for “animals”

Google Dataset Search for “wildlife”

Google Dataset Search for “conservation”

Google Dataset Search for “animals”

Posted by Dan Morris.