Beluga ID 2022


This dataset contains pre-cropped images of beluga whales (Delphinapterus leucas) with individual animal identifications. It represents a collaborative effort based on the data collection and population modeling conducted in Cook Inlet, off the coast of Alaska, from 2016 to 2019. The 5,902 photos and the metadata from 1,617 unique encounters (within 1 hour) were collected from boat-based cameras and from a downward-facing camera on an aerial drone. Images are annotated with full-image bounding boxes and labeled with a viewpoint (top, left, or right). A total of 788 individual beluga whales were identified by hand by trained experts using scarring patterns and other visual markings. This dataset is being released in tandem with the “Where’s Whale-do?” ID competition hosted by DrivenData and is identical to the public training set used in that competition.

Data format

The training dataset is released in the Microsoft COCO .json format. We have collapsed the entire dataset into a single “train” label and have left “val” and “test” empty; we do this as an invitation to researchers to experiment with their own novel approaches for dealing with the unbalanced and chaotic distribution of the number of sightings per individual. All of the images in the dataset have been resized to have a maximum dimension of 1,200 pixels. Each animal sighting is defined by an axis-aligned bounding box, and its metadata includes the viewpoint of the animal, a species (category) ID, a source image ID, an individual string ID name, and other miscellaneous values. The temporal ordering of the images can be determined from the metadata for each image.
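Because every annotation is in the single “train” split, a natural first step is to count sightings per individual and then carve out a validation set. The following is a minimal sketch of one way to do that, not a prescribed method. It assumes the standard COCO top-level "annotations" list and assumes that the individual string ID is stored under an annotation key called "name"; that key, the helper function names, and the tiny in-memory example are illustrative assumptions, so check them against the actual file before use.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO-style .json file from disk (path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def sightings_per_individual(coco, id_key="name"):
    """Count annotations per individual.

    ASSUMPTION: the individual ID lives under `id_key` in each annotation.
    Annotations without that key are skipped.
    """
    return Counter(
        ann[id_key]
        for ann in coco.get("annotations", [])
        if ann.get(id_key) is not None
    )


def holdout_split(coco, id_key="name", min_sightings=2):
    """Hold out the last-listed annotation of each individual that has at
    least `min_sightings` sightings; everything else stays in training.

    Individuals seen fewer than `min_sightings` times are never held out,
    so each validation individual also appears in the training portion.
    """
    annotations = coco.get("annotations", [])
    counts = sightings_per_individual(coco, id_key)

    # Remember the position of the last annotation seen for each individual.
    last_index = {}
    for idx, ann in enumerate(annotations):
        if ann.get(id_key) is not None:
            last_index[ann[id_key]] = idx

    train, val = [], []
    for idx, ann in enumerate(annotations):
        name = ann.get(id_key)
        if (
            name is not None
            and counts[name] >= min_sightings
            and last_index[name] == idx
        ):
            val.append(ann)
        else:
            train.append(ann)
    return train, val


# Tiny synthetic stand-in for the real file (hypothetical values).
# With the real data, use: coco = load_coco("<path to the downloaded .json>")
example = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1, "name": "whale_a", "viewpoint": "top"},
        {"id": 11, "image_id": 2, "name": "whale_a", "viewpoint": "left"},
        {"id": 12, "image_id": 3, "name": "whale_b", "viewpoint": "top"},
    ],
}

counts = sightings_per_individual(example)
train, val = holdout_split(example)
print(counts.most_common())
print(len(train), "training annotations,", len(val), "validation annotations")
```

This split relies on the order of annotations in the file. If a strictly temporal hold-out is wanted, sort the annotations by the timestamp in the image metadata first; the name of that field is not given here and would need to be read from the file.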

Test data was added later, after the competition, and is thus in a different format. Contact the dataset owner for questions about the test data.

Citation, license, and contact information

For research or press inquiries, please direct all correspondence to Wild Me. Wild Me is a registered 501(c)(3) not-for-profit based in Portland, Oregon, USA that brings state-of-the-art computer vision tools to ecology researchers working around the globe on wildlife conservation.

This dataset is released under the Community Data License Agreement (permissive variant).

Downloading the data

Data download links for the train split:

Data download links for the test split:

Having trouble downloading? Check out our FAQ.

Posted by Dan Morris.