This dataset contains images of whale sharks (Rhincodon typus) with bounding boxes and individual animal identifications. It represents a collaborative effort based on the data collection and population modeling conducted at Ningaloo Marine Park in Western Australia from 1995 to 2008. Photos (7,888) and metadata from 2,441 whale shark encounters were collected from 464 individual contributors, especially from the original research of Brad Norman and from members of the local whale shark tourism industry who sight these animals annually from April to June. Images were annotated with bounding boxes around each visible whale shark, and viewpoints were labeled (e.g., left, right, etc.). A total of 543 individual whale sharks were identified by their unique spot patterning, first using computer-assisted spot pattern recognition (Arzoumanian et al.) and then through manual review and confirmation. A total of 7,693 named sightings were exported.
The dataset is released in the Microsoft COCO format and therefore uses flat image folders with associated JSON metadata files. We have collapsed the entire dataset into a single “train” label and have left “val” and “test” empty; we do this as an invitation to researchers to experiment with their own novel approaches for dealing with the unbalanced and chaotic distribution of the number of sightings per individual. All of the images in the dataset have been resized to have a maximum dimension of 3,000 pixels. Each animal sighting is defined by an axis-aligned bounding box, and its metadata includes the rotation of the box (theta), the viewpoint of the animal, a species (category) ID, a source image ID, an individual string ID name, and other miscellaneous values. The temporal ordering of the images, and an anonymized ID for the original photographer, can be determined from the metadata for each image.
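As a starting point for exploring the distribution of sightings per individual, the following is a minimal Python sketch that tallies annotations by individual ID. It assumes the metadata follows the usual COCO layout (a top-level "annotations" list) and that the individual ID string is stored under a key called "name"; that key, like the toy records below, is an assumption for illustration and should be checked against the actual metadata file.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style JSON metadata file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def sightings_per_individual(coco, name_key="name"):
    """Count annotations (sightings) for each individual ID string.

    `name_key` is an assumed field name; inspect the real annotation
    records to confirm it. Annotations lacking the key are skipped.
    """
    return Counter(
        ann[name_key] for ann in coco.get("annotations", []) if name_key in ann
    )


def count_histogram(counts):
    """Map 'number of sightings' to 'number of individuals with that many'."""
    return Counter(counts.values())


# Tiny in-memory stand-in with the assumed structure (not real data).
example = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [10, 20, 300, 150], "name": "shark-A"},
        {"id": 2, "image_id": 2, "bbox": [5, 5, 200, 100], "name": "shark-A"},
        {"id": 3, "image_id": 3, "bbox": [0, 0, 50, 40], "name": "shark-B"},
    ],
}

counts = sightings_per_individual(example)
print(counts.most_common())      # individuals ordered by sighting count
print(count_histogram(counts))   # how many individuals share each count
```

With the real metadata, `load_coco` would be called on the downloaded JSON file and its result passed to `sightings_per_individual`. The resulting histogram gives a quick view of how many individuals have only one or two sightings, which may help when designing custom train/val/test splits.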
Citation, license, and contact information
For research or press contact, please direct all correspondence to Wild Me at firstname.lastname@example.org. Wild Me is a registered 501(c)(3) not-for-profit based in Portland, Oregon, USA, and brings state-of-the-art computer vision tools to ecology researchers working around the globe on wildlife conservation.
This dataset is released under the Community Data License Agreement (permissive variant).
If you use this dataset in published work, please cite as:
Holmberg J, Norman B, Arzoumanian Z. Estimating population size, structure, and residency time for whale sharks Rhincodon typus through collaborative photo-identification. Endangered Species Research. 2009 Apr 8;7(1):39-53.
Downloading the data
Data download link:
Images and metadata (6GB)
Having trouble downloading? Check out our FAQ.