Conservation Drones

Monitoring protected areas to curb illegal activities like poaching is a monumental task. Real-time data acquisition has become easier with advances in unmanned aerial vehicles (UAVs) and sensors like thermal infrared (TIR) cameras, which allow surveillance at night, when poaching typically occurs. However, accurately and quickly processing the large amounts of resulting TIR data remains a challenge. The Benchmarking IR Dataset for Surveillance with Aerial Intelligence (BIRDSAI, pronounced “bird’s-eye”) is a long-wave TIR dataset containing nighttime images of animals and humans in Southern Africa. The dataset allows for testing of automatic detection and tracking of humans and animals with both real and synthetic videos, in order to protect animals in the real world.

There are 48 real aerial TIR videos and 124 synthetic aerial TIR videos (generated with AirSim), for a total of 62k and 100k images, respectively. Tracking information is provided for each of the animals and humans in these videos. Each object is labeled as an animal or a human, and species information is provided when possible, including for elephants, lions, and giraffes. We also provide information about noise and occlusion for each bounding box.

Data layout

The provided training set contains two folders: one for simulated data (TrainSimulation) and one for real data (TrainReal). Each of these contains an “annotations” folder, with the annotation .csv files for each video, and an “images” folder, with the individual .jpg frames of each video.
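As a rough illustration of the layout described above, the following Python sketch pairs each annotation file with its frames. It assumes that an annotation file named `annotations/<name>.csv` corresponds to an image folder named `images/<name>/`; that naming rule is an assumption made for this example, not something stated here, so check it against your download. The function name `pair_videos` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def pair_videos(split_dir: Path) -> Dict[str, Tuple[Path, List[Path]]]:
    """Map each video name to (annotation CSV, sorted .jpg frames).

    Assumption: `annotations/<name>.csv` matches `images/<name>/`.
    Videos without a matching image folder are skipped.
    """
    pairs = {}
    for csv_path in sorted((split_dir / "annotations").glob("*.csv")):
        frame_dir = split_dir / "images" / csv_path.stem
        if frame_dir.is_dir():
            # Frames are sorted by file name, not by parsed frame number.
            frames = sorted(frame_dir.glob("*.jpg"))
            pairs[csv_path.stem] = (csv_path, frames)
    return pairs
```

Calling `pair_videos(Path("TrainReal"))` would then return one entry per video found. Only top-level `.csv` files are matched, so other files or subfolders inside “annotations” are ignored.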

In the “images” folder in “TrainSimulation”, there are folders for each video; in addition to the .jpg infrared images, these zip files also contain infrared .png, RGB, and segmentation images provided by AirSim. We include in “TrainSimulation/annotations” two files containing the infrared digital counts for the different objects in the scene, one for winter and one for summer. Combined with the infrared simulation .png files, these allow you to search for different objects in the images if you need more information than the annotation .csv files provide.


We follow the MOT annotation format, which is a .csv file with the following columns:

[frame_number], [object_id], [x], [y], [w], [h], [class], [species], [occlusion], [noise]

  • class: 0 if animals, 1 if humans
  • species: -1: unknown, 0: human, 1: elephant, 2: lion, 3: giraffe, 4: dog, 5: crocodile, 6: hippo, 7: zebra, 8: rhino. 3 and 4 occur only in real data. 5, 6, 7, 8 occur only in synthetic data.
  • occlusion: 0 if there is no occlusion, 1 if there is an occlusion (i.e., either occluding or occluded) (note: intersection over union threshold of 0.3 used to assign occlusion; more details in paper)
  • noise: 0 if there is no noise, 1 if there is noise (note: noise labels were interpolated from object locations in previous and next frames; for more than 4 consecutive frames without labels, no noise labels were included; more details in paper)
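The column layout above can be read with Python's standard `csv` module. The sketch below is a minimal example, not the authors' helper code. It assumes that [x], [y] are the top-left corner of the box, that rows with a non-numeric first field (for example a header row) can be skipped, and that integer columns may be written as floats. The names `Box`, `parse_annotations`, `group_by_frame`, and `iou` are hypothetical. The `iou` helper only shows how the overlap between two boxes is computed; the exact procedure used to assign the occlusion flag is described in the paper and is not reproduced here.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Species codes as listed in the column description above.
SPECIES = {
    -1: "unknown", 0: "human", 1: "elephant", 2: "lion", 3: "giraffe",
    4: "dog", 5: "crocodile", 6: "hippo", 7: "zebra", 8: "rhino",
}


@dataclass
class Box:
    frame: int
    object_id: int
    x: float  # assumed: left edge of the box
    y: float  # assumed: top edge of the box
    w: float
    h: float
    is_human: bool  # class column: 0 = animal, 1 = human
    species: str
    occluded: bool
    noisy: bool


def parse_annotations(lines: Iterable[str]) -> List[Box]:
    """Parse rows in the ten-column layout described above."""
    boxes = []
    for row in csv.reader(lines):
        if len(row) < 10:
            continue  # blank or malformed row
        try:
            v = [float(field) for field in row[:10]]
        except ValueError:
            continue  # assumption: non-numeric rows are headers
        boxes.append(Box(
            frame=int(v[0]), object_id=int(v[1]),
            x=v[2], y=v[3], w=v[4], h=v[5],
            is_human=int(v[6]) == 1,
            species=SPECIES.get(int(v[7]), "unknown"),
            occluded=int(v[8]) == 1,
            noisy=int(v[9]) == 1,
        ))
    return boxes


def group_by_frame(boxes: Iterable[Box]) -> Dict[int, List[Box]]:
    """Collect boxes per frame number, e.g. to draw or evaluate a frame."""
    frames = defaultdict(list)
    for box in boxes:
        frames[box.frame].append(box)
    return dict(frames)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0
```

To read a file, pass an open file handle, for example `parse_annotations(open(path, newline=""))`, where `path` points to one of the annotation .csv files.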

Within the “TrainReal/annotations” folder, we include a file called “water_metadata….txt”, which lists the names of the videos that contain water.

Although “annotations” can be used for single- and multi-object tracking, we have included within “TrainReal/annotations” the exact training sequences and splits used in our single-object tracking experiments under “tracking”. Helper code and a README are provided to give more detail on how to use these sequences.

Downloading the data

Unlabeled test data

This dataset was used as the basis for the ICVGIP Visual Data Challenge at ICVGIP 2020; the data described above was used as training data, and the following un-annotated data sets (in the same format) were used as test data: (1.6GB)

Having trouble downloading?

Check out our FAQ.

License, citation, and acknowledgements

This dataset is released under the Community Data License Agreement (permissive variant).

If you use this dataset, please consider citing our paper:

Bondi E, Jain R, Aggrawal P, Anand S, Hannaford R, Kapoor A, Piavis J, Shah S, Joppa L, Dilkina B, Tambe M. BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos. (bibtex)

For questions about this dataset, contact Elizabeth Bondi at Harvard University.

This work was supported by Microsoft AI for Earth, NSF grants CCF-1522054 and IIS-1850477, MURI W911NF-17-1-0370, and the Infosys Center for Artificial Intelligence, IIIT-Delhi. We also thank the labeling team.

Posted by Dan Morris.