HKH Glacier Mapping


The Hindu Kush Himalayas (HKH) glacier mapping dataset includes imagery of the HKH region, along with polygons indicating the locations of glaciers. This dataset is intended to facilitate the training of models that can identify glaciers in remotely sensed imagery.

The HKH is also known as the world’s “Third Pole”, because it holds one of the largest concentrations of snow and ice outside the two polar regions. It spans more than four million square kilometers of hills and mountains across eight countries: Afghanistan, Bangladesh, Bhutan, China, India, Myanmar, Nepal, and Pakistan. Glaciers within this region have been identified and classified by experts at the International Centre for Integrated Mountain Development (ICIMOD).

This dataset couples those annotated glacier locations with multispectral imagery from Landsat 7 [1] and digital elevation and slope data from SRTM [2]. Imagery is provided as 35 Landsat tiles and 14,190 extracted numpy patches. Labels are available as raw vector data in shapefile format and as multichannel numpy masks. Both the labels and the masks are cropped according to the borders of the HKH region.

Python code for training and testing machine learning models using PyTorch, as well as the source for a glacier mapping web tool, can be found in the accompanying GitHub repository:

Dataset organization


At the highest level, this dataset is organized by tiles. A tile is a spatial area measuring roughly 6km x 7.5km (with definitions that roughly match up with USGS quarter quadrangles). Each tile comes with one corresponding GeoTIFF file. The entire glacier mapping dataset contains 35 tiles from Afghanistan, Bangladesh, Bhutan, China, India, Myanmar, Nepal, and Pakistan.

Each GeoTIFF tile consists of 15 channels:

  1. LE7 B1 (blue)
  2. LE7 B2 (green)
  3. LE7 B3 (red)
  4. LE7 B4 (near infrared)
  5. LE7 B5 (shortwave infrared 1)
  6. LE7 B6_VCID_1 (low-gain thermal infrared)
  7. LE7 B6_VCID_2 (high-gain thermal infrared)
  8. LE7 B7 (shortwave infrared 2)
  9. LE7 B8 (panchromatic)
  10. LE7 BQA (quality bitmask)
  11. NDVI (vegetation index)
  12. NDSI (snow index)
  13. NDWI (water index)
  14. SRTM 90 elevation
  15. SRTM 90 slope
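The channel order above can be captured in a small lookup table, which is handy when indexing into a loaded tile or patch. The sketch below also recomputes the three indices for a single pixel using the standard normalized-difference definitions (NDVI from NIR and red, NDSI from green and SWIR1, NDWI from green and NIR). The channel names are hypothetical labels chosen for this example, and the exact index formulas used to build the dataset are an assumption, since the source does not state them.

```python
# Channel order as listed above. GeoTIFF band numbers are 1-based,
# while array indices after loading are typically 0-based.
# The string labels are hypothetical names chosen for this sketch.
CHANNELS = [
    "B1_blue", "B2_green", "B3_red", "B4_nir", "B5_swir1",
    "B6_VCID_1", "B6_VCID_2", "B7_swir2", "B8_pan", "BQA",
    "NDVI", "NDSI", "NDWI", "elevation", "slope",
]
INDEX = {name: i for i, name in enumerate(CHANNELS)}


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(pixel):
    """Compute NDVI, NDSI, and NDWI from one 15-value pixel.

    Assumes the standard definitions; the dataset's own formulas
    may differ (for example, NDWI has more than one common variant).
    """
    green = pixel[INDEX["B2_green"]]
    red = pixel[INDEX["B3_red"]]
    nir = pixel[INDEX["B4_nir"]]
    swir1 = pixel[INDEX["B5_swir1"]]
    return {
        "NDVI": normalized_difference(nir, red),
        "NDSI": normalized_difference(green, swir1),
        "NDWI": normalized_difference(green, nir),
    }


# Made-up reflectance values for a single pixel, for illustration only.
example_pixel = [0.0] * len(CHANNELS)
example_pixel[INDEX["B2_green"]] = 0.6
example_pixel[INDEX["B3_red"]] = 0.2
example_pixel[INDEX["B4_nir"]] = 0.4
example_pixel[INDEX["B5_swir1"]] = 0.2
example_indices = spectral_indices(example_pixel)
print(example_indices)
```

Snow and clean ice tend to be bright in the green band and dark in the shortwave infrared, so they typically yield high NDSI values.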

These data were acquired from Google Earth Engine’s LE7 and SRTM collections using this script.

All channels are aligned at 30m spatial resolution. Elevation and slope channels were upsampled from 90m to 30m resolution.
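Going from 90m to 30m means each source cell covers a 3 x 3 block of output cells. The sketch below illustrates this with nearest-neighbor replication on a small grid. The resampling method actually used for this dataset is not stated here, so nearest-neighbor is an assumption made purely for illustration, and `upsample_nearest` is a hypothetical helper.

```python
def upsample_nearest(grid, factor=3):
    """Repeat every cell `factor` times along both axes (nearest-neighbor)."""
    out = []
    for row in grid:
        # Widen the row: each value is repeated `factor` times.
        wide = [value for value in row for _ in range(factor)]
        # Stack `factor` independent copies of the widened row.
        out.extend(list(wide) for _ in range(factor))
    return out


# A 2 x 2 grid of made-up 90m elevation values becomes a 6 x 6 grid at 30m.
coarse = [[100, 200],
          [300, 400]]
fine = upsample_nearest(coarse, factor=3)
for row in fine:
    print(row)
```

Nearest-neighbor replication keeps the original values unchanged, whereas an interpolating method such as bilinear resampling would produce smoother transitions between cells.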

Glacier annotations

Digital polygon data indicating the status of glaciers in the HKH region from 2002 to 2008 were provided by ICIMOD [3].


We also provide 14,190 numpy patches. Each patch has size 512 x 512 x 15 and comes with a corresponding 512 x 512 x 2 pixel-wise mask; the two mask channels correspond to clean-ice and debris-covered glaciers. Each patch’s geolocation, time stamp, source Landsat ID, and glacier density are available in a GeoJSON metadata file. An example metadata entry is shown below:

{
  "type": "Feature",
  "properties": {
    "img_source": "\/datadrive\/glaciers\/unique_tiles\/LE07_149037_20041024.tif",
    "mask_source": "\/datadrive\/glaciers\/processed_exper\/masks\/mask_00.npy",
    "img_slice": "\/datadrive\/glaciers\/processed_exper\/slices\/slice_0_img_003.npy",
    "mask_slice": "\/datadrive\/glaciers\/processed_exper\/slices\/slice_0_mask_003.npy",
    "mask_mean_0": 0.0,
    "mask_mean_1": 0.0,
    "mask_mean_2": 0.0,
    "img_mean": 189.76698303222656
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [ 386932.71929824561812, 3564585.0 ],
        [ 386932.71929824561812, 3579767.12783851986751 ],
        [ 371750.78947368421359, 3579767.12783851986751 ],
        [ 371750.78947368421359, 3564585.0 ],
        [ 386932.71929824561812, 3564585.0 ]
      ]
    ]
  }
}
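A metadata entry like the one above can be read with Python's standard `json` module. The sketch below parses an abbreviated copy of the example, extracts the patch and mask file names, and computes the patch footprint from its polygon. Several points are assumptions rather than documented facts: that the source file name encodes sensor, Landsat path/row, and acquisition date; that the `mask_mean_*` fields give the mean of each mask channel; and that the coordinates are in meters in a projected coordinate system. `describe_patch` is a hypothetical helper written for this example.

```python
import json
import posixpath
from datetime import datetime

# Abbreviated copy of the example entry; the r-prefix keeps the "\/" escapes intact.
FEATURE_TEXT = r'''
{ "type": "Feature",
  "properties": {
    "img_source": "\/datadrive\/glaciers\/unique_tiles\/LE07_149037_20041024.tif",
    "img_slice": "\/datadrive\/glaciers\/processed_exper\/slices\/slice_0_img_003.npy",
    "mask_slice": "\/datadrive\/glaciers\/processed_exper\/slices\/slice_0_mask_003.npy",
    "mask_mean_0": 0.0, "mask_mean_1": 0.0, "mask_mean_2": 0.0 },
  "geometry": { "type": "Polygon", "coordinates": [ [
    [ 386932.71929824561812, 3564585.0 ],
    [ 386932.71929824561812, 3579767.12783851986751 ],
    [ 371750.78947368421359, 3579767.12783851986751 ],
    [ 371750.78947368421359, 3564585.0 ],
    [ 386932.71929824561812, 3564585.0 ] ] ] } }
'''


def describe_patch(feature):
    """Summarize one metadata Feature as a plain dictionary."""
    props = feature["properties"]

    # Assumed file name pattern: <sensor>_<path+row>_<YYYYMMDD>.tif
    stem = posixpath.splitext(posixpath.basename(props["img_source"]))[0]
    sensor, path_row, date_text = stem.split("_")

    # Bounding box of the outer polygon ring.
    ring = feature["geometry"]["coordinates"][0]
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]

    # Assumed meaning: a positive mask mean indicates labeled pixels.
    mask_means = [v for k, v in props.items() if k.startswith("mask_mean_")]

    return {
        "sensor": sensor,
        "path_row": path_row,
        "date": datetime.strptime(date_text, "%Y%m%d").date(),
        "img_file": posixpath.basename(props["img_slice"]),
        "mask_file": posixpath.basename(props["mask_slice"]),
        "width_m": max(xs) - min(xs),
        "height_m": max(ys) - min(ys),
        "has_labels": any(v > 0 for v in mask_means),
    }


info = describe_patch(json.loads(FEATURE_TEXT))
print(info)
```

For this entry the footprint comes out at roughly 15.2 km on each side, which is in line with a 512 x 512 patch at about 30m per pixel, and all three mask means are zero.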

Download links

Having trouble downloading? Check out our FAQ.

Citation and contact information

For questions about this dataset, contact or

If you use this dataset, please cite:

Baraka S, Akera B, Aryal B, Sherpa T, Shresta F, Ortiz A, Sankaran K, Lavista Ferres J, Matin M, Bengio Y. Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya. NeurIPS 2020 Climate Change AI Workshop (2020).



Annotations are released under the Community Data License Agreement (permissive variant).


Landsat and SRTM data have been released into the public domain. License information about SRTM and Landsat is available here and here, respectively.


  1. United States Geological Survey. Landsat 7. Online
  2. SRTM 90m DEM Digital Elevation Database. Online
  3. Bajracharya, S. R., & Shrestha, B. R. (2011). The status of glaciers in the Hindu Kush-Himalayan region. International Centre for Integrated Mountain Development (ICIMOD). Online

Posted by Dan Morris.