Frequently Asked Questions

Table of contents

  1. Having trouble downloading?
  2. How can I download just part of a data set, and/or download an already-unzipped dataset?
  3. Where is all the data stored?
  4. What data formats do you use on LILA?
  5. Do you provide MD5 values for LILA files?
  6. Can you help me find enough processing power to work with all this data?

I’m having trouble downloading large files from LILA; my browser eventually gives up. Help!

If you’re having download issues, we recommend trying AzCopy, a command-line tool for downloading large files that are stored on Azure. Install AzCopy from here (available for Windows, Linux, and macOS).

If the URL you’re trying to download is:

…use the following AzCopy command line to copy it to your current directory:

azcopy cp "" "/absolute/path/to/desired/local/dir/"

The URL and the destination path should both be enclosed in double quotes.
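If you would rather script the download than type the command by hand, the same AzCopy call can be launched from Python. This is a minimal sketch, not an official LILA tool: the function names are hypothetical, and it assumes the azcopy executable is already installed and on your PATH.

```python
import subprocess


def build_azcopy_command(source_url, dest_dir):
    """Return the argument list for: azcopy cp <source_url> <dest_dir>."""
    # Arguments are passed as a list, so Python hands them to AzCopy
    # unchanged and no manual double quotes are needed.
    return ["azcopy", "cp", source_url, dest_dir]


def download_with_azcopy(source_url, dest_dir):
    """Run AzCopy; raises CalledProcessError if AzCopy reports a failure."""
    subprocess.run(build_azcopy_command(source_url, dest_dir), check=True)


# Usage (substitute the download link from the data set page):
# download_with_azcopy("<url of the file>", "/absolute/path/to/desired/local/dir/")
```

Building the command as a list rather than a single string sidesteps shell quoting issues with long URLs.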

If you can’t use AzCopy for one reason or another, you have another option that will be faster and more reliable than hitting the link directly, but substantially slower than AzCopy. So try AzCopy first!

But if that doesn’t work for you, you can do a direct download via the Microsoft Content Distribution Network; particularly when others in your area have already downloaded the same file recently, this will be faster than a standard download. If you are trying to download the file:

…you can get the same file from the CDN with the URL:

…but this URL won’t work with AzCopy, and we promise that AzCopy is faster!

How can I download just part of a data set, and/or download an already-unzipped dataset?

For tips on downloading part of a data set (e.g. a folder or a species) and/or eliminating the need to unzip a giant zipfile after downloading, check out our guide on directly accessing images without using giant zipfiles.

I want to set up an Azure VM near LILA data. Where is all the data stored?

All the data on LILA is stored in the South Central US Azure region. If you are going to set up an Azure compute environment to work with this data, we recommend placing it in the South Central US region.

What format do you use for metadata on LILA?

We use different formats depending on the nature of each data set, but for camera trap data (and we love camera trap data!), we have tried to convert all data to a common format. More on the details of this format in a second, but if you share data on LILA, we’re willing to do the work to get your data into this format, and post that along with your original metadata. This greatly facilitates letting new researchers work with your data.

We use the “COCO Camera Traps” .json format proposed by Beery et al., which is a refinement of the format used by the COCO data set, adding fields specific to camera trap data sets. The format is formally specified here.
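To give a feel for working with this kind of file, here is a small Python sketch that counts annotations per category. It assumes only the top-level "images", "annotations", and "categories" lists inherited from COCO; the sample records and the function name are made up for illustration, and real LILA files carry additional camera-trap fields.

```python
import json
from collections import Counter


def count_annotations_per_category(metadata):
    """Return {category name: number of annotations} for a COCO-style dict."""
    # Map numeric category IDs to readable names.
    id_to_name = {cat["id"]: cat["name"] for cat in metadata["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in metadata["annotations"])
    return dict(counts)


# Tiny invented sample in the same overall shape as a metadata .json file.
sample = json.loads("""
{
  "images": [
    {"id": "img1", "file_name": "a/0001.jpg"},
    {"id": "img2", "file_name": "a/0002.jpg"},
    {"id": "img3", "file_name": "b/0001.jpg"}
  ],
  "annotations": [
    {"id": "ann1", "image_id": "img1", "category_id": 1},
    {"id": "ann2", "image_id": "img2", "category_id": 1},
    {"id": "ann3", "image_id": "img3", "category_id": 0}
  ],
  "categories": [
    {"id": 0, "name": "empty"},
    {"id": 1, "name": "deer"}
  ]
}
""")

print(count_annotations_per_category(sample))
```

For a real data set, replace the inline sample with json.load() on the downloaded metadata file.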

I want to confirm that my download wasn’t corrupted… do you publish file sizes and MD5s?

We sure do! See this page.
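Once you have the published MD5 for a file, you can check your local copy with a few lines of Python. This is a minimal sketch using the standard hashlib module; the function names are hypothetical, and the expected value is whatever is listed for your file.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat, even for very large zipfiles.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def md5_matches(path, expected_md5):
    """Return True if the file's MD5 equals the published value."""
    return file_md5(path) == expected_md5.strip().lower()


# Usage (fill in your own file path and the published MD5):
# print(md5_matches("/path/to/downloaded/file.zip", "<published md5>"))
```

If the check returns False, the download was likely truncated or corrupted and should be repeated.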

I want to try my hand at machine learning for conservation, but don’t have the processing power to deal with these big datasets. Do you happen to know a good way to get free compute resources?

LILA is supported by the Microsoft AI for Earth program, which gives out compute grants to folks working at the intersection of machine learning and environmental science. Ask us for compute credits!