Table of contents
- Having trouble downloading?
- How can I download just part of a data set, and/or download an already-unzipped dataset?
- Where is all the data stored?
- What data formats do you use on LILA?
- Have the camera trap datasets been mapped to a common taxonomy?
- Do you provide MD5 values for LILA files?
I’m having trouble downloading large files from LILA; my browser eventually gives up. Help!
If you’re having issues when downloading files in your browser, we recommend trying the command-line tools gsutil (for Google Cloud Platform URLs containing “storage.googleapis.com”) or AzCopy (for Azure Blob Storage URLs containing “blob.core.windows.net”). Most datasets are available on both GCP and Azure, and downloading with either command-line tool will be faster and more reliable than downloading through your browser. Here are quick instructions for downloading files with gsutil and AzCopy…
Downloading a file with gsutil
If the URL you’re trying to download is:
https://storage.googleapis.com/public-datasets-lila/mydataset/myfile.zip
…use the following gsutil command line to copy it to your current directory:
gsutil cp "gs://public-datasets-lila/mydataset/myfile.zip" "./myfile.zip"
Note that we replaced “https://storage.googleapis.com/” with “gs://”.
The URL and the destination path should both be enclosed in double quotes.
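If you are scripting downloads, the URL rewrite described above can be automated. The following is a minimal sketch, not an official LILA tool: the helper names `https_to_gs` and `gsutil_command` are hypothetical, and the sketch assumes the URL contains no percent-encoded characters.

```python
from typing import List
from urllib.parse import urlparse

GCS_HOST = "storage.googleapis.com"


def https_to_gs(url: str) -> str:
    """Rewrite a storage.googleapis.com HTTPS URL as a gs:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != GCS_HOST:
        raise ValueError(f"Not a {GCS_HOST} URL: {url}")
    # The path is "/<bucket>/<object>"; dropping the leading slash and
    # prefixing "gs://" gives the form gsutil expects.
    return "gs://" + parsed.path.lstrip("/")


def gsutil_command(url: str, destination: str) -> List[str]:
    """Build the argument list for 'gsutil cp' (hypothetical helper)."""
    return ["gsutil", "cp", https_to_gs(url), destination]


if __name__ == "__main__":
    example = "https://storage.googleapis.com/public-datasets-lila/mydataset/myfile.zip"
    print(" ".join(gsutil_command(example, "./myfile.zip")))
```

Passing the argument list to `subprocess.run` (rather than a single shell string) sidesteps the quoting issue, since each argument is delivered to gsutil as-is.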
Downloading a file with AzCopy
If the URL you’re trying to download is:
https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip
…use the following AzCopy command line to copy it to your current directory:
azcopy cp "https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip" "./myfile.zip"
The URL and the destination path should both be enclosed in double quotes.
Downloading a file via the Azure CDN
If you can’t use gsutil/AzCopy for one reason or another, and the file is available on Azure, you have another option that will be faster and more reliable than hitting the link directly, but substantially slower than gsutil/AzCopy. So try gsutil/AzCopy first!
But if that doesn’t work for you, you can do a direct download via the Microsoft Content Distribution Network; particularly when others in your area have already downloaded the same file recently, this will be faster than a standard download. If you are trying to download the file:
https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip
…you can get the same file from the CDN with the URL:
https://lilacdn.azureedge.net/mydataset/myfile.zip
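If you have a list of blob URLs to rewrite, the host swap shown above can be scripted. This is a minimal sketch that assumes only the two host names given in this section; the function name `blob_to_cdn` is hypothetical.

```python
from urllib.parse import urlparse, urlunparse

BLOB_HOST = "lilablobssc.blob.core.windows.net"
CDN_HOST = "lilacdn.azureedge.net"


def blob_to_cdn(url: str) -> str:
    """Swap the Azure blob host for the CDN host, keeping the path unchanged."""
    parsed = urlparse(url)
    if parsed.netloc != BLOB_HOST:
        raise ValueError(f"Not a {BLOB_HOST} URL: {url}")
    return urlunparse(parsed._replace(netloc=CDN_HOST))


if __name__ == "__main__":
    print(blob_to_cdn("https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip"))
```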
How can I download just part of a data set, and/or download an already-unzipped dataset?
For tips on downloading part of a data set (e.g. a folder or a species) and/or eliminating the need to unzip a giant zipfile after downloading, check out our guide on directly accessing images without using giant zipfiles.
I want to set up a GCP or Azure VM near LILA data. Where is all the data stored?
All the data on LILA that’s on GCP is stored in the US-East-4 and US-West-1 GCP regions. If you are going to set up a GCP compute environment to work with this data, we recommend placing it in one of those regions.
All the data on LILA that’s on Azure is stored in the South Central US Azure region. If you are going to set up an Azure compute environment to work with this data, we recommend placing it in the South Central US region.
What format do you use for metadata on LILA?
We use different formats depending on the nature of each data set, but for camera trap data (and we love camera trap data!), we have tried to convert all data to a common format. More on the details of this format in a second, but if you share data on LILA, we’re willing to do the work to get your data into this format, and post that along with your original metadata. This makes it much easier for new researchers to work with your data.
We use the “COCO Camera Traps” .json format proposed by Beery et al., which is a refinement of the format used by the COCO data set, adding fields specific to camera trap data sets. The format is formally specified here.
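To give a feel for the structure, here is a small, made-up example in the general shape of a COCO-style camera trap file, along with a sketch of how you might count annotations per category. The top-level keys ("images", "annotations", "categories", "info") follow the COCO convention; the specific values, IDs, and the "location" field shown are illustrative assumptions, and the formal specification linked above is the authority on which fields are required.

```python
import json
from collections import Counter

# Hypothetical metadata; real LILA files are much larger and may carry
# additional fields (e.g. bounding boxes or sequence information).
EXAMPLE_JSON = """
{
  "info": {"version": "1.0", "description": "hypothetical example"},
  "images": [
    {"id": "img_0001", "file_name": "loc01/img_0001.jpg", "location": "loc01"},
    {"id": "img_0002", "file_name": "loc01/img_0002.jpg", "location": "loc01"},
    {"id": "img_0003", "file_name": "loc02/img_0003.jpg", "location": "loc02"}
  ],
  "categories": [
    {"id": 0, "name": "empty"},
    {"id": 1, "name": "deer"}
  ],
  "annotations": [
    {"id": "ann_0001", "image_id": "img_0001", "category_id": 1},
    {"id": "ann_0002", "image_id": "img_0002", "category_id": 0},
    {"id": "ann_0003", "image_id": "img_0003", "category_id": 1}
  ]
}
"""


def count_annotations_per_category(data: dict) -> Counter:
    """Count annotations by category name."""
    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in data["annotations"])


data = json.loads(EXAMPLE_JSON)
counts = count_annotations_per_category(data)

if __name__ == "__main__":
    for name, n in counts.most_common():
        print(f"{name}: {n}")
```

For a real data set, you would replace `json.loads(EXAMPLE_JSON)` with `json.load` on the downloaded metadata file.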
Have the camera trap datasets been mapped to a common taxonomy?
Yes! More information here.
I want to confirm that my download wasn’t corrupted… do you publish file sizes and MD5s?
We sure do! See this page.
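Once you have the published size and MD5 for a file, you can check your local copy with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the expected values are whatever the page linked above lists for your file.

```python
import hashlib
import os


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat even for very large zipfiles.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path: str, expected_size: int, expected_md5: str) -> bool:
    """Compare a local file against a published size and MD5 value."""
    # Check the size first: it is cheap and catches truncated downloads.
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_md5.strip().lower()
```

Checking the size before hashing means a truncated download is rejected immediately, without reading the whole file.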