Table of contents
- Having trouble downloading?
- How can I download just part of a data set, and/or download an already-unzipped dataset?
- Where is all the data stored?
- What format do you use for metadata on LILA?
- Have the camera trap datasets been mapped to a common taxonomy?
- Do you provide MD5 values for LILA files?
I’m having trouble downloading large files from LILA; my browser eventually gives up. Help!
There are three copies of (almost) every dataset on LILA: one on Google Cloud Platform (GCP) Cloud Storage, one on Amazon Web Services (AWS) S3, and one on Microsoft Azure Blob Storage. If you're having trouble downloading files in your browser, we recommend trying the command-line tool that each cloud provides: gsutil for GCP, the AWS CLI (aws s3) for AWS, and AzCopy for Azure.
All three copies are identical, so use whichever cloud and tool is most convenient for you; any of the command-line tools will be faster and more reliable than downloading through your browser. Cyberduck is a graphical alternative that can list and download data from all three clouds, though full Cyberduck instructions are beyond the scope of this FAQ.
Here are quick instructions for downloading a file with gsutil, aws s3, and AzCopy.
Downloading a file with gsutil
If the URL you’re trying to download is:
https://storage.googleapis.com/public-datasets-lila/mydataset/myfile.zip
…use the following gsutil command line to copy it to your current directory:
gsutil cp "gs://public-datasets-lila/mydataset/myfile.zip" "./myfile.zip"
Note that we replaced “https://storage.googleapis.com/” with “gs://”.
The URL and the destination path should be surrounded in double-quotes.
Downloading a file with aws s3 cp
If the URL you’re trying to download is:
http://us-west-2.opendata.source.coop.s3.amazonaws.com/agentmorris/lila-wildlife/mydataset/myfile.zip
…use the following aws s3 cp command line to copy it to your current directory:
aws s3 cp "s3://us-west-2.opendata.source.coop/agentmorris/lila-wildlife/mydataset/myfile.zip" "./myfile.zip" --no-sign-request
Note that we replaced “http://us-west-2.opendata.source.coop.s3.amazonaws.com/” with “s3://us-west-2.opendata.source.coop/”.
The URL and the destination path should be surrounded in double-quotes.
Downloading a file with AzCopy
If the URL you’re trying to download is:
https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip
…use the following AzCopy command line to copy it to your current directory:
azcopy cp "https://lilablobssc.blob.core.windows.net/mydataset/myfile.zip" "./myfile.zip"
The URL and the destination path should be surrounded in double-quotes.
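If you are scripting downloads, the URL rewrites above are mechanical. Here is a minimal Python sketch that maps a LILA HTTPS URL to the corresponding command line; the function name and example are illustrative, and the rewrite rules are just the substitutions described above:

```python
# Minimal sketch: turn a LILA HTTPS URL into the equivalent CLI command.
# The rewrite rules are the substitutions described above; the function
# name and example are illustrative, not part of any LILA tooling.

def download_command(url: str, dest: str = ".") -> str:
    gcp_prefix = "https://storage.googleapis.com/"
    aws_prefix = "http://us-west-2.opendata.source.coop.s3.amazonaws.com/"
    azure_prefix = "https://lilablobssc.blob.core.windows.net/"

    if url.startswith(gcp_prefix):
        # GCP: https://storage.googleapis.com/... -> gs://...
        return f'gsutil cp "gs://{url[len(gcp_prefix):]}" "{dest}"'
    if url.startswith(aws_prefix):
        # AWS: rewrite to s3://us-west-2.opendata.source.coop/... and skip signing
        return (f'aws s3 cp "s3://us-west-2.opendata.source.coop/'
                f'{url[len(aws_prefix):]}" "{dest}" --no-sign-request')
    if url.startswith(azure_prefix):
        # Azure: AzCopy takes the HTTPS URL as-is
        return f'azcopy cp "{url}" "{dest}"'
    raise ValueError(f"Unrecognized LILA URL: {url}")


print(download_command(
    "https://storage.googleapis.com/public-datasets-lila/mydataset/myfile.zip",
    "./myfile.zip"))
# gsutil cp "gs://public-datasets-lila/mydataset/myfile.zip" "./myfile.zip"
```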
How can I download just part of a data set, and/or download an already-unzipped dataset?
For tips on downloading part of a data set (e.g. a folder or a species) and/or eliminating the need to unzip a giant zipfile after downloading, check out our guide on directly accessing images without using giant zipfiles.
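As one concrete (unofficial) illustration, the GCP copy of LILA can be read anonymously with the google-cloud-storage Python package, so you can list and download individual files instead of a whole zipfile. The bucket name comes from the gsutil example above; the "mydataset/" prefix and the .jpg filter are placeholders:

```python
# Sketch: download individual files from the GCP copy of LILA without
# grabbing a whole zipfile. Requires `pip install google-cloud-storage`.
# "mydataset/" is a placeholder prefix matching the examples above.
import os
from google.cloud import storage

client = storage.Client.create_anonymous_client()  # public bucket, no credentials
for blob in client.list_blobs("public-datasets-lila", prefix="mydataset/"):
    if not blob.name.endswith(".jpg"):   # e.g. only grab images
        continue
    local_path = os.path.join("downloads", blob.name)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    blob.download_to_filename(local_path)
    print("Downloaded", blob.name)
```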
I want to set up a GCP, AWS, or Azure VM near LILA data. Where is all the data stored?
- On GCP, LILA data is stored in the us-east4 and us-west1 GCP regions. If you are going to set up a GCP compute environment to work with this data, we recommend placing it in one of those regions.
- On AWS, LILA data is stored in the us-west-2 AWS region. If you are going to set up an AWS compute environment to work with this data, we recommend placing it in the us-west-2 region.
- On Azure, LILA data is stored in the South Central US Azure region. If you are going to set up an Azure compute environment to work with this data, we recommend placing it in the South Central US region.
What format do you use for metadata on LILA?
We use different formats depending on the nature of each data set, but for camera trap data (and we love camera trap data!), we have tried to convert everything to a common format. More on the details of that format below; if you share data on LILA, we're willing to do the work to get your data into this format and post it alongside your original metadata. This makes it much easier for new researchers to work with your data.
We use the “COCO Camera Traps” .json format proposed by Beery et al., which is a refinement of the format used by the COCO data set, adding fields specific to camera trap data sets. The format is formally specified here.
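For orientation, a COCO Camera Traps file is an ordinary .json file with "info", "images", "categories", and "annotations" lists, and can be read with nothing more than the Python standard library. The sketch below assumes a placeholder filename; see the formal specification for the complete list of required and optional fields:

```python
# Sketch: reading a COCO Camera Traps .json file with the standard library.
# The filename is a placeholder; check the formal specification for the
# full list of required and optional fields.
import json

with open("mydataset_metadata.json") as f:
    data = json.load(f)

# Top-level keys: "info", "images", "categories", "annotations"
print(len(data["images"]), "images,", len(data["annotations"]), "annotations")

# Each image has an "id" and "file_name", plus camera-trap-specific fields
# such as "location", "datetime", and sequence info ("seq_id", "frame_num").
image = data["images"][0]
print(image["id"], image["file_name"], image.get("location"))

# Annotations link an "image_id" to a "category_id" (and optionally a "bbox");
# categories map "id" to a species/class "name".
categories = {c["id"]: c["name"] for c in data["categories"]}
annotation = data["annotations"][0]
print(categories[annotation["category_id"]])
```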
Have the camera trap datasets been mapped to a common taxonomy?
Yes! More information here.
I want to confirm that my download wasn’t corrupted… do you publish file sizes and MD5s?
We sure do! See this page.
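If you want to check a downloaded file against a published MD5, here is a minimal sketch using Python's standard library; the filename and expected hash are placeholders to be replaced with values from the listing:

```python
# Sketch: verify a downloaded file against a published MD5 checksum.
# The filename and expected hash below are placeholders.
import hashlib

def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 of a file without loading it all into memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

expected = "0123456789abcdef0123456789abcdef"  # value from the LILA MD5 listing
actual = file_md5("myfile.zip")
print("OK" if actual == expected else f"MISMATCH: {actual}")
```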