Accessing images outside of giant zipfiles

Table of contents

  1. What’s wrong with giant zipfiles?
  2. Downloading a whole data set (without the giant zipfiles)
  3. Downloading just one folder from a data set
  4. Downloading a list of files from a data set

What’s wrong with giant zipfiles?

Many of our data sets are posted as big zipfiles, which are convenient in that they can be downloaded in your browser with no special tools. However, giant zipfiles have some major drawbacks, most notably that you have to unzip them. Unzipping can take almost as long as the download itself in some cases, and it means you need twice as much storage as the data set actually requires. Furthermore, if you only want part of a data set, e.g. one folder or one species, you still have to download and unzip the whole zipfile.

So we’re also providing unzipped copies of many of our data sets, to facilitate simpler or smaller downloads.

This page will give you a few ways to download images without dealing with giant zipfiles. For bulk downloads of folders or data sets, we recommend AzCopy, a command-line tool for downloading files from Azure storage that works on Windows, Linux, and Mac. We will provide examples of using AzCopy to download whole data sets or single folders. In fact, we recommend using AzCopy even if you do want to download zipfiles.
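If you're not sure whether AzCopy is installed, running it with the --version flag should print a version number:

azcopy --version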

These approaches depend on having a SAS (Shared Access Signature) URL for the storage container associated with the data set you want to access. A SAS URL is like a password for an Azure storage container; in this case, a password giving you read-only access to the container. We have posted a list of SAS URLs for all the containers that have unzipped images here, and we’ll refer back to that list later.
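For reference, a SAS URL is just the container’s base URL with the SAS token appended as a query string, i.e. it follows this pattern (the angle-bracket pieces are placeholders):

https://<account>.blob.core.windows.net/<container>/<path>?<SAS token>

Everything after the “?” is the token; we’ll take advantage of this structure below.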

Downloading a whole data set (without the giant zipfiles)

Let’s experiment with the Missouri Camera Traps data set. If I open the list of SAS URLs, I’ll see that the SAS URL for this data set is:

https://lilablobssc.blob.core.windows.net/missouricameratraps/images?st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=zf5Vb3BmlGgBKBM1ZtAZsEd1vZvD6EbN%2BNDzWddJsUI%3D

To download the entire data set to the folder c:\blah, I can do this:

azcopy cp "https://lilablobssc.blob.core.windows.net/missouricameratraps/images?st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=zf5Vb3BmlGgBKBM1ZtAZsEd1vZvD6EbN%2BNDzWddJsUI%3D" "c:\blah" --recursive

Downloading just one folder from a data set

If I look at the metadata file for this data set, I see that there’s a folder called Set1/1.02-Agouti/SEQ75520 containing just one sequence of camera trap images. What if I want to download just that one folder to c:\blah? I can insert that folder path into the SAS URL, right after the container path and just before the “?”, like this:

azcopy cp "https://lilablobssc.blob.core.windows.net/missouricameratraps/images/Set1/1.02-Agouti/SEQ75520?st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=zf5Vb3BmlGgBKBM1ZtAZsEd1vZvD6EbN%2BNDzWddJsUI%3D" "c:\blah" --recursive

Downloading a list of files from a data set

If you want to download, e.g., all the images for a particular species from a data set, this is also supported, but it requires a little code. We have an example of how to do this here.
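In the meantime, here’s a rough sketch of the idea in Python, assuming you’ve already pulled a list of file paths (relative to the container root) out of the data set’s metadata file; the specific file names below are hypothetical:

import os
import urllib.request

sas_url = 'https://lilablobssc.blob.core.windows.net/missouricameratraps/images?st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=zf5Vb3BmlGgBKBM1ZtAZsEd1vZvD6EbN%2BNDzWddJsUI%3D'
base_url, sas_token = sas_url.split('?', 1)

# Paths relative to the container root; in practice you would read these
# from the metadata file. These particular file names are hypothetical.
filenames = [
    'Set1/1.02-Agouti/SEQ75520/IMG_0001.JPG',
    'Set1/1.02-Agouti/SEQ75520/IMG_0002.JPG'
]

output_dir = r'c:\blah'

for fn in filenames:
    # Insert the file path into the SAS URL, just as we did for folders
    url = base_url + '/' + fn + '?' + sas_token
    target_file = os.path.join(output_dir, fn.replace('/', os.sep))
    os.makedirs(os.path.dirname(target_file), exist_ok=True)
    urllib.request.urlretrieve(url, target_file)

Alternatively, AzCopy can handle this case directly: its copy command supports a --list-of-files flag that takes a text file containing one path (relative to the source) per line, e.g.:

azcopy cp "https://lilablobssc.blob.core.windows.net/missouricameratraps/images?st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=zf5Vb3BmlGgBKBM1ZtAZsEd1vZvD6EbN%2BNDzWddJsUI%3D" "c:\blah" --list-of-files "files.txt"

(Here files.txt is a file you create yourself, with one relative path per line.)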