MegaDetector results for camera trap datasets

I was planning to run MegaDetector on some of these datasets… any chance you’ve already done this?

Yes! We’ve run MegaDetector v5a and v5b on every camera trap dataset on LILA except Snapshot Serengeti. For Snapshot Serengeti we post MDv4 results instead, because of this issue, which caused odd MDv5 results on Snapshot Serengeti data. For the same reason, the results for the “Snapshot Safari 2024 Expansion” dataset exclude Snapshot Serengeti results. After this issue is resolved, we’ll update results for the affected datasets.

These results are intended to support classifier training or detector fine-tuning. There are two reasons we don’t recommend using these results to evaluate MegaDetector’s accuracy:

  • Most of these datasets were part of MegaDetector’s training data, so although the results give a general sense of MegaDetector’s performance in various ecosystems, they may provide a biased view of MegaDetector’s accuracy. You can see the list of specific datasets used to train MegaDetector here.
  • Because some of these results files are very large, we have thresholded them in some cases (albeit at levels much lower than anyone would care about for most applications).

What are the different results links for each dataset?

For each dataset, you’ll see links to results for both MegaDetector v5a and MegaDetector v5b (or MDv4 for Snapshot Serengeti, as per above). You’ll also see a set of results called “MDv5a with RDE”. These represent a version of the results where we’ve spent about five minutes per million images on the semi-manual repeat detection elimination process, which gets rid of a lot of false positives that are repeated over and over. There is a very small risk of getting rid of some true positives in this process as well, but the number of animals lost to this process should be negligible. So, all other things being equal, for almost everything you might want to do with the results on this page, use the RDE results. RDE results are not included for some datasets because those datasets don’t have location IDs that are required for the RDE process.

The results files contain relative paths… what are they relative to?

This .csv file contains metadata for every camera trap dataset on LILA; the “image_base_url” column has the Azure base URL to which each relative filename can be appended to get a complete URL.
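As a minimal sketch of that concatenation in Python: the function name, the example base URL, and the example path below are hypothetical placeholders, not real LILA values. The only thing taken from the text above is the idea of appending a relative filename to the value in the “image_base_url” column. Percent-encoding the path is an assumption about what the storage endpoint expects.

```python
from urllib.parse import quote


def image_url(image_base_url, relative_path):
    """Join a dataset's base URL and a relative path from a results file."""
    # Normalize slashes so exactly one separator sits between the two parts
    base = image_base_url.rstrip("/")
    rel = relative_path.replace("\\", "/").lstrip("/")
    # Percent-encode characters such as spaces, but keep path separators
    return base + "/" + quote(rel, safe="/")


# Hypothetical values, for illustration only
print(image_url("https://example.com/some-dataset/", "site 01/IMG_0001.JPG"))
```

Stripping and re-adding the slash means the result is the same whether or not the base URL in the .csv file ends with a trailing slash.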

Clicking on this page like 20 times seems like an inefficient way to access these links… are they available in a structured format?

Yes! The following columns in the .csv file mentioned above contain URLs for MegaDetector results for each dataset:

  • mdv4_results_raw
  • mdv5a_results_raw
  • mdv5b_results_raw
  • md_results_with_rde

The MDv5 columns will be empty for Snapshot Serengeti, and the MDv4 column will be empty for all other datasets. Since you want the RDE results anyway, use the “md_results_with_rde” column. Remember, though, that MegaDetector v4 produces different confidence values than MDv5: 0.8 is a good threshold for MDv4 results, and 0.2 is a good threshold for MDv5 results.
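Here is a rough Python sketch of how one might pick a results link and a matching threshold for each row of the .csv file. The four results column names and the 0.8 / 0.2 thresholds come from the text above; everything else is an assumption made for illustration: the sample rows, the “dataset_name” column, the example.com URLs, the rule that an empty MDv5a column implies MDv4-based results, and the results layout (a “detections” list whose items have a “conf” field).

```python
import csv
import io

# Thresholds suggested in the text above
MDV4_THRESHOLD = 0.8
MDV5_THRESHOLD = 0.2

# Columns to try, in order of preference (RDE results first)
RESULT_COLUMNS = ("md_results_with_rde", "mdv5a_results_raw", "mdv4_results_raw")


def pick_results(row):
    """Return (results_url, confidence_threshold) for one metadata row."""
    # Assumption: an empty MDv5a column means the dataset only has MDv4-based results
    is_mdv4 = not (row.get("mdv5a_results_raw") or "").strip()
    threshold = MDV4_THRESHOLD if is_mdv4 else MDV5_THRESHOLD
    for column in RESULT_COLUMNS:
        url = (row.get(column) or "").strip()
        if url:
            return url, threshold
    return None, threshold


def confident_detections(image_entry, threshold):
    """Keep detections at or above the threshold (assumed results layout)."""
    # Entries for failed images may have no detections list at all
    detections = image_entry.get("detections") or []
    return [d for d in detections if d["conf"] >= threshold]


# Hypothetical rows standing in for the real metadata .csv
SAMPLE_CSV = """dataset_name,mdv4_results_raw,mdv5a_results_raw,mdv5b_results_raw,md_results_with_rde
Dataset A,,https://example.com/a_mdv5a.zip,https://example.com/a_mdv5b.zip,https://example.com/a_rde.zip
Dataset B,https://example.com/b_mdv4.zip,,,
"""

for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(row["dataset_name"], pick_results(row))
```

The fallback order matters for datasets without location IDs, where the RDE column would be empty and the raw MDv5a results are the next best choice.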

The relative paths in the results files are relative to the same base folder as the .json metadata for each dataset. But we’re generally encouraging folks to ignore the individual dataset metadata going forward, and treat all the camera trap data on LILA as one big dataset (more information here about common taxonomy mapping and a unified data table). These MegaDetector results break the rules just a little, because we provide results for each dataset, but only because otherwise the results would be mega-massively-enormous.

Enough chit-chat, can you just give me a list of links to MegaDetector results?

Can do!

If anyone is curious about the MDv5 Snapshot Serengeti results, we have run MDv5a and MDv5b at both 1280px and 640px resolution (which will make sense if you read the issue description); email us if you need those results.

Also see…

this page, about mapping all of the LILA camera trap data to a common taxonomy, and ideally eliminating the lines between datasets and treating all the camera trap data on LILA as one big dataset.