MegaDetector results for camera trap datasets

I was planning to run MegaDetector on some of these datasets… any chance you’ve already done this?

Yes! We’ve run MegaDetector on every camera trap dataset on LILA. These results are intended to support classifier training or detector fine-tuning. There are two reasons we don’t recommend using these results to evaluate MegaDetector’s accuracy:

  • Most of these datasets have been used as part of MegaDetector’s training data, so although they are broadly representative of MegaDetector’s performance in various ecosystems, they may provide a biased view of MegaDetector’s accuracy. You can see the list of specific datasets used to train MegaDetector here.
  • Because some of these results files are very large, we have thresholded them in some cases (albeit at levels much lower than anyone would care about for most applications).

What are the different results links for each dataset?

For each dataset, you’ll see links to one or more sets of “raw” results (MDv4, MDv5a, MDv5b, and/or MDv1000-redwood). You’ll also see a set of results called “MDv[something] with RDE”. These represent a version of the results where we’ve spent about five minutes per million images on the semi-manual repeat detection elimination (RDE) process, which gets rid of many false positives that are repeated over and over. There is a very small risk of getting rid of some true positives in this process as well, but the number of animals lost this way should be negligible. So, all other things being equal, for almost everything you might want to do with the results on this page, use the RDE results. RDE results are not included for some datasets because those datasets lack the location IDs that the RDE process requires.

The results files contain relative paths… what are they relative to?

This .csv file contains metadata for every camera trap dataset on LILA; the “image_base_url” column has the Azure base URL for each dataset. Appending a relative filename to that base URL gives the full URL for the image.
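As a minimal sketch of that concatenation, assuming you already have a base URL from the “image_base_url” column and a relative filename from a results file (the function name and both values below are placeholders for illustration, not real LILA locations):

```python
def image_url(base_url: str, relative_path: str) -> str:
    """Join a dataset's base URL and a relative filename into one URL."""
    # Strip stray slashes so exactly one separates the two parts
    base = base_url.rstrip("/")
    # Treat backslashes as path separators, in case a file was written on Windows
    rel = relative_path.replace("\\", "/").lstrip("/")
    return base + "/" + rel


# Placeholder values for illustration only
print(image_url("https://example.com/some-dataset/", "site01/img_0001.jpg"))
```

This sketch does no percent-encoding; if filenames in a particular dataset contain spaces or other special characters, you may need to encode them before requesting the URL.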

Clicking on this page like 20 times seems like an inefficient way to access these links… are they available in a structured format?

Yes! The following columns in the .csv file mentioned above contain URLs for MegaDetector results for each dataset:

  • mdv4_results_raw
  • mdv5a_results_raw
  • mdv5b_results_raw
  • md1000-redwood_results_raw
  • md_results_with_rde
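
Here is a rough sketch of picking one results URL per dataset from those columns. The results column names are the ones listed above; everything else is an assumption made for illustration: the “name” column, the example URLs, the in-memory stand-in for the .csv file, and the fallback order used when a dataset has no RDE results.

```python
import csv
import io

# Column names as listed above. Preferring RDE first follows the text;
# the fallback order among the raw columns is an assumption.
RESULTS_COLUMNS = [
    "md_results_with_rde",
    "md1000-redwood_results_raw",
    "mdv5a_results_raw",
    "mdv5b_results_raw",
    "mdv4_results_raw",
]


def pick_results_url(row):
    """Return (column, url) for the first non-empty results column, or None."""
    for column in RESULTS_COLUMNS:
        url = (row.get(column) or "").strip()
        if url:
            return column, url
    return None


# Tiny stand-in for the metadata file; "name" and all URLs are placeholders
sample_csv = """name,mdv5a_results_raw,md_results_with_rde
dataset-a,https://example.com/a_mdv5a.zip,https://example.com/a_rde.zip
dataset-b,https://example.com/b_mdv5a.zip,
"""

for row in csv.DictReader(io.StringIO(sample_csv)):
    print(row["name"], pick_results_url(row))
```

In this sample, the second dataset has an empty RDE column, so the function falls back to its raw MDv5a results.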

As noted above, for almost everything anyone would want to do with these results, you should use the “md_results_with_rde” column; that’s the “best” set of results for each dataset. But you may need to know that the “best” Snapshot Serengeti results are derived from MDv4 results, and MDv4 confidence values are not on the same scale as those of later versions: 0.8 is a good threshold for MDv4 results, while 0.2 is a good threshold for MDv5 and MD1000 results.
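
To make the threshold difference concrete, here is a small sketch that filters detections by detector family. The threshold values come from the paragraph above; the function name, the family keys, and the sample entry are made up for illustration. The sketch also assumes each image entry in a results file has a “detections” list whose items carry a “conf” value, so check that against the files you download.

```python
# Thresholds quoted in the text above; the dictionary keys are illustrative labels
THRESHOLDS = {"mdv4": 0.8, "mdv5": 0.2, "md1000": 0.2}


def confident_detections(image_entry, detector_family):
    """Keep only the detections at or above the threshold for this detector family."""
    threshold = THRESHOLDS[detector_family]
    # Treat a missing or null detections field as "no detections"
    detections = image_entry.get("detections") or []
    return [d for d in detections if d["conf"] >= threshold]


# Made-up image entry for illustration
sample_entry = {
    "file": "site01/img_0001.jpg",
    "detections": [
        {"category": "1", "conf": 0.91},
        {"category": "1", "conf": 0.45},
    ],
}

print(len(confident_detections(sample_entry, "mdv4")))  # uses the 0.8 threshold
print(len(confident_detections(sample_entry, "mdv5")))  # uses the 0.2 threshold
```

The detector family is passed in explicitly because the “md_results_with_rde” column name alone does not say which MegaDetector version produced the results.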

Image filenames in these results files are relative to the same base folder as the .json metadata for each dataset. But we’re generally encouraging folks to ignore the individual dataset metadata going forward, and treat all the camera trap data on LILA as one big dataset (more information here about common taxonomy mapping and a unified data table). These MegaDetector results break the rules just a little, because we provide results for each dataset, but only because otherwise the results would be mega-massively-enormous.

Enough chit-chat, can you just give me a list of links to MegaDetector results?

Can do!

If anyone is curious about the MDv5 Snapshot Serengeti results, we have run MDv5a and MDv5b at both 1280px and 640px resolution (which will make sense if you read the issue description); email us if you need those results.

Also see…

this page, about mapping all of the LILA camera trap data to a common taxonomy, and ideally eliminating the lines between datasets and treating all the camera trap data on LILA as one big dataset.