Qualified

Duplicates detection and handling

Maja Pejčić 1 year ago in Digital Asset Management updated by Mayuri Sethiya (Home Depot) 5 months ago 7

It would be great if DAM supported handling duplicates upon ingestion. Currently, when an asset is uploaded via the UI, DAM can detect whether the file is a duplicate of an asset that already exists in the system; however, the user can disregard the duplicate message and still publish the asset.

It would be good to give DAM system administrators the ability to configure the desired behavior when a duplicate is detected, so that publishing of duplicates can be prevented if so configured.

More broadly, it would be good if this feature applied to all types of ingestion (including custom ingestion mechanisms via the REST API), so that ingestion of duplicate assets can be prevented, if so configured by system administrators, regardless of the ingestion mechanism.

Versioning Ingestion / Upload

Hi Petra,

This is valuable to us too.

We want to be able to configure the behaviour when duplicates are detected. We prefer to reject the upload when the uploaded file is a duplicate of an existing asset, but to allow the upload when only the filename matches that of an existing asset in the DAM. So, our use case is:

  1. If the checksum of the file being uploaded = the checksum of an asset in the DAM, REJECT with a desired message
  2. If the filename of the file being uploaded = the filename of an asset in the DAM, UPLOAD with a desired message
  3. If both the filename AND the checksum of the file being uploaded = the filename AND checksum of an asset in the DAM, REJECT with a desired message
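A minimal sketch of how the three rules above could be evaluated, for illustration only. The names `Asset`, `Action`, `sha256_of` and `decide` are hypothetical and not part of Aprimo DAM or its REST API; SHA-256 is also an assumption, since the thread does not say which checksum the DAM uses.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Action(Enum):
    UPLOAD = "upload"
    UPLOAD_WITH_MESSAGE = "upload_with_message"
    REJECT = "reject"


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for an asset already stored in the DAM."""
    filename: str
    checksum: str


def sha256_of(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever checksum the DAM computes.
    return hashlib.sha256(data).hexdigest()


def decide(filename: str, data: bytes, existing: Iterable[Asset]) -> Action:
    """Apply the three rules to an incoming file."""
    existing = list(existing)
    checksum = sha256_of(data)
    # Rules 1 and 3: any checksum match means identical content, so reject.
    if any(asset.checksum == checksum for asset in existing):
        return Action.REJECT
    # Rule 2: only the filename matches, so upload but show a message.
    if any(asset.filename == filename for asset in existing):
        return Action.UPLOAD_WITH_MESSAGE
    return Action.UPLOAD
```

Note that in this sketch rule 3 needs no separate branch: a file that matches on both filename and checksum already matches on checksum, so rule 1 rejects it.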

We would like this behaviour to apply both through the UI and through the custom bulk upload mechanism we’ve built on the Aprimo DAM’s REST API.

This would be valuable for us as well.

Our use case: we'd like to set up rules that validate an asset against others based on configurable metadata fields, and upload matching assets into pending with a duplicate flag.

Example

If the checksum is the same, upload but flag as duplicate.

If the filename is the same, upload but flag as duplicate.

If field 'a' and field 'b' both match, flag as duplicate (with the option to add up to x fields to match).

Currently, it looks like both checksum and filename need to match for an asset to be flagged as a duplicate.
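The configurable rules requested in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not how Aprimo DAM works: each rule is taken to be a tuple of metadata field names, an asset is flagged when every field in at least one rule matches an existing asset, and the field names `photographer` and `shoot_id` are invented for the example.

```python
from typing import List, Mapping, Sequence, Tuple

Record = Mapping[str, str]   # hypothetical metadata record: field name -> value
Rule = Sequence[str]         # a rule is a group of fields that must all match


def matching_rules(
    candidate: Record,
    existing: Sequence[Record],
    rules: Sequence[Rule],
) -> List[Tuple[str, ...]]:
    """Return the rules under which the candidate duplicates an existing record."""
    hits: List[Tuple[str, ...]] = []
    for rule in rules:
        for record in existing:
            # A field counts as matching only if it is present and equal.
            if all(
                candidate.get(field) is not None
                and candidate.get(field) == record.get(field)
                for field in rule
            ):
                hits.append(tuple(rule))
                break  # one matching record is enough for this rule
    return hits


# Illustrative data: a cropped, renamed photo whose checksum has changed.
existing = [{"filename": "img_001.jpg", "checksum": "abc",
             "photographer": "Lee", "shoot_id": "S-17"}]
rules = [("checksum",), ("filename",), ("photographer", "shoot_id")]
cropped = {"filename": "img_001_crop.jpg", "checksum": "def",
           "photographer": "Lee", "shoot_id": "S-17"}

# An asset would be uploaded into pending with a duplicate flag if any rule hits.
flag_as_duplicate = bool(matching_rules(cropped, existing, rules))
```

Keeping each rule as a list of field names means an administrator could add or remove match criteria without code changes, which is the flexibility the comment asks for.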

Hi Ethan,

Could you give us some more insight into the use case: why would you match on fields to identify a duplicate? What type of content items is this being used for?

For the most part we'd need duplicate detection for image files.

It would be ideal if the system recognized duplicates by comparing pixels, but we were told this wasn't possible.

We often have photos that have been cropped; when they're cropped, they're renamed and the checksum changes. We're looking for creative ways to use other metadata fields that may not have changed to 'flag' items as potential duplicates.

Does this make sense?

Agreed, an ideal option would be to allow the admin to associate the existing record (the duplicate) with an additional classification.