
Why merging in Discogs is broken

Sometimes duplicate entries get added to the Discogs database, for various reasons:
  • inexperienced users: it takes some time to understand the Discogs workflow. This happened to me as well when I was starting out, and it is because the Discogs edit interface throws you into expert mode instead of guiding you through the process (which I have written about before here and here).
  • stupid sellers: some sellers still don't understand that Discogs is a catalog where you simply pick an existing release (unlike, for example, eBay), so they add releases that are already in the catalog.
  • entry errors: sometimes mistakes are made that make it hard to find out that a release has already been added, for example when it isn't clear what the correct label or artist is. Someone then adds the same release twice.
  • disagreement about when a release is a variation: people on Discogs frequently disagree about when a release is actually a different release or simply a variation. For example, a different runout or pressing plant might mean a different release to one person and merely a variation to another (my opinion: it depends on the release).
This means that there are plenty of duplicate releases in Discogs. To mitigate this, there is a mechanism to merge releases, get rid of the duplicates and keep the database clean(er). Except: I don't think it actually works as intended.

You can merge two releases using the "merge release" functionality in the edit menu:

Merge functionality in the edit menu

You then have to pick the two releases to merge and say which one should be kept:

Example: merging two releases

People can then vote on whether or not the releases should be merged. If enough people vote (I don't know what the threshold is), the releases are merged: the images of all releases (if any) are moved to the release that was chosen to be kept (although the moved images are disabled by default), and the other release is set to 'Draft'.
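To make the behaviour described above concrete, here is a minimal Python model of it. This is not how Discogs implements merges: the `Release` and `Image` classes, their fields, the 'Active' status and `merge_releases` are all hypothetical, and only the steps described in this post are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    url: str
    enabled: bool = True


@dataclass
class Release:
    # Hypothetical, simplified release record; not the real Discogs data model.
    release_id: int
    status: str = "Active"  # hypothetical status name
    images: list[Image] = field(default_factory=list)
    data: dict[str, str] = field(default_factory=dict)  # label, matrix, notes, ...


def merge_releases(keep: Release, drop: Release) -> None:
    """Model the merge as described in the post (an assumption, not Discogs code)."""
    # Images of the dropped release are moved over, but disabled by default.
    for image in drop.images:
        keep.images.append(Image(image.url, enabled=False))
    drop.images.clear()
    # The dropped release becomes a draft; its other data is NOT copied.
    drop.status = "Draft"
```

After `merge_releases(keep, drop)`, anything stored only in `drop.data` (say, a matrix number) is still sitting on the draft and nowhere else.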

For most people this is the end of the process, and this is where things go wrong. What I have seen is that merged releases don't contain the same information. Sometimes the release that is merged away only has a subset of the other release's information, but very often the two releases have non-overlapping sets of information. Discogs does not merge these (only the images), so the information has to be copied from one release to the other by hand.

Releases with status 'Draft' are removed every once in a while (although I am not sure whether this is done automatically) or can be removed by people from their drafts list, after which the data is purged from the database. This means that unless the data is copied over, it is actually lost and has to be re-entered or recovered from the database dumps.

One solution is to encourage people to actively merge the data from the two releases (unless it is obviously incorrect) using some default text ("Releases have been merged. Don't forget to copy any data that should be copied!" or something like that), or to point out whether and where the releases differ, so people can copy the missing data (the risk being that wrong information is copied, but that is an entirely different problem).
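The second suggestion, pointing out where the releases differ, could look roughly like this minimal sketch. It assumes each release is flattened into a dictionary of field names to text values (real releases have nested tracklists, credits and identifiers), and `merge_warnings` is a hypothetical helper, not part of Discogs or its API.

```python
def merge_warnings(keep: dict[str, str], drop: dict[str, str]) -> list[str]:
    """List data on the release being merged away that the kept release lacks.

    Both releases are modelled as flat {field: value} dicts (an assumption).
    """
    warnings = []
    for name in sorted(drop):
        value = drop[name]
        if not value:
            continue  # empty fields have nothing to lose
        if not keep.get(name):
            # Missing or empty on the kept release: would be lost with the draft.
            warnings.append(f"'{name}' is only on the release being merged away: {value!r}")
        elif keep[name] != value:
            # Present on both but different: someone should decide which is right.
            warnings.append(f"'{name}' differs: kept {keep[name]!r}, merged away {value!r}")
    return warnings
```

For example, with `keep = {"label": "Foo Records", "country": "UK"}` and `drop = {"label": "Foo Records", "country": "Europe", "barcode": "1234567890123"}` (made-up values), it returns two warnings: one that the barcode is only on the release being merged away, and one that the country differs.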

One other thing that I really dislike is that it is impossible to undo a merge, except by re-adding the old release. That usually means a lot of work, and people who previously added the release to their lists have to find it and re-add it. So having a way to undo a merge for a couple of days afterwards (perhaps also using some sort of voting system?) might not be a bad idea...
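A time-limited undo could be as simple as keeping a snapshot of the merged-away release and refusing to restore it once a window has passed. This minimal sketch assumes a three-day window (the idea above only says "a couple of days"), and `MergeRecord`, `UNDO_WINDOW`, `can_undo` and `undo_merge` are hypothetical names; the voting step is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "a couple of days" interpreted as three days.
UNDO_WINDOW = timedelta(days=3)


@dataclass(frozen=True)
class MergeRecord:
    """Hypothetical record kept for every merge."""
    kept_id: int
    merged_id: int
    merged_at: datetime
    # Copy of the merged-away release (data, images, who had it in their lists),
    # taken before the merge so it can be restored later.
    snapshot: dict


def can_undo(record: MergeRecord, now: Optional[datetime] = None) -> bool:
    """True while the merge is still inside the undo window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - record.merged_at <= UNDO_WINDOW


def undo_merge(record: MergeRecord, now: Optional[datetime] = None) -> dict:
    """Return a copy of the snapshot to restore, or fail once the window has passed."""
    if not can_undo(record, now):
        raise ValueError("undo window expired; the release has to be re-added by hand")
    return dict(record.snapshot)
```

Keeping the snapshot, including for example which users had the release in their lists, is what would spare people from having to find and re-add it.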


Popular posts from this blog

SID codes (part 1)

One thing that I only learned about after using Discogs is the so-called Source Identification Code, or SID. These codes were introduced in 1994 to combat piracy and to find out on which machines a CD was made. They were introduced by Philips and adopted by the IFPI, and the specifications, which clearly describe the two available SID codes (mastering SID code and mould SID code), are publicly available. For quite a few months now, Discogs has had two fields available in the "Barcode and Other Identifiers" (BaOI) section:
  • Mould SID code
  • Mastering SID code
A few questions immediately popped up in my mind:
  • how many releases don't have a SID field defined when there should be one (for example, the free text field indicates it is a SID field)?
  • how many releases have a SID field with values that should not be in the SID field?
  • how many releases have a SID field, but a wrong year (as SID codes were only introduced in 1994)?
  • how many vinyl releases have a SID code defined (which is impossi…

SPARS codes (part 1)

Let's talk about SPARS codes used on CDs (or CD-like formats). You have most likely seen them, but maybe don't know their name. The SPARS code is a three-letter code indicating whether recording, mixing and mastering were analogue or digital. For example, they could look like the ones below. There is no fixed format, so there are other variants as well. Personally I don't pay too much attention to these codes (I simply do not care), but in the classical music world companies could ask premium prices if something was labeled as DDD (so everything digital). That makes it interesting information to mine and unlock, which is something that Discogs does not allow people to do when searching (yet!), even though it could be a helpful filter. I wanted to see if it can be used as an identifier to tell releases apart (are there similar releases where the only difference is the SPARS code?).
SPARS code in Discogs
For a few months now, SPARS has been a separate field in the Discogs…

Country statistics (part 2)

One thing I wondered about: for how many releases was the country field changed? I looked at the two most recent data dumps (covering February and March 2019) to see where they differed. In total, 5274 releases "moved". The top 20 moves are:
  • unknown -> US: 454
  • Germany -> Europe: 319
  • UK & Europe -> Europe: 217
  • unknown -> UK: 178
  • UK -> Europe: 149
  • Netherlands -> Europe: 147
  • unknown -> Europe: 139
  • unknown -> Germany: 120
  • UK -> US: 118
  • Europe -> Germany: 84
  • US -> UK: 79
  • USA & Canada -> US: 76
  • US -> Canada: 65
  • unknown -> France: 64
  • UK -> UK & Europe: 62
  • UK & Europe -> UK: 51
  • France -> Europe: 51
  • Saudi Arabia -> United Arab Emirates: 49
  • US -> Europe: 46
  • unknown -> Japan: 45
When you think about it, these all make sense (there was a big consolidation in Europe in the 1980s and releases for multiple countries were made in a single pressing plant), but there are also a few weird changes: