
Label Code (part 1)

One piece of information that you can find on European releases but typically not on other releases is the so-called "Label Code".

This code, handed out by GVL in Germany, uniquely identifies the label on which a record was released. Full background can be found on the German Wikipedia page about Label Codes.

Discogs has a dedicated field for these codes in its database, and a partial list of label codes (new labels, plus mutations, starting 01-01-2017) is available from the GVL website. That opens up quite a few possibilities to compare data from the database with the external list from GVL, as well as with the rest of the data in the dump. Fun!

Label Code structure

Label Codes are very simple: first the letters LC (usually uppercase, although lowercase is likely used here and there too), sometimes followed by a delimiter (whitespace, a hyphen, and possibly others as well), and then 4 or 5 digits.

A few common examples are below but other variants might exist as well:



Label Code data in the Discogs datadump

The Label Code field is not new and has been around for some time, so it is interesting to see how often it is used and, when it is used, whether it is used correctly. Let's look at some data!

Currently each Label Code field is checked for four or five digits, optionally prefixed by LC or lc and a possible delimiter. The LC prefix is optional in the check because, even though the guidelines say it should be included, some people leave it out. The check also does not take any trailing data into account. This means that at the moment too many Label Codes are regarded as valid, but even then the problem with Label Code values turns out to be quite massive.
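A minimal sketch of such a lenient syntax check, in Python, could look like the code below. The regular expression, the set of delimiters (whitespace and hyphen only) and the function name are my assumptions for illustration and are not necessarily what the actual script does.

```python
import re

# Assumed pattern: optional "LC" or "lc", optional whitespace/hyphen
# delimiters, then four or five digits. There is no end anchor, so any
# trailing data is ignored, just like in the lenient check described above.
LABEL_CODE_RE = re.compile(r'\s*(?:LC|lc)?[\s\-]*\d{4,5}')


def looks_like_label_code(value: str) -> bool:
    """Return True if the value could syntactically be a Label Code."""
    return LABEL_CODE_RE.match(value) is not None


if __name__ == '__main__':
    # Hypothetical sample values, not taken from the data dump.
    for sample in ['LC 0149', 'lc-12345', '0149', 'LC 12', 'Made in Germany']:
        print(repr(sample), looks_like_label_code(sample))
```

Because trailing data is not checked, a value such as "LC 12345678" also passes, which is exactly why this kind of check regards too many values as valid.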

In the September 2017 data dump of Discogs there are 582,168 Label Codes, distributed over 565,061 unique releases (as some releases have multiple Label Codes). Of these, 39,494 values in 28,632 unique releases do not conform to the Label Code syntax. This is around 6.78% of the Label Codes used.
At least 6.78% of the Label Code values in Discogs are wrong.
Even though it might sound relatively small as a percentage, it is still a big number. The reason it goes wrong is fairly simple: people do not understand the Label Code field because they did not read the guidelines, even though the guidelines are very clear about it! They probably thought "it is a code, and it is on the label, so it must go into Label Code", which is incorrect.

Then there are also Label Code values that are in the wrong field. For example, 1,302 can be found in a Barcode field and 1,192 in Rights Society fields, although a recent cleanup campaign should have fixed the latter problem.

Then there are 16,060 valid Label Code values that can be found in other fields (most of the time in a field named Other).
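Finding Label Codes in the wrong field could be sketched as follows. This is an illustration under my own assumptions: the identifiers of a release are given as hypothetical (field type, value) pairs, and the match is stricter than the lenient check (the LC prefix is required and the whole value must match), because otherwise any barcode containing four digits would be flagged.

```python
import re

# Assumption: to call a value in another field a Label Code, require the
# LC prefix and match the complete value, not just the start of it.
STRICT_LABEL_CODE_RE = re.compile(r'\s*LC[\s\-]*\d{4,5}\s*', re.IGNORECASE)


def misplaced_label_codes(identifiers):
    """Return the (field_type, value) pairs that look like Label Codes
    but are stored in a field other than 'Label Code'.

    identifiers: iterable of (field_type, value) pairs for one release
    (a hypothetical layout, used here only for illustration).
    """
    return [(field_type, value) for field_type, value in identifiers
            if field_type != 'Label Code'
            and STRICT_LABEL_CODE_RE.fullmatch(value)]


if __name__ == '__main__':
    # Hypothetical identifiers of a single release.
    release = [('Barcode', 'LC 0149'),
               ('Barcode', '4 007192 123456'),
               ('Other', 'lc-12345'),
               ('Label Code', 'LC 0149'),
               ('Rights Society', 'GEMA')]
    print(misplaced_label_codes(release))
```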

To reproduce these findings you can use my script to find smells.

Other possible Label Code checks

The checks that I have implemented so far are purely syntactic and answer the question "could the value in the field be a correct label code?". They do not actually check whether the value is a valid code, or whether it is the right code.

Another check that could be done for some releases is to see if the release dates make sense: the label codes were introduced somewhere in the 1970s, so any release that claims to have a label code and is from before the introduction of label codes either does not have a label code, or is from a later date.
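A rough sketch of such a date check is below. Two things are assumptions on my part: the cutoff year (I use 1970 as a conservative lower bound, since the exact year of introduction is not pinned down here) and the format of the release date, which is assumed to be a string that starts with a four digit year when a year is known.

```python
import re
from typing import Optional

# Assumption: label codes cannot predate the 1970s, so use the start of
# the decade as a conservative cutoff. Adjust when the exact year is known.
EARLIEST_LABEL_CODE_YEAR = 1970


def release_year(released: Optional[str]) -> Optional[int]:
    """Extract a leading four digit year from a release date string."""
    match = re.match(r'\s*(\d{4})', released or '')
    return int(match.group(1)) if match else None


def suspicious_date(released: Optional[str], has_label_code: bool,
                    earliest: int = EARLIEST_LABEL_CODE_YEAR) -> bool:
    """Flag releases that claim a label code but are dated before the cutoff."""
    year = release_year(released)
    return bool(has_label_code and year is not None and year < earliest)


if __name__ == '__main__':
    # Hypothetical release dates, not taken from the data dump.
    print(suspicious_date('1965-03-00', True))
    print(suspicious_date('1985', True))
    print(suspicious_date('', True))
```

Releases without a usable year are not flagged, because nothing can be concluded from a missing date.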

I will try to answer these questions in future posts.
