One piece of information that you can find on European releases, but will typically not see on other releases, is the so-called "Label Code".
This code, handed out by GVL in Germany, uniquely identifies the label on which a record was released. Full background can be found on the German Wikipedia page about Label Codes.
Discogs has a dedicated field for these codes in its database, and a partial list of the label codes (new labels, plus mutations, starting 01-01-2017) is available from the GVL website. That opens up quite a few possibilities to compare data from the database with the external list from GVL, as well as with the rest of the data in the dump. Fun!
Label Code structure
Label Codes are very simple: first the letters LC (usually uppercase, but lowercase characters are likely used here and there too), followed by 4 or 5 digits, sometimes with a delimiter in between (whitespace, a hyphen and possibly others as well).
A few common examples are below but other variants might exist as well:
Label Code data in the Discogs datadump
The Label Code field is not new and has been around for some time, so it would be interesting to see how often it is used and, when it is used, whether it is used correctly. Let's look at some data!
Currently each Label Code field is checked to see whether it contains four or five digits, optionally prefixed by LC or lc and a possible delimiter. The check for LC is optional because, even though the guidelines say it should be included, some people leave it out. The check also does not take any trailing data into account. This means that at the moment too many Label Codes are regarded as valid, but even then it turns out that the problem with Label Code values is quite massive.
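As a rough illustration of the lenient check described above, here is a minimal Python sketch. The regular expression and the function name `looks_like_label_code` are my own assumptions, not necessarily what the actual script does; in particular, the set of accepted delimiters (whitespace and hyphen) is a guess based on the structure described earlier.

```python
import re

# Assumed pattern: optional "LC"/"lc" prefix, optional delimiter
# (whitespace or hyphen), then four or five digits.
# Anything after the digits is deliberately ignored, mirroring the
# "no trailing data check" leniency described in the text.
LABEL_CODE_RE = re.compile(r'\s*(?:LC|lc)?[\s\-]*\d{4,5}')


def looks_like_label_code(value: str) -> bool:
    """Return True if the value could syntactically be a Label Code."""
    return LABEL_CODE_RE.match(value) is not None


if __name__ == '__main__':
    for candidate in ['LC 0149', 'lc-12345', '0149', 'LC 123', 'Made in Germany']:
        print(candidate, looks_like_label_code(candidate))
```

Because trailing data is ignored, a value such as `LC 01234 extra` still passes, which is exactly why this kind of check undercounts the number of wrong values.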
In the September 2017 data dump of Discogs there are 582,168 Label Codes, distributed over 565,061 unique releases (as some releases have multiple Label Codes). Of these, 39,494 Label Codes (in 28,632 unique releases) do not conform to the Label Code syntax. This is around 6.78% of the Label Codes used.
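The percentage follows directly from the counts quoted above. A small sketch of the arithmetic, using only those figures (the per-release share is derived here from the same numbers and is not stated in the post itself):

```python
# Counts quoted from the September 2017 Discogs data dump.
invalid_codes = 39_494
total_codes = 582_168
invalid_releases = 28_632
total_releases = 565_061


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


code_share = percentage(invalid_codes, total_codes)
release_share = percentage(invalid_releases, total_releases)

print(f'Label Codes failing the syntax check: {code_share}%')
print(f'Releases with at least one failing Label Code: {release_share}%')
```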
At least 6.78% of the Label Code values in Discogs are wrong.
Even though percentage-wise it might sound relatively small, it is still a big number. The reason why it goes wrong is fairly simple: people do not understand the Label Code field because they did not read the guidelines, even though the guidelines are very clear about it! Probably they thought "it is a code, and it is on the label, so it must go into Label Code", which is incorrect.
Then there are also Label Code values that are in the wrong field. For example there are 1,302 that can be found in a Barcode field, and 1,192 in Rights Society fields, although a recent cleanup campaign should have fixed the latter problem.
Then there are 16,060 valid Label Code values that can be found in other fields (most of the time in a field named Other).
To reproduce these findings you can use my script to find smells.
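The snippet below is not that script, only a minimal sketch of how Label Code values sitting in the wrong field could be spotted. It assumes the identifiers have already been extracted from the dump as `(field type, value)` pairs, and that outside the Label Code field the LC prefix is required so that plain numbers are not flagged. Both the function name `misplaced_label_codes` and the pattern are hypothetical.

```python
import re
from collections import Counter

# Assumption: outside the Label Code field the "LC" prefix is required,
# otherwise any four or five digit number would be flagged.
MISPLACED_RE = re.compile(r'\s*LC[\s\-]*\d{4,5}\b', re.IGNORECASE)


def misplaced_label_codes(identifiers):
    """Count, per field type, values that look like a Label Code
    but are stored in a field other than 'Label Code'."""
    counts = Counter()
    for field_type, value in identifiers:
        if field_type != 'Label Code' and MISPLACED_RE.match(value):
            counts[field_type] += 1
    return counts


if __name__ == '__main__':
    # Made-up sample data, for illustration only.
    sample = [
        ('Barcode', 'LC 0149'),
        ('Other', 'lc-12345'),
        ('Label Code', 'LC 0149'),
        ('Barcode', '5 012345 678900'),
        ('Rights Society', 'GEMA'),
    ]
    print(misplaced_label_codes(sample))
```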
Other possible Label Code checks
The checks that I have implemented so far are purely syntactic and answer the question "could the value in the field be a correct label code?", but they do not actually look at the value of the label code to see if it is a valid code, or if it is the right code.
Another check that could be done for some releases is to see if the release dates make sense: label codes were introduced somewhere in the 1970s, so any release that claims to have a label code and is dated before their introduction either does not have a label code, or is from a later date.
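The date check could be sketched roughly as follows. Since the text only says "somewhere in the 1970s", the cutoff of 1970 used here is a deliberately conservative assumption, not the actual introduction year, and the function names are hypothetical. The sketch also assumes that the release date is a string starting with a four digit year, and skips releases without a usable date.

```python
import re

# Assumption: 1970 is a conservative lower bound, chosen only because the
# exact introduction year is not given; adjust once the real year is known.
EARLIEST_POSSIBLE_YEAR = 1970


def release_year(released):
    """Extract a leading four digit year from a release date string, if any."""
    match = re.match(r'(\d{4})', released or '')
    return int(match.group(1)) if match else None


def predates_label_codes(released, has_label_code, cutoff=EARLIEST_POSSIBLE_YEAR):
    """Return True if a release claims a Label Code but is dated before the cutoff."""
    year = release_year(released)
    return bool(has_label_code and year is not None and year < cutoff)


if __name__ == '__main__':
    print(predates_label_codes('1965-03-00', True))
    print(predates_label_codes('1985', True))
```

A flagged release is only suspicious, not necessarily wrong: it could be a reissue entered with the original release date, or a release with a bogus Label Code value.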
I will try to answer these questions in future posts.