
Digging into the Spanish Depósito Legal identifier (part 3)

Time again to dive a bit further into the Spanish depósito legal identifiers, this time to see how they can be used to date releases.

It is highly recommended to read part 1 and part 2 before reading this article.

Using depósito legal to date a release

The year embedded in the depósito legal number can be used to date a release, but there are a few things to keep in mind. To understand them, it helps to know a little more about how the depósito legal works.

To get a depósito legal number for a release you first have to apply for it. The library then assigns the release a number, which has to be printed on the release. This means that a release can never predate the year embedded in its depósito legal number, unless the number was misprinted.

However, the year in the depósito legal number can be earlier than the actual release year: maybe a depósito legal was applied for and the release was then postponed, or it was applied for in December and the release came out in January of the following year. It is also common to reuse a depósito legal number for a reissue.
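The rule above boils down to a lower bound: the depósito legal year is the earliest year in which a release could have come out. Below is a minimal sketch in Python, not the script used for this article. It assumes a simplified value format (province letters, a serial number, and the year as the final dash-separated group, as in "B-12345-1985"); the pattern and the function names `extract_dl_year` and `could_be_released` are hypothetical.

```python
import re

# Hypothetical, simplified pattern: province letters, a serial number, and
# the year as the final dash-separated group, e.g. "B-12345-1985" or
# "B. 12.345-1985". Both the ASCII hyphen and the en dash are accepted.
DL_PATTERN = re.compile(r"^\s*[A-Z]{1,2}\.?\s*[-\u2013]?\s*[\d.]+\s*[-\u2013]\s*(\d+)\s*$")


def extract_dl_year(value):
    """Return the year component of a depósito legal value, or None if there is none."""
    match = DL_PATTERN.match(value)
    return int(match.group(1)) if match else None


def could_be_released(release_year, dl_year):
    """A release cannot predate its depósito legal year (barring misprints).
    A later release year is fine: postponed release, application made in
    December, or a number reused for a reissue."""
    return release_year >= dl_year


print(extract_dl_year("B-12345-1985"))  # 1985
print(could_be_released(1986, 1985))    # True
print(could_be_released(1984, 1985))    # False
```

Note that an incomplete value such as "B-12345" returns None here rather than treating the serial number as a year. Real values are messier than this pattern allows, so treat it as a starting point only.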

In this article I will look at how many releases in the database have a release date earlier than the year in the depósito legal, which should not be possible.

I took the data dump from October 2017 and looked only at releases from Spain (ignoring releases with a valid Spanish Depósito Legal field whose country was set to something else). For each release I grabbed both the release year and the Depósito Legal value, to see how many releases had a release year earlier than the year in the Depósito Legal field.
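The extraction step could look roughly like the Python sketch below. It is not the script used for this article. It assumes the releases dump is XML with a `release` element carrying an `id` attribute, `country` and `released` child elements, and `identifiers/identifier` elements with `type` and `value` attributes; the tiny `SAMPLE` document and the function name `spanish_dl_values` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

DL_TYPE = "Depósito Legal"

# A tiny stand-in for the (very large) releases dump; the layout is an assumption.
SAMPLE = """<releases>
  <release id="1">
    <country>Spain</country>
    <released>1985-03-00</released>
    <identifiers><identifier type="Depósito Legal" value="B-12345-1985"/></identifiers>
  </release>
  <release id="2">
    <country>Germany</country>
    <released>1990</released>
    <identifiers><identifier type="Depósito Legal" value="M-1111-1990"/></identifiers>
  </release>
</releases>"""


def spanish_dl_values(source):
    """Yield (release_id, release_year, dl_value) for releases whose country is Spain."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "release":
            continue
        if elem.findtext("country") == "Spain":
            released = elem.findtext("released") or ""
            # The Released field may be "1985" or "1985-03-00"; keep only the year.
            year = int(released[:4]) if released[:4].isdigit() else None
            for ident in elem.iterfind("identifiers/identifier"):
                if ident.get("type") == DL_TYPE:
                    yield elem.get("id"), year, ident.get("value")
        elem.clear()  # free the finished release so memory stays bounded


rows = list(spanish_dl_values(io.BytesIO(SAMPLE.encode("utf-8"))))
print(rows)  # [('1', 1985, 'B-12345-1985')]
```

Streaming with `iterparse` and clearing each finished release avoids loading the whole dump into memory at once.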

First, some statistics: in the data dump I looked at there are 31641 releases with a Depósito Legal field. On October 31, 2017 there were about as many releases with known Depósito Legal related smells, which are slowly being fixed.
I updated my scripts to check Discogs data dumps and ran some tests for a few smells:
  • invalid years in the Depósito Legal field (years before 1900 or after 2017)
  • release date (in the Released field) earlier than the year in the Depósito Legal field, indicating an error in either the Depósito Legal field or the Released field
  • incomplete Depósito Legal field values that have no year component
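The three smells can be expressed as a small check per release. This is a minimal Python sketch, not my actual script: it assumes the year component has already been parsed from the Depósito Legal field (None when there is no year component), and the smell names and the function `dl_smells` are hypothetical. The upper bound of 2017 mirrors the cut-off used for the October 2017 dump.

```python
MIN_YEAR, MAX_YEAR = 1900, 2017  # valid range for the year component


def dl_smells(dl_year, release_year):
    """Return the list of smells for one release.

    dl_year: year component parsed from the Depósito Legal field,
             or None if the value has no year component.
    release_year: year from the Released field, or None if unknown.
    """
    if dl_year is None:
        return ["no_year_component"]
    smells = []
    if not MIN_YEAR <= dl_year <= MAX_YEAR:
        smells.append("invalid_year")
    elif release_year is not None and release_year < dl_year:
        # Impossible: the number must be obtained before it is printed.
        smells.append("released_before_dl")
    return smells


print(dl_smells(None, 1985))  # ['no_year_component']
print(dl_smells(1685, 1985))  # ['invalid_year']
print(dl_smells(1987, 1985))  # ['released_before_dl']
print(dl_smells(1985, 1986))  # []
```

Comparing dates is skipped when the year itself is invalid, since comparing against a nonsensical year would only add noise to the second list.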
I found around 460 releases where the release year was earlier than the year in the depósito legal number (which is impossible, as a depósito legal value can only be printed on the record after it has been obtained). I had a friend look into these, and he focused mainly on releases where the two years differed by more than 1 year. He found a mixture of errors:
  • keyboard errors (people hitting the wrong key)
  • copy/paste errors from earlier releases that were used as a template
  • wrong release dates
  • wrong depósito legal numbers
  • incomplete depósito legal numbers, where the script matched the middle part of the number instead of the year (this should be fixed in the cleanup scripts)
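Selecting the releases my friend focused on (a gap of more than 1 year) is a simple filter. The Python sketch below is a hypothetical illustration of that selection, not the process he actually used; it assumes the flagged releases are available as (release_id, release_year, dl_year) tuples, and the function name `triage` is made up.

```python
def triage(flagged, max_gap=1):
    """Keep (release_id, release_year, dl_year) rows where the release year
    precedes the depósito legal year by more than max_gap years, and add the
    gap as a fourth field. Largest gaps come first."""
    rows = [
        (release_id, release_year, dl_year, dl_year - release_year)
        for release_id, release_year, dl_year in flagged
        if dl_year - release_year > max_gap
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


flagged = [("a", 1984, 1985), ("b", 1980, 1985), ("c", 1983, 1985)]
print(triage(flagged))  # [('b', 1980, 1985, 5), ('c', 1983, 1985, 2)]
```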
Around 80 releases had an "impossible" depósito legal year component:
  • wrong format (hyphens missing)
  • digits missing
  • extra digits
  • keyboard errors (placing some releases in the 17th century or earlier)
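Some of these causes can be guessed from the year component alone. The Python sketch below is a rough, hypothetical heuristic (the function name `explain_year` and its labels are made up): it looks at the digit count and the range of the year text. It cannot detect missing hyphens, and it treats anything shorter than four digits as "digits missing", so if two-digit years are accepted as valid in some period, that branch would need to change.

```python
def explain_year(year_text, max_year=2017):
    """Guess why a year component looks impossible (rough heuristic)."""
    if not year_text.isdigit():
        return "not a number"
    if len(year_text) < 4:
        return "digits missing"
    if len(year_text) > 4:
        return "extra digits"
    year = int(year_text)
    if year < 1900:
        return "too early (possible keyboard error)"
    if year > max_year:
        return "in the future (possible keyboard error)"
    return "ok"


for text in ["198", "19855", "1685", "2071", "1985"]:
    print(text, explain_year(text))
```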
It was interesting to see that some people used a regular hyphen (from the ASCII character set) as a separator, while others used a so-called Unicode "en dash", possibly forced on them by their keyboard layout or operating system, or perhaps out of personal preference. The two may look very similar on screen, but data-wise they are not! This is definitely something to put on my list of things to check.
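The difference is easy to demonstrate. In the Python sketch below, two values that look alike compare as unequal until the en dash (U+2013) is replaced by the ASCII hyphen (U+002D). The function names `dash_names` and `normalize_dashes` are hypothetical; only the standard `unicodedata` module is used.

```python
import unicodedata

EN_DASH = "\u2013"


def dash_names(value):
    """Return the Unicode names of the dash characters (category Pd) in value."""
    return sorted({unicodedata.name(ch) for ch in value if unicodedata.category(ch) == "Pd"})


def normalize_dashes(value):
    """Replace en dashes with ASCII hyphens so both spellings compare equal."""
    return value.replace(EN_DASH, "-")


ascii_value = "B-12345-1985"
en_dash_value = "B\u201312345\u20131985"
print(ascii_value == en_dash_value)                    # False
print(dash_names(en_dash_value))                       # ['EN DASH']
print(normalize_dashes(en_dash_value) == ascii_value)  # True
```

Normalizing before comparing or parsing means both spellings are treated the same by any later checks.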
The full list of incorrect releases will soon be shared with the wider Discogs community so they can be fixed. I expect more errors to pop up as releases continue to be fixed.

Next time: checking whether represses and reissues that have not been tagged as such can also be found easily.
