
Posts

Showing posts from February, 2018

Pressing plant misspellings in CD matrix fields

One thing that I learned about data entry: it is difficult! People are sloppy and tend to make silly mistakes and then overlook them, because the brain is great at error correction, which makes you blind to these mistakes. I bumped into one particular error in Discogs twice while looking at possible errors for specific pressing plants. And, I thought, if I can already see it twice by sampling a small section of releases, then it has to be more common. The error that I found is that the matrix numbers of the PMDC plants were misspelled as "PDMC" (interestingly, PDMC was another completely unrelated plant). Some likely reasons:

- sometimes the matrix is mirrored (tricky for the brain)
- copy and paste errors from elsewhere
- (personal) PDMC is easier to say than PMDC

Using my scripts and the latest datadump I found 62 releases with this particular error. They are distributed as follows in Discogs:

Releases where PMDC is likely misspelled as PDMC in the matrix I
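A check like this could be sketched roughly as follows. This is not the actual script from the post: the function name, the idea of comparing the matrix text against the plants credited on the release, and the sample strings are all assumptions made for illustration. Because PDMC was also a real (unrelated) plant, the sketch only flags a release when a PMDC plant is credited.

```python
import re

# Matches the suspicious spelling as a separate word, in any letter case.
SUSPECT = re.compile(r"\bPDMC\b", re.IGNORECASE)


def suspect_matrix(matrix_text, plant_names):
    """Return True if the matrix says PDMC while a PMDC plant is credited.

    Hypothetical helper: 'plant_names' stands for the company names
    credited on the release; how those are stored is an assumption.
    """
    credits_pmdc = any("PMDC" in name.upper() for name in plant_names)
    return credits_pmdc and bool(SUSPECT.search(matrix_text))


# Made-up sample data, not taken from Discogs.
print(suspect_matrix("MADE IN USA BY PDMC 12345", ["PMDC, USA"]))
print(suspect_matrix("MADE IN USA BY PMDC 12345", ["PMDC, USA"]))
```

Releases that credit the real PDMC plant are deliberately left alone, so this will miss misspellings on releases without any plant credit.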

Using pressing plant identifiers to date releases (2)

Using pressing plant information to find wrong information in releases in Discogs is something that I explored in an earlier post. I only looked at a single pressing plant and that already uncovered 76 releases that were obviously incorrect. I added a few more checks for pressing plants to my scripts and processed the latest Discogs data dump that I could find. Immediately hundreds more errors popped up, and I didn't even add that many companies to my checks. With the new checks I could find 610 releases that are wrong, but I am sure that as soon as I start adding more checks many more errors will pop up.

Releases in Discogs where manufacturing plants and release years don't match

What often seems to be the case is that people have combined the original release with reissues, making the entries completely useless for a correct classification. One thing that I noticed when looking at the pages for the manufacturing plants is that there are quite a few where it is mentioned w

SPARS Codes (part 4)

In the last few weeks I have been digging a lot into CDs and I have come to the conclusion that correctly identifying CDs is difficult: there are so many things that you have to take into account and it is easy to make a mistake. Luckily automation helps! I decided to look at SPARS codes again, as I felt that story was not yet complete. If you don't know what SPARS codes are, I would suggest first reading my earlier posts about them: part 1, part 2 and part 3. On Wikipedia it says that SPARS codes were introduced in 1984. Of course, we all know that if it is on Wikipedia it just has to be true! Wikipedia has a reference to a physical magazine from 1984 and with a bit of searching I found someone who actually describes the same magazine, which seems to confirm what Wikipedia says. So, I wondered: how many releases are there in the Discogs database with a defined SPARS code field (containing a valid SPARS code) and a declared release date prior to 1984? I was surprised to f
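The question above could be expressed as a small filter. This is only a sketch: it assumes that a "valid" SPARS code is exactly three letters, each A or D, which may be stricter or looser than what the real scripts accept, and the function name is made up.

```python
import re

# Assumption: a valid SPARS code is three letters, each 'A' or 'D'.
VALID_SPARS = re.compile(r"[AD]{3}")


def spars_before_introduction(spars, year, intro_year=1984):
    """Return True for a valid SPARS code on a release dated before intro_year."""
    if year is None:
        return False  # undated releases cannot be checked
    is_valid = VALID_SPARS.fullmatch(spars.strip().upper()) is not None
    return is_valid and year < intro_year


print(spars_before_introduction("AAD", 1982))
print(spars_before_introduction("DDD", 1990))
```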

Using pressing plant identifiers to date releases (1)

Something that has turned out surprisingly hard to do is to categorize CDs. While at first it seemed easier than vinyl, it has turned out to be much more difficult, as CDs have been repressed in different years by different plants (or by the same plant under a different name), with everything the same except for some very minor details, such as SID codes or the CD matrix. I have already written about SID codes and how they can be used to very roughly date CD releases. But there is also other information that can be used to see if releases have been dated (somewhat) correctly. Recently I decided to look at the companies that are listed, see when they were operational and use that information for checking release years. Quite a few CDs were pressed at a particular plant in the US. During its lifetime this plant operated under different names:

- PDO, USA from 1986 - 1992
- PMDC, USA from 1992 - 1999
- UML from 1999 - 2005
- EDC, USA from 2005 - 2009

There is some overlap: some g
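As a minimal sketch of how those operating years could be used for checking release years: the years below come from the list above, while the function name and the choice to treat both end years as valid (to allow for the overlap) are assumptions.

```python
# Operating years per plant name, as listed in the post.
PLANT_ERAS = {
    "PDO, USA": (1986, 1992),
    "PMDC, USA": (1992, 1999),
    "UML": (1999, 2005),
    "EDC, USA": (2005, 2009),
}


def year_matches_plant(plant, year):
    """Return True/False if the year can be checked, None if it cannot."""
    era = PLANT_ERAS.get(plant)
    if era is None or year is None:
        return None  # unknown plant or undated release
    first, last = era
    # Both boundary years are accepted, since consecutive names overlap.
    return first <= year <= last


print(year_matches_plant("PMDC, USA", 1995))
print(year_matches_plant("PDO, USA", 1995))
```

A `False` result only says that the plant name and the declared year do not fit together; it does not say which of the two is wrong.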

Using matrix numbers of CDs to verify release years (1)

One of the first steps when making CDs is to produce a "glass master". From that (using a few more steps) a "stamper" is created which is then used to press the actual CDs. When looking at a CD you can often see text in a ring in the middle of the CD. This is the so-called matrix, which comes from the glass master. Apart from the matrix, other text (like the IFPI mastering SID code) could possibly also come from the glass master. A bit of background information can be found on page 7 of the IFPI SID code implementation guide, although I would also highly recommend watching some of the clips about glass masters on YouTube, which are highly informative. One company making a lot of these glass masters is Cinram. For most of their glass masters they stored the production date in the matrix. This is good news, because it means that it could possibly be used to verify releases and see if the declared year of the release is right: it could never have been released before

DMM records in Discogs

Sometime in the 1980s records made with the "DMM" (Direct Metal Mastering) method started to appear. The whole background about DMM is explained on Wikipedia much better than I could do, so I will just focus on DMM in Discogs. In Discogs there is no special field to indicate that a record was made using DMM, so some people have used the free text field in the "Format" section for it. I looked at how often this happened: 571 times, which are distributed like this in the Discogs database:

Distribution of releases tagged DMM

This seems to be quite low, as there should be many, many more (especially the 1980s pop records). Of course, one explanation could be that it is actually not relevant information. I am not aware of records that were pressed as both DMM and non-DMM releases and where the only difference is that DMM was used (but I could be wrong). One thing that I also looked at is how many releases there are in Discogs that have DMM in the free text field but whic

ISRC codes in Discogs (part 3)

Time to revisit the ISRC codes. I already talked about these twice, namely how many errors for these codes there are in Discogs, as well as how to extract them from a CD. In the first of those posts I already hinted that there is actually a year component in the ISRC codes. The ISRC code for a single track has 12 characters. Characters 6 and 7 should be digits indicating the year the code was assigned (although in the early days some ISRC codes were handed out where the year was the year the song was written or the recording was made). It would of course have been easier if they had used 4 digits, but they didn't. That means that the year component from the ISRC code can be used to check whether or not a release is correctly dated, because a release cannot be from earlier than the year the ISRC was assigned. I adapted my scripts to check for a few things:

- is the format conforming to the ISRC standard?
- is the year recorded in the ISRC not later than the date of the rele
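The two checks could look roughly like this. It is a sketch under assumptions, not the script from the post: the function names are made up, the sample code is invented, and the cut-off used to expand the two-digit year to four digits (`pivot`) is an arbitrary choice.

```python
import re

# 2 letters (country), 3 alphanumerics (registrant), 2 digits (year),
# 5 digits (designation): 12 characters in total.
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}(\d{2})\d{5}")


def isrc_year(code, pivot=40):
    """Return the four-digit year from an ISRC, or None if it is malformed.

    Assumption: two-digit years below 'pivot' are 20xx, the rest 19xx.
    """
    cleaned = code.replace("-", "").replace(" ", "").upper()
    match = ISRC_PATTERN.fullmatch(cleaned)
    if match is None:
        return None
    yy = int(match.group(1))
    return 2000 + yy if yy < pivot else 1900 + yy


def release_predates_isrc(code, release_year):
    """Return True if the release is dated before the year in the ISRC."""
    year = isrc_year(code)
    return year is not None and release_year < year


# Invented ISRC, used only to show the mechanics.
print(isrc_year("NL-A50-95-12345"))
print(release_predates_isrc("NL-A50-95-12345", 1990))
```

Given the caveat about early codes carrying the year of writing or recording, a hit from `release_predates_isrc` is a reason to look at the release, not proof that the date is wrong.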

CD+G releases in Discogs

For many years an audio CD was, to me, just a silver disc with music on it. But as I already discovered with, for example, the ISRC codes on CDs, there can be more data on an audio CD than most people know. One thing is that on some CDs there are also graphics that can be displayed on a system capable of displaying some rudimentary graphics, such as karaoke machines. The format is called CD+G (CD plus graphics) and it can be recognized (among other things) by the compact disc logo with the word "graphics" included. So I wanted to know: how many CD+G releases are there in Discogs? Plus, how many have possibly gone undetected and can I detect them? In the format section there is a checkbox for CD+G, but some people have entered it into the free text field (which is incorrect), so I checked both. There are currently 352 releases in the database where the release either has the CD+G checkbox checked (350 releases) or where it is in the free text field (2 releases, but fixed now)

EMI catalog numbers and countries (1)

When I was a bit more fanatic about collecting records of a certain artist who was on EMI (and related labels) I quickly learned about how EMI's labels worked and it all seemed very clear: catalog numbers starting with 5C mean The Netherlands, 1A is either The Netherlands or Europe (in the 1980s), 5A and 1C are Germany, 4C is Belgium, 2C France, and so on, as mentioned on this page in the Discogs reference site. So I thought that it would be trivial to cross reference EMI label numbers and countries in Discogs to see if they would match and clearly see if anyone would have entered the wrong data and then rightfully scold them for it. But reality turns out to be a bit more complicated than anticipated. What I did is that I looked at all the releases where the catalog number starts with either "5C" or "5c" and checked if the country is The Netherlands. I could find a bit over 6400 releases from the Netherlands:

Distribution of Dutch releases with EMI
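The cross reference could be sketched like this. The prefix table only contains the prefixes named above; the function name, the exact country strings and the sample catalog numbers are assumptions for illustration.

```python
# Prefix to expected countries, limited to the prefixes named in the post.
EMI_PREFIX_COUNTRIES = {
    "5C": {"Netherlands"},
    "1A": {"Netherlands", "Europe"},
    "5A": {"Germany"},
    "1C": {"Germany"},
    "4C": {"Belgium"},
    "2C": {"France"},
}


def country_matches_catno(catno, country):
    """Return True/False for known prefixes, None for unknown ones."""
    # Upper-casing treats "5c" the same as "5C".
    prefix = catno.strip()[:2].upper()
    expected = EMI_PREFIX_COUNTRIES.get(prefix)
    if expected is None:
        return None
    return country in expected


# Made-up catalog numbers.
print(country_matches_catno("5C 062-12345", "Netherlands"))
print(country_matches_catno("5c 062-12345", "Germany"))
```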

License numbers on Greek releases

One of the things that I care about in Discogs is correct data, and finding something in the dataset that allows me to perform a few more checks to find errors always makes me a bit excited. This time it is about "Greek license numbers": like Czech manufacturing dates or Indian packaging dates, (some) Greek releases have a license number, where one part of the code is actually a year. Some people have added this license number to the "Barcode and Other Identifiers" (BaOI) section, which means it can be used to verify if the year that was declared in the releases is actually correct! In total there are 1402 releases from Greece that have such a license number in BaOI and where it can be easily detected. They are distributed over the data as below:

Distribution of Greek releases with a license number

For this particular check I made the assumption that a release cannot be from before the release year that is on the release, but it can be from after (similar to

What happened in Discogs in January 2018?

A new month, so Discogs uploaded a new datadump that I could analyze. Before reading this blogpost it might be good to first read the one about last month's datadump.

Release statistics

The latest dump ("the February dump") contains data from January 1 - January 31 (inclusive). The previous dump had 9,324,867 releases, the new dump has 9,442,719 releases. That means 117,852 releases more in the database, which seems to be a bit more than in previous months.

- 8,776,440 releases stayed the same
- 545,810 releases were changed
- 120,469 releases were added
- 2,617 releases were removed from the database
- 179 releases had status Draft, Deleted or Rejected
- 11 releases that were not Accepted were in both dumps
- 1 release was moved from Draft to Accepted

All in all it looks like a very typical month in Discogs.

Smells

When looking at absolute numbers of releases for which I have identified a possible smell there is good news: 1,236 releases with possible known smells les
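Counts like these can be derived by comparing two dumps release by release. The following is a minimal sketch, assuming each dump has already been reduced to a mapping from release id to some fingerprint of the release data (for example a hash); that reduction and the function name are assumptions, not a description of the real scripts.

```python
def compare_dumps(old, new):
    """Count same/changed/added/removed ids between two {id: fingerprint} dicts."""
    old_ids, new_ids = set(old), set(new)
    shared = old_ids & new_ids
    changed = sum(1 for rid in shared if old[rid] != new[rid])
    return {
        "same": len(shared) - changed,
        "changed": changed,
        "added": len(new_ids - old_ids),
        "removed": len(old_ids - new_ids),
    }


# Tiny made-up example: release 2 changed, 3 was removed, 4 was added.
previous = {1: "a", 2: "b", 3: "c"}
latest = {1: "a", 2: "B", 4: "d"}
counts = compare_dumps(previous, latest)
print(counts)
```

The figures in the post are consistent with this kind of bookkeeping: 8,776,440 + 545,810 + 2,617 = 9,324,867 (the previous dump) and 8,776,440 + 545,810 + 120,469 = 9,442,719 (the new dump).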

What happened in Discogs in December 2017?

A bit later than usual (I was swamped with work), but there is a new datadump, so it is time to dive into statistics again. It might be good to read one of the previous posts before reading this one.

Release statistics

I downloaded the latest database dump (the "January dump") containing data from December 1 to December 31 (inclusive). The previous dump had 9,217,123 releases, the new dump has 9,324,867 releases. That means 107,744 releases more in the database, which seems to be quite consistent with the previous months.

- 3,916,654 releases stayed the same
- 5,298,409 releases were changed
- 109,804 releases were added
- 2,060 releases were removed from the database
- 128 releases had status Draft, Deleted or Rejected
- 11 releases that were not Accepted were in both the December and January dumps
- 2 releases were moved from Draft to Accepted

So what is immediately obvious is that like with the December dump there is again a very large number of changed releases and like last t