

Showing posts from March, 2018

Where the current Discogs datamodel doesn't work (part 2)

Time to look into a few more aspects of the Discogs data model and where I think it could be improved. If you haven't read the first part, I suggest you do that first. This time I want to look at three more: rights societies, mastering and mould SID codes, and pressing plants.

Rights Society

Currently rights societies are stored in the "Barcodes and Other Identifiers" (BaOI) section, which covers the whole release. The correct place for this information would actually be per track: there are some releases in Discogs (like some compilations) where it is indicated which rights society applies to which track.

Mastering and Mould SID codes

Like rights societies, the mastering and mould SID codes are stored in the BaOI section that covers the whole release. The correct place for these identifiers would actually be the individual physical disc, as they frequently differ between the discs of a multi-disc release. The solution that people use is to tag it in the free text fie
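To make the per-track and per-disc idea concrete, here is a minimal sketch in Python of what such a model could look like. The class and field names (`Release`, `Disc`, `Track`, `rights_society`, `mastering_sid`, `mould_sid`) are hypothetical and not the actual Discogs schema, and the SID values in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Track:
    position: str
    title: str
    # Hypothetical: rights society recorded per track instead of per release.
    rights_society: Optional[str] = None


@dataclass
class Disc:
    number: int
    # Hypothetical: SID codes belong to the physical disc they are found on.
    mastering_sid: Optional[str] = None
    mould_sid: Optional[str] = None
    tracks: List[Track] = field(default_factory=list)


@dataclass
class Release:
    title: str
    discs: List[Disc] = field(default_factory=list)

    def rights_societies(self) -> Set[str]:
        """Collect the distinct rights societies used anywhere on the release."""
        return {
            track.rights_society
            for disc in self.discs
            for track in disc.tracks
            if track.rights_society
        }


# Example: a two-disc compilation where each disc has its own (placeholder)
# SID codes and the tracks fall under different rights societies.
release = Release(
    title="Example Compilation",
    discs=[
        Disc(1, "IFPI L001", "IFPI 0001", [Track("1-1", "Song A", "GEMA")]),
        Disc(2, "IFPI L002", "IFPI 0002", [Track("2-1", "Song B", "SACEM")]),
    ],
)
print(sorted(release.rights_societies()))  # ['GEMA', 'SACEM']
```

The release-wide view that the BaOI section gives today can still be derived from a model like this (as `rights_societies()` does), but the per-track and per-disc detail cannot be recovered from a release-wide list.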

Release with artists not in the Discogs database (part 1)

When a release is added to Discogs with an artist that is not yet in the database, the artist is normally added to the database and the release shows up on the artist's page. "Artist" is actually a bit of a misnomer, as it could also mean a sound engineer, a producer, and so on. Each of these "artists" has a number (an ID) in the database and a corresponding artist page. While looking through some of the entries in Discogs I spotted a few instances where the artist number was 0 and the release page had no link to an artist page. The credits list on the Discogs website explains that there are five credits for which the artist is not recorded in the database and for which there is no artist page. Personally I don't see the need for three of these ("artwork by", "photography" and "executive producer") and think they should be completely replaced by linked credits. I was wondering how many releases there actually are where one or more "artists" were not li
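Counting those releases can be done straight from the monthly XML dump. The sketch below assumes that the releases dump nests each credit as an `<artist>` element with `<id>`, `<name>` and (for extra artists) `<role>` children inside a `<release id="...">` element; treat that layout as an assumption and check it against the dump you use. The sample XML is made up. The file is streamed with `xml.etree.ElementTree.iterparse`, so the whole dump does not have to be loaded as one tree.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterator, List, Tuple


def releases_with_unlinked_credits(source) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (release id, [(name, role), ...]) for every release that has at
    least one credit whose artist id is 0, i.e. not linked to an artist page."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "release":
            continue
        unlinked = [
            (artist.findtext("name", ""), artist.findtext("role", ""))
            for artist in elem.iter("artist")
            if artist.findtext("id") == "0"
        ]
        if unlinked:
            yield elem.get("id"), unlinked
        elem.clear()  # drop the parsed release before moving on


# Made-up sample in the assumed dump layout.
SAMPLE = b"""<releases>
  <release id="1" status="Accepted">
    <artists><artist><id>194</id><name>Some Artist</name></artist></artists>
    <extraartists>
      <artist><id>0</id><name>J. Doe</name><role>Photography By</role></artist>
    </extraartists>
  </release>
  <release id="2" status="Accepted">
    <artists><artist><id>42</id><name>Other Artist</name></artist></artists>
  </release>
</releases>"""

hits = list(releases_with_unlinked_credits(io.BytesIO(SAMPLE)))
roles = Counter(role for _rid, credits in hits for _name, role in credits)
print(len(hits), roles)  # 1 Counter({'Photography By': 1})
```

Counting by role makes it easy to see which of the five unlinked credit types actually occur.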

Where the current Discogs datamodel doesn't work (part 1)

After looking at many releases in Discogs, their edit histories and how they have changed over time, it is quite easy to see how the data model has changed. Some of these changes have turned out well, while others were, in retrospect, probably not the right change to make. I am not blaming the developers, as I know from experience how difficult it is to get a data model right the first time and how hard it is to change: as soon as something is in use, changing it is non-trivial, especially with a database the size of Discogs. Still, I want to go through a few examples where I think the current data model doesn't work. Most of my examples will focus on the fields from the "Barcodes and Other Identifiers" (BaOI) section. In this post I am looking at three of them: ISRC, SPARS codes and matrix/runout.

ISRC

Currently ISRC codes (International Standard Recording Code, see previous articles about it for more information: 1, 2, 3) are stored in the
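If ISRCs were moved from the release-wide BaOI section to the individual tracks, a first step would be normalising the free-text values people have entered. An ISRC has a fixed 12-character structure (ISO 3901): a two-letter country code, a three-character alphanumeric registrant code, a two-digit year and a five-digit designation code, often written with hyphens. This minimal sketch only checks that shape; it does not check whether the country or registrant code is actually assigned, and the example code is illustrative.

```python
import re
from typing import NamedTuple, Optional

# Country code (2 letters), registrant (3 alphanumerics), year (2 digits),
# designation code (5 digits): 12 characters once separators are removed.
ISRC_PATTERN = re.compile(r"([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})")


class ISRC(NamedTuple):
    country: str
    registrant: str
    year: str
    designation: str

    def __str__(self) -> str:
        # Conventional hyphenated display form.
        return f"{self.country}-{self.registrant}-{self.year}-{self.designation}"


def parse_isrc(raw: str) -> Optional[ISRC]:
    """Normalise a free-text ISRC (spaces/hyphens, any case); None if malformed."""
    cleaned = re.sub(r"[\s-]", "", raw).upper()
    match = ISRC_PATTERN.fullmatch(cleaned)
    return ISRC(*match.groups()) if match else None


print(parse_isrc("us-s1z-99-00001"))  # US-S1Z-99-00001
print(parse_isrc("not an ISRC"))      # None
```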

Homoglyph confusion in the Discogs database (part 2)

I decided to look into character sets again, because last time I only looked at a few instances of "homoglyphs" (characters that look like characters from other alphabets, but are different). I recommend reading the first article about this before this one. I looked into rights societies again. Last time I mostly focused on Greek characters; this time I looked at where Latin characters were used where Cyrillic characters should have been used. The rights society in Russia is called RAO, which in Russian is spelled РАО. That looks a lot like PAO, but it is different: the first uses the Cyrillic alphabet, the second the Latin alphabet. The wrong entries are distributed in the data as follows:

[Figure: Distribution of wrong entries for Russian rights society РАО]

What is interesting is the peak for recent releases. I don't know whether this is because more Russian releases have been added recently, or because this error has mostly been corrected for earlier releases.
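Checks like this can be automated with Python's `unicodedata` module, which names every character after its script (for example `LATIN CAPITAL LETTER P` versus `CYRILLIC CAPITAL LETTER ER`). This is a minimal sketch of one possible approach, not necessarily the method used for this post; the lookalike table only covers a handful of capital letters and is an assumption.

```python
import unicodedata

# Cyrillic spelling of the Russian rights society: ER, A, O.
RAO = "\u0420\u0410\u041e"

# Latin capitals that look like Cyrillic capitals (a partial, hand-picked map).
LATIN_TO_CYRILLIC = str.maketrans(
    "ABCEHKMOPTX",
    "\u0410\u0412\u0421\u0415\u041d\u041a\u041c\u041e\u0420\u0422\u0425",
)


def scripts(text: str) -> set:
    """Return the scripts (first word of the Unicode name) of the letters in text."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


def is_fake_rao(value: str) -> bool:
    """True if value looks like the Cyrillic RAO but uses at least one Latin lookalike."""
    value = value.strip()
    return value != RAO and value.translate(LATIN_TO_CYRILLIC) == RAO


for candidate in ["PAO", "\u0420AO", RAO]:
    print(sorted(scripts(candidate)), is_fake_rao(candidate))
# ['LATIN'] True
# ['CYRILLIC', 'LATIN'] True
# ['CYRILLIC'] False
```

The second candidate shows why checking per character matters: a value can mix scripts, starting with a Cyrillic letter and continuing in Latin.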

What happened in Discogs in February 2018?

It is a new month, so there is a new dump file available that I can do analysis on. If this is the first time you see one of these posts, I highly recommend first reading a similar post from last month.

Release statistics

The latest dump ("the March dump") contains data from February 1 to February 28 (inclusive). The previous dump had 9,442,719 releases, the new dump has 9,554,069 releases. That means there are 111,350 more releases in the database.

3,391,491 releases stayed the same
6,048,861 releases were changed
113,717 releases were added
2,367 releases were removed from the database
209 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
1 release was moved from Draft to Accepted

So, looking at the releases that were changed, it is obvious: Discogs changed the dump format again! ARGH! This time it is in the "joiners" for the artist names. Discogs changing the internal XML format makes it
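The release statistics come down to comparing two dumps by release id. A minimal sketch, assuming each dump has already been reduced to a mapping from release id to some fingerprint of that release's content (for example a hash of its XML; that reduction is an assumption here). It also shows why a format change hurts: if the fingerprint is taken over the raw XML, a change in how the dump writes joiners makes a release count as "changed" even if nobody edited it. The counts have to add up: stayed + changed + removed equals the size of the old dump (3,391,491 + 6,048,861 + 2,367 = 9,442,719), and stayed + changed + added equals the size of the new dump (3,391,491 + 6,048,861 + 113,717 = 9,554,069).

```python
from typing import Dict, Hashable


def compare_dumps(old: Dict[int, Hashable], new: Dict[int, Hashable]) -> Dict[str, int]:
    """Classify releases as stayed/changed/added/removed between two dumps.

    Both arguments map release id -> fingerprint of the release's content
    (hypothetical input, e.g. a hash of the serialized release element).
    """
    common = old.keys() & new.keys()
    stayed = sum(1 for rid in common if old[rid] == new[rid])
    return {
        "stayed": stayed,
        "changed": len(common) - stayed,
        "added": len(new.keys() - old.keys()),
        "removed": len(old.keys() - new.keys()),
    }


old_dump = {1: "a1", 2: "b1", 3: "c1"}
new_dump = {1: "a1", 2: "b2", 4: "d1"}
stats = compare_dumps(old_dump, new_dump)
print(stats)  # {'stayed': 1, 'changed': 1, 'added': 1, 'removed': 1}

# The same consistency checks hold for the real March figures.
assert stats["stayed"] + stats["changed"] + stats["removed"] == len(old_dump)
assert stats["stayed"] + stats["changed"] + stats["added"] == len(new_dump)
```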

Using matrix numbers of CDs to verify release years (2)

I am still not done looking at matrix numbers to verify Discogs release data, and I keep uncovering more. In a previous post I wrote about how Cinram embedded the manufacturing date of the glass master in the matrix, and how effective that can be for finding releases with an incorrect release year. But other companies did this too, such as P+O. In typical German fashion they were very thorough with their matrices: until 2007 they put a few markers in the matrix that possibly indicate the date (later they used a different matrix, and since then it is not so easy). Each old-style P+O matrix has:

a number identifying the release
a character indicating the master (for example A or B)
an optional number indicating the stamper
a month/year combination indicating the production date

The two interesting parts are the number identifying the release and the month/year. For each of the numbers it is known which number was used in which year. The month/year combina
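The check itself is simple once a matrix is parsed: if the month/year really is the production date, a Discogs release year earlier than that year is suspicious. The post lists the components of an old-style P+O matrix but not their exact textual layout, so the pattern below is hypothetical (it assumes `<number> <master letter> [<stamper>] <MM>/<YY>` separated by spaces), as is the pivot used to turn two-digit years into four-digit ones. The number-to-year ranges are not given here, so `year_for_number` takes them as a caller-supplied, hypothetical table.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# HYPOTHETICAL textual layout of an old-style P+O matrix:
# "<release number> <master letter> [<stamper number>] <MM>/<YY>"
OLD_PO_MATRIX = re.compile(
    r"(?P<number>\d+)\s+(?P<master>[A-Z])\s+(?:(?P<stamper>\d+)\s+)?"
    r"(?P<month>\d{2})/(?P<year>\d{2})"
)


class POMatrix(NamedTuple):
    number: int
    master: str
    stamper: Optional[int]
    month: int
    year: int  # four-digit production year


def parse_old_po_matrix(text: str) -> Optional[POMatrix]:
    m = OLD_PO_MATRIX.fullmatch(text.strip())
    if not m or not 1 <= int(m["month"]) <= 12:
        return None
    yy = int(m["year"])
    # Assumed pivot: old-style matrices ran until 2007, so 00-07 -> 20xx, else 19xx.
    year = 2000 + yy if yy <= 7 else 1900 + yy
    stamper = int(m["stamper"]) if m["stamper"] else None
    return POMatrix(int(m["number"]), m["master"], stamper, int(m["month"]), year)


def year_for_number(number: int, ranges: Iterable[Tuple[int, int, int]]) -> Optional[int]:
    """Look up the year a release number was used, given caller-supplied
    (first, last, year) ranges; the ranges themselves are hypothetical here."""
    for first, last, year in ranges:
        if first <= number <= last:
            return year
    return None


def release_year_suspicious(matrix: POMatrix, discogs_year: int) -> bool:
    """Assuming month/year is the production date: a disc cannot be released
    before it was produced, so an earlier Discogs year points to a wrong year."""
    return discogs_year < matrix.year


matrix = parse_old_po_matrix("12345 A 2 03/95")
print(matrix)  # POMatrix(number=12345, master='A', stamper=2, month=3, year=1995)
print(release_year_suspicious(matrix, 1993))  # True
```

The release number and the month/year give two independent estimates of the year, so a release whose Discogs year disagrees with both is a strong candidate for correction.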