
Posts

Showing posts from September, 2017

Discogs gamification

At Discogs every edit you make scores points: adding a new entry is worth 3 points, and editing an entry or adding one or more pictures is worth 1 point. Users are ranked according to the number of points they have. Some users have more than 100,000 points, with a very select few having significantly more than that. Scoring points is a good example of gamification, where you try to get users to perform a certain action and reward them for it. In this case the action is adding new releases or improving existing releases, and the reward is points. Getting more points can make some people feel good: you get immediate feedback after making a commit (more points) and you can see that you are making progress towards a higher ranking. This positive feedback loop has been well researched.

What Discogs could do more...and should they?

At the moment Discogs is not really promoting the contributor ranking. The question is whether or not this should be changed. On the upside it could mean that people mig

Finding differences between monthly Discogs data dumps (part 1)

One cool thing about Discogs is that you can see the whole history of the data, including who edited when and what was changed. For a software engineer like me that is very convenient, as I am used to working with version control. Unfortunately this data is missing from the Discogs data dump, so I cannot easily see which data has changed compared to a previous month. This makes it a bit more difficult to, for example, see if things are being changed incorrectly, or to spot other problems. A field with a timestamp of the last change to the release would be great to have. I suggested it to Discogs, but I am not sure if they will add it to the XML, or maybe they will save it for a next update of the format. In any case, I wanted to be able to see the differences between months now, so I came up with an awful hack, but one that would also make it possible for me to do some very crude filtering, as well as find differences between any two months, and not just consecutive months, at t

Styrene versus vinyl (part 1)

One thing that I learned when I was looking through releases on Discogs is that not everything that I thought was vinyl is actually vinyl. Many 7"s (and some LPs) in the US were made from styrene instead of vinyl, using a different technique, namely injection molding. Apparently styrene is cheaper to produce than vinyl and singles were basically seen as a throwaway product, so the fact that the quality wasn't as good as vinyl didn't matter. My guess is that by shaving off even a tiny bit of the costs per unit big savings could be made. Depeche Mode said it: everything counts in large amounts. I never encountered styrene records before because in my collection there are very few 7"s from the US, but recently a friend bumped into a few. The differences are obvious: you just feel that something isn't quite the same. The disc feels brittle and it isn't as flexible as vinyl. When you tap a styrene disc with your finger it also makes a different sound than when y

Generating QR labels from Discogs collection CSV exports (1)

If you are into collecting records it has probably happened to you: you are at a record fair and wonder whether or not you already have the record you are holding in your hands and you need to decide whether to buy it or not on the spot. It certainly has happened to me: I have bought plenty of duplicates, or was absolutely certain that I had a certain record, only to find out at home that I actually didn't have it. I sometimes also go through my record collection and see records that look very similar (to illustrate that: I have about 20 different copies of Queen's Greatest Hits) and then wonder what the difference between them is, as I tend to forget the subtle differences. This information is (mostly) recorded in online databases such as Discogs, but I am not always near my computer (strange, but true). So I was thinking: what can I do to connect these two worlds? One answer I came up with: make labels with QR codes that point to the right release on Discogs and use my sm
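The first step of that idea, turning a collection export into links that a QR code could encode, might look roughly like the sketch below. It is only an illustration under assumptions: the column name `release_id` and the URL pattern `https://www.discogs.com/release/<id>` are assumed here, and the helper name `release_urls` is made up. Rendering the actual QR images is left to whatever QR library you prefer.

```python
import csv
import io

# Assumption: release pages can be reached at this URL pattern.
RELEASE_URL = "https://www.discogs.com/release/{release_id}"


def release_urls(csv_file, id_column="release_id"):
    """Return one Discogs URL per row of a collection CSV export.

    The column name "release_id" is an assumption about the export format;
    pass a different id_column if your export uses another name.
    """
    urls = []
    for row in csv.DictReader(csv_file):
        release_id = (row.get(id_column) or "").strip()
        if release_id:  # skip rows without a release id
            urls.append(RELEASE_URL.format(release_id=release_id))
    return urls


if __name__ == "__main__":
    # Hypothetical two-line export, just to show the shape of the data.
    sample = io.StringIO("Artist,Title,release_id\nQueen,Greatest Hits,12345\n")
    for url in release_urls(sample):
        print(url)
```

Each URL returned this way is the text that would go into one QR label.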

Label Code (part 1)

One piece of information that you can find on European releases that you will typically not see on other releases is the so-called "Label Code". This code, handed out by GVL in Germany, uniquely identifies the label on which a record was released. Full background can be found on the German Wikipedia page about Label Codes. Discogs has a field dedicated to these codes in its database as well, and a partial list of the label codes (new labels, plus mutations, starting 01-01-2017) is available from the GVL website. That opens up quite a few possibilities to compare data from the database with the external list from GVL, as well as with the rest of the data in the dump. Fun!

Label Code structure

Label Codes are very simple: first the letters LC (usually uppercase, but likely lowercase characters are used here and there too), followed by 4 or 5 digits, and sometimes some delimiter (whitespace, hyphen and possibly others as well). A few common examples are below but
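The structure described above (the letters LC, sometimes a delimiter, then 4 or 5 digits) can be sketched as a regular expression. This is a minimal sketch and not necessarily the check used in the original post. It assumes the delimiter sits between the letters and the digits and is limited to whitespace and hyphens, and the names `LABEL_CODE_RE` and `parse_label_code` are made up for illustration.

```python
import re

# Assumption: "LC" in any case, an optional run of whitespace or hyphens,
# then exactly 4 or 5 digits and nothing else.
LABEL_CODE_RE = re.compile(r"LC[\s\-]*(\d{4,5})", re.IGNORECASE)


def parse_label_code(value):
    """Return the digits of a Label Code, or None if the value does not fit."""
    match = LABEL_CODE_RE.fullmatch(value.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up inputs that only exercise the pattern.
    for candidate in ["LC 0309", "LC-12345", "lc0309", "LC 123", "0309"]:
        print(repr(candidate), "->", parse_label_code(candidate))
```

Values that come back as `None` are the interesting ones: they are candidates for a closer look, either because the data is wrong or because the pattern is too strict.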

Digging through the Discogs XML

At the beginning of every month (usually the 3rd or 4th day of the month) Discogs releases a data dump of various parts of the database: individual releases, master releases, artists and labels. This data is released under the CC0 license, basically allowing unlimited reuse, which is very cool and opens up all kinds of possibilities. That got me really excited (and caused a sleepless night or two). I have not fully looked into all the data and in this post (and the next few) I will only look at the XML of the individual releases.

Individual releases in the Discogs data

Every month the data for all individual releases of Discogs is made available as a gzip compressed XML dump. This file is quite big: the dump for September 2017 is 4.9 GiB when it is gzip compressed. Uncompressed it is 32 GiB. The Discogs database had information about 8,878,391 releases on September 1 2017
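Because the dump is far too large to load into memory in one go, a streaming parser is the obvious tool for a first look. The sketch below counts releases without decompressing the file to disk. It assumes that every release is a `release` element directly under the root element of the dump, and the function name and file name are placeholders.

```python
import gzip
import xml.etree.ElementTree as ET


def count_releases(path):
    """Count top-level <release> elements in a gzip compressed XML dump."""
    count = 0
    depth = 0
    with gzip.open(path, "rb") as dump:
        for event, element in ET.iterparse(dump, events=("start", "end")):
            if event == "start":
                depth += 1
                continue
            depth -= 1
            # depth == 1 means a direct child of the root element just closed.
            # Assumption: each release is a <release> element at that level.
            if depth == 1 and element.tag == "release":
                count += 1
                element.clear()  # drop the release's content to keep memory low
    return count


if __name__ == "__main__":
    # Placeholder file name; point this at a downloaded dump.
    print(count_releases("releases.xml.gz"))
```

The same loop is a convenient place to inspect individual fields of each release before it is cleared.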

Why I love Discogs and why I hate Discogs

I have been using Discogs for a few years and I love it and hate it at the same time. For those of you who do not know: Discogs is an online catalogue trying to collect as much information as possible about any music-related release ever made. It works as follows: people enter information about releases into the database and collaborate on enhancing the data, fixing errors, uploading pictures and so on until the information is complete and correct.

Data quality: or why I hate Discogs

At least, that's the theory. In practice it turns out that Cory Doctorow was right: the data quality for the releases in the catalogue varies a lot. For some releases every tiny little detail has been described and there are clear pictures, for other releases you get a catalogue number and possibly the right country (if you're lucky!). Some of this bad data has been in there for many years and no one is fixing it, even though for some releases over 100 people have indicated they have it. This is what I