Posts

Showing posts from April, 2018

Contributor ranking in Discogs (part 2)

What I like about working with the Discogs data is making visible what isn't visible. In an earlier post I said that I suspected the Discogs contributor ranking followed the 80/20 rule, but that I didn't have enough data yet to confirm it. I crawled more data from Discogs (very slowly, as Discogs doesn't make it easy with their anti-crawling measures, so I crawled from multiple locations over quite a few hours) and reran the scripts that I wrote to crunch the numbers and see what share of the top contributors was responsible for 80% of the accumulated points in Discogs. When looking at the contributions of the top 1000 contributors, 60% of the contributors accounted for about 80% of the points. The more data I got, the more this moved towards 20%, and it became clear very quickly that Discogs indeed seems to follow the 80/20 rule: when looking at the points of the top 36,000 contributors, 80% of the points accumulated belong to the top 21.3% contributo
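The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the post: the function name `share_of_top_contributors` and the plain list of point totals as input are assumptions made for the example.

```python
def share_of_top_contributors(points, target=0.80):
    """Return the fraction of contributors, ranked by points (highest
    first), that is needed to reach `target` of all points."""
    ranked = sorted(points, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no points to analyse")
    running = 0
    for count, value in enumerate(ranked, start=1):
        running += value
        # Stop as soon as the running total covers the target share.
        if running >= target * total:
            return count / len(ranked)
    return 1.0


# Hypothetical point totals, not real Discogs data.
skewed = [90, 4, 3, 2, 1]
print(share_of_top_contributors(skewed))
```

A result close to 0.2 for a target of 0.8 is what the 80/20 rule predicts; a flat distribution, where everybody has the same number of points, would give a value close to 0.8 or higher.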

Contributor ranking in Discogs (part 1)

In Discogs you can get points for every contribution. There are a few people who have made it a sport to score as many points as possible, the so-called "rank hunters", whom some people look down on for reasons that I do not understand (in case you want to be a rank hunter or you want to be a more effective one, please check out the "Unofficial Discogs rankhunting guide" to maximize your efforts). It works like this: adding a new release gets you three points and each edit (regardless of how much you edit) gives you one point, and there is some sort of leaderboard/ranking for all contributing users to Discogs. There are a few people who have an enormous number of points and who seem to live for the site. The number one currently has over 356,000 points. When looking at the graphs they really reminded me of power laws and the 80/20 rule, because when comparing the first page of the contributors in Discogs with power law pictures there is a striking resemblance so

ISRC codes in Discogs (part 5)

I am still not done digging into the ISRC data from Discogs, as I see it as a source of errors and I am in ranting mode. At the moment having errors in ISRC codes is not a big problem, as Discogs is not using them yet (right now you cannot specifically search on ISRC codes). Unlocking this data could actually be quite handy in the future ("Oh, I like this track that I downloaded and want it on a physical release. On which physical releases was it published?") but I will let them discover that business case themselves. If you don't know what ISRC codes are, I suggest you start by reading one of my previous posts about the subject and follow the links there. What I wondered is: how many times can you find the same ISRC code on a single release, for example if a contributor makes a copy/paste error and forgets to change the code? So I adapted my scripts and ran a test to see in which releases (where ISRC codes are actually marked as such with a proper ISRC field) there are dup
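A check for the same ISRC code appearing more than once on a single release could look roughly like the sketch below. It is not the script from the post. The input shape (a dict from release id to the list of ISRC values found on that release) is hypothetical, and stripping hyphens and spaces before comparing is an assumption, so that two notations of the same code count as one.

```python
from collections import Counter


def normalise_isrc(raw):
    """Uppercase the code and drop hyphens, spaces and other separators."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def duplicate_isrcs(releases):
    """Map release id -> {code: count} for codes seen more than once."""
    result = {}
    for release_id, codes in releases.items():
        counts = Counter(normalise_isrc(code) for code in codes)
        dupes = {code: n for code, n in counts.items() if n > 1}
        if dupes:
            result[release_id] = dupes
    return result


# Made-up release ids and codes, for illustration only.
example = {
    1: ["NL-A12-18-00001", "NLA121800001", "NLA121800002"],
    2: ["GBAAA1800001"],
}
print(duplicate_isrcs(example))
```

Not every duplicate is necessarily an error (the same recording can legitimately appear twice on one release), so the output is better read as a list of releases worth a closer look.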

ISRC codes in Discogs (part 4)

Time to look at ISRC codes again because I am still not done researching them. If you don't know what they are, I would suggest that you first read part 1, part 2 and part 3. To better understand some of my complaints below you should also read about why I think that Discogs stores the ISRC fields in the wrong place. So now that we've got that out of the way we can start. The ISRC field in Discogs is in the "Barcodes and Other Identifiers" (BaOI) section, which, as I already explained, is not the correct place: it should be with the individual tracks. Right now people are using descriptions (a free text field) to indicate which ISRC code belongs to which release. Because I know how bad people are at typing in correct information (and of course I am not immune to this) I was wondering in how many releases people make mistakes. I added a very simple check to my scripts to see if descriptions were being reused (copy/paste errors, or "off by one" errors).
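One way to approximate such a check is shown below. It is a sketch under assumptions, not the check from the post: it takes the ISRC entries of a single release as hypothetical `(description, value)` pairs and reports every description that is attached to more than one distinct code. Comparing descriptions case-insensitively and ignoring surrounding whitespace is also an assumption.

```python
from collections import defaultdict


def reused_descriptions(identifiers):
    """Return {description: [codes]} for descriptions used with 2+ codes."""
    values_by_description = defaultdict(set)
    for description, value in identifiers:
        key = description.strip().lower()
        if key:  # entries without a description cannot be compared
            values_by_description[key].add(value)
    return {
        description: sorted(values)
        for description, values in values_by_description.items()
        if len(values) > 1
    }


# Made-up entries for one release: "Track 2" is used for two codes.
entries = [
    ("Track 1", "NLA121800001"),
    ("Track 2", "NLA121800002"),
    ("track 2 ", "NLA121800003"),
]
print(reused_descriptions(entries))
```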

What happened in Discogs in March 2018?

A new month, so that means that there is a new dumpfile available that I can analyse. If this is the first time you see one of these posts, I would highly recommend first reading the posts from previous months.

Release statistics

The latest dump ("the April dump") covers the period from March 1 to March 31 (inclusive). The previous dump had 9,554,069 releases; the new dump has 9,680,263 releases. That means 126,194 more releases in the database.

8,962,136 releases stayed the same
589,131 releases were changed
128,996 releases were added
2,802 releases were removed from the database
247 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
0 releases were moved from Draft to Accepted

Luckily this time Discogs did not change the XML format, so it is relatively easy to compare to the previous month. It looks very similar to previous months: the number of releases edited is very similar, but perhaps with a sl
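The same/changed/added/removed breakdown in the release statistics above boils down to comparing two sets of release ids. The sketch below is only an illustration of that idea and not the actual comparison script: it assumes each dump has already been reduced to a dict from release id to some fingerprint of the release (for example a hash of its XML), and the names `compare_dumps`, `previous` and `current` are made up.

```python
def compare_dumps(previous, current):
    """Count unchanged, changed, added and removed releases.

    Both arguments map a release id to a fingerprint of that release.
    """
    previous_ids = set(previous)
    current_ids = set(current)
    shared = previous_ids & current_ids
    changed = sum(1 for rid in shared if previous[rid] != current[rid])
    return {
        "unchanged": len(shared) - changed,
        "changed": changed,
        "added": len(current_ids - previous_ids),
        "removed": len(previous_ids - current_ids),
    }


# Tiny made-up dumps: release 2 was edited, 3 removed, 4 added.
march = {1: "a", 2: "b", 3: "c"}
april = {1: "a", 2: "x", 4: "d"}
print(compare_dumps(march, april))
```

The numbers in the post are consistent with this kind of bookkeeping: 128,996 added minus 2,802 removed gives the net growth of 126,194 releases.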