
Contributor ranking in Discogs (part 1)

In Discogs you get points for every contribution. A few people have made it a sport to score as many points as possible: the so-called "rank hunters", whom some people look down on for reasons that I do not understand. (If you want to be a rank hunter, or a more effective one, please check out the "Unofficial Discogs rankhunting guide" to maximize your efforts.)

It works like this: adding a new release gets you three points and each edit (regardless of how much you change) gives you one point. There is a leaderboard that ranks all contributing Discogs users. A few people have an enormous number of points and seem to live for the site. The current number one has over 356,000 points.

The ranking really reminded me of power laws and the 80/20 rule: the first page of the Discogs contributor list bears a striking resemblance to pictures of power-law distributions. So I thought it was time for some unscientific statistics.

I grabbed the rank data of the top 1000 contributors in Discogs by scraping, as Discogs doesn't make this information easily available. Because Discogs has anti-scraping measures in place, I probably didn't get all the data right and may have missed a few contributions here and there.

For each user I extracted the points and computed that user's percentage of the total number of points (the total being the sum over these 1000 users). I also computed the percentage for every batch of 50 contributors (the page size Discogs uses), as well as the cumulative percentage of the top X (in steps of 50).
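The calculation can be sketched in a few lines of Python. This is only a minimal illustration and not the script I actually used: the function name `shares` and the example numbers are made up, and it assumes the points are already sorted from highest to lowest rank.

```python
def shares(points, batch_size=50):
    """Return per-user, per-batch and cumulative percentages of the total.

    `points` is assumed to be sorted by rank (highest first).
    """
    total = sum(points)
    # Share of each individual user, as a percentage of all points in the sample.
    per_user = [100.0 * p / total for p in points]
    # Share of each consecutive batch of `batch_size` users.
    per_batch = [
        sum(per_user[i:i + batch_size])
        for i in range(0, len(per_user), batch_size)
    ]
    # Running total of the batch shares: the share of the top X users.
    cumulative = []
    running = 0.0
    for share in per_batch:
        running += share
        cumulative.append(running)
    return per_user, per_batch, cumulative


# Made-up points for four users (not real Discogs data), in batches of two.
example_points = [400, 300, 200, 100]
per_user, per_batch, cumulative = shares(example_points, batch_size=2)
```

With the scraped data, `points` would hold 1000 values and `batch_size` would stay at 50, giving 20 batches.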

The numbers for the top 10 contributors (percentages only):
  1. 1.12%
  2. 0.93%
  3. 0.76%
  4. 0.68%
  5. 0.51%
  6. 0.50%
  7. 0.45%
  8. 0.44%
  9. 0.44%
  10. 0.41%
which is not very interesting, except for showing that the number one has almost 3 times as many points as the number ten.

It gets much more interesting when looking at the numbers per batch of 50:
  • 1-50: 18.22%
  • 51-100: 10.53%
  • 101-150: 8.49%
  • 151-200: 7.23%
  • 201-250: 6.12%
  • 251-300: 5.34%
  • 301-350: 4.82%
  • 351-400: 4.42%
  • 401-450: 4.08%
  • 451-500: 3.77%
  • 501-550: 3.52%
  • 551-600: 3.28%
  • 601-650: 3.08%
  • 651-700: 2.85%
  • 701-750: 2.67%
  • 751-800: 2.54%
  • 801-850: 2.42%
  • 851-900: 2.30%
  • 901-950: 2.21%
  • 951-1000: 2.11%
As can be seen, the shares taper off very quickly at first and then decline at a much slower pace (the start of the "long tail"?). The accumulated percentages are more interesting:
  • 1-50: 18.22%
  • 1-100: 28.74%
  • 1-150: 37.23%
  • 1-200: 44.46%
  • 1-250: 50.59%
  • 1-300: 55.93%
  • 1-350: 60.75%
  • 1-400: 65.17%
  • 1-450: 69.25%
  • 1-500: 73.02%
  • 1-550: 76.54%
  • 1-600: 79.82%
  • 1-650: 82.90%
  • 1-700: 85.75%
  • 1-750: 88.42%
  • 1-800: 90.95%
  • 1-850: 93.38%
  • 1-900: 95.68%
  • 1-950: 97.89%
  • 1-1000: 100.00%
Within this sample of 1000, the top 25% of contributors (the first 250) account for slightly more than half of the points, and the top 60% (the first 600) for almost 80% of the points.
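A quick way to check a distribution against the 80/20 rule is to ask what share of the points the top fraction of users holds. The sketch below is an illustration only: `top_share` is a hypothetical helper and the ten values are made up, not Discogs data.

```python
def top_share(points, fraction):
    """Percentage of all points held by the top `fraction` of users."""
    ranked = sorted(points, reverse=True)
    # Always include at least one user, even for very small fractions.
    count = max(1, round(len(ranked) * fraction))
    return 100.0 * sum(ranked[:count]) / sum(ranked)


# Made-up points for ten users (they add up to 100 to keep the sums easy).
example_points = [50, 20, 10, 10, 5, 2, 1, 1, 1, 0]
share_of_top_20_percent = top_share(example_points, 0.20)  # top 2 users: 70.0
```

A strict 80/20 distribution would give a value close to 80 for `fraction=0.20`. Applied to the 1000 scraped users, this helper is what produces the "top 25%" and "top 60%" figures from the cumulative list.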

That is not really the 80/20 distribution, but I only looked at the top 1000 contributors out of more than 406,000. To be more correct I should look at the points of the top 80,000 or so contributors (roughly 20% of all contributors), but as said, Discogs does not make it easy for me to access that information. In the next few days I will try to crawl more data and hopefully come back with more interesting results.
