What I like about working with the Discogs data is making visible what would otherwise stay hidden. In an earlier post I said I suspected that the Discogs contributor ranking follows the 80/20 rule, but I didn't have enough data yet to confirm it. I have since crawled more data from Discogs (very slowly, as Discogs doesn't make it easy with its anti-crawling measures, so I crawled from multiple locations over quite a few hours) and reran the scripts I wrote to crunch the numbers and see what share of the top contributors holds 80% of the accumulated points in Discogs. Looking only at the top 1000 contributors, 60% of them accounted for about 80% of the points. The more data I got, the more this figure moved towards 20%, and it quickly became clear that Discogs indeed seems to follow the 80/20 rule: looking at the points of the top 36,000 contributors, 80% of the accumulated points belong to the top 21.3% contributo...
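To make the calculation concrete, here is a minimal sketch of how such a number could be computed. It is not the actual script used for this post: the function name `top_share_needed` and the sample points are made up for illustration, and it assumes the crawled data has already been reduced to a plain list of point totals, one per contributor.

```python
def top_share_needed(points, target_percent=80):
    """Find how many of the highest-ranked contributors are needed to
    hold at least target_percent of all points.

    Returns (k, fraction), where k is the number of contributors and
    fraction is k divided by the total number of contributors.
    """
    # Rank contributors from most to fewest points.
    ranked = sorted(points, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("need at least one contributor with points")

    running = 0
    for k, value in enumerate(ranked, start=1):
        running += value
        # Integer comparison avoids floating point rounding at the boundary.
        if running * 100 >= target_percent * total:
            return k, k / len(ranked)

    # Only reached if target_percent is above 100.
    return len(ranked), 1.0


# Hypothetical point totals, not real Discogs data.
sample = [60, 30, 5, 3, 2]
print(top_share_needed(sample))  # (2, 0.4)
```

With these made-up numbers the top two contributors out of five (40%) hold 90 of the 100 points, which is the first point at which the 80% threshold is passed. Running the same function over a growing slice of the ranking (top 1000, top 36,000, and so on) is one way to watch the share move as more data comes in.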
A blog dedicated to two of my hobbies, vinyl records and digital data, and to exploring where the two intersect. This blog is not affiliated with Discogs, but uses a lot of its data. On Discogs you can find me as metalmijn. I also get a lot of help from a friend you can find on Discogs as gerjolp. Check out my Discogs cleanup scripts at: https://github.com/armijnhemel/cleanup-for-discogs/