In Discogs each release is assigned a number. At the moment (November 27, 2017) the number of releases in Discogs is a bit over 9,200,000, while the latest release number is a bit over 11,200,000. That means that about 2 million release numbers (around 17.85%) are no longer in the database, and about 82.15% of the release numbers that were handed out still point to a release in the database.
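The arithmetic behind those percentages can be sketched as follows. This is only an illustration using the rounded figures from the paragraph above (the variable names are mine); because the real counts are "a bit over" these values, the result differs slightly in the second decimal from the percentages quoted above.

```python
# Rounded figures from the text; the exact counts are "a bit over" these.
releases_in_db = 9_200_000          # releases currently in the database
latest_release_number = 11_200_000  # highest release number handed out

# Release numbers that no longer (or never did) point to a release.
missing = latest_release_number - releases_in_db

missing_pct = 100 * missing / latest_release_number
surviving_pct = 100 * releases_in_db / latest_release_number

print(f"missing:   {missing:,} ({missing_pct:.2f}%)")
print(f"surviving: {releases_in_db:,} ({surviving_pct:.2f}%)")
```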
In a previous post I wondered if Discogs is getting increasingly "sparse". I kept thinking about it, so I decided to just look at the data to see if my suspicion ("Discogs is getting increasingly sparse") was right.
Around 2 million release numbers have been removed from the Discogs database. [EDIT 2017-11-28: I was told by Discogs that every Draft is also assigned a release number and many drafts don't ever get submitted to the database, so they simply do not appear in the database.]
Why releases disappear from Discogs
There are several reasons why releases disappear from Discogs. The most common reason is that a release is added as a duplicate of an already existing release and is then merged into that release. Another reason is that some releases are deleted because they are bogus entries, like spam or vandalism. Unfortunately this information is currently not available for mining.
Distribution of releases in Discogs
I looked at the distribution of the release numbers and counted, for each batch of 1 million release numbers, how many releases survived in the database and what percentage that is. The raw numbers:
- 1 - 999,999: 795,792 releases (79.58%)
- 1,000,000 - 1,999,999: 867,198 releases (86.72%)
- 2,000,000 - 2,999,999: 907,216 releases (90.72%)
- 3,000,000 - 3,999,999: 896,593 releases (89.66%)
- 4,000,000 - 4,999,999: 837,183 releases (83.72%)
- 5,000,000 - 5,999,999: 811,840 releases (81.18%)
- 6,000,000 - 6,999,999: 806,288 releases (80.63%)
- 7,000,000 - 7,999,999: 804,439 releases (80.44%)
- 8,000,000 - 8,999,999: 799,594 releases (79.96%)
- 9,000,000 - 9,999,999: 752,377 releases (75.24%)
- 10,000,000 - 10,999,999: 763,821 releases (76.38%)
- 11,000,000 - 11,081,999: 65,087 releases (79.38%)
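A count like the one above could be produced with something along these lines. This is a minimal sketch, not the script I actually used: the function name `survival_per_batch` and its parameters are made up for illustration, and it assumes you already have the surviving release numbers as plain integers, however you obtained them.

```python
from collections import Counter


def survival_per_batch(release_ids, batch_size=1_000_000, last_id=None):
    """Count surviving release numbers per batch of `batch_size` numbers.

    Returns a list of (first, last, count, percentage) tuples.
    `last_id` is the highest release number to consider; it defaults
    to the highest number found in `release_ids`.
    """
    ids = list(release_ids)
    if not ids:
        return []
    if last_id is None:
        last_id = max(ids)

    # Batch index 0 covers 1..batch_size-1, index 1 covers
    # batch_size..2*batch_size-1, and so on.
    counts = Counter(rid // batch_size for rid in ids)

    rows = []
    for batch in range(last_id // batch_size + 1):
        first = max(batch * batch_size, 1)  # release numbers start at 1
        last = min((batch + 1) * batch_size - 1, last_id)  # last batch may be partial
        slots = last - first + 1
        count = counts.get(batch, 0)
        rows.append((first, last, count, 100 * count / slots))
    return rows


if __name__ == "__main__":
    # Tiny made-up example with batches of 10 instead of 1 million.
    demo_ids = [1, 2, 4, 10, 11, 15, 20]
    for first, last, count, pct in survival_per_batch(demo_ids, batch_size=10, last_id=21):
        print(f"- {first:,} - {last:,}: {count:,} releases ({pct:.2f}%)")
```

Note that the percentage is taken against the number of release numbers that actually fall in each batch, so the first batch (which starts at 1) and the partial last batch use a smaller denominator than the full batches.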
To me it looks like Discogs is indeed getting increasingly sparse. Unfortunately I need more data to be able to answer why this is.