
What happened in Discogs in November 2017? (part 1)

A new month, a new data dump...and new statistics! Before reading this post it might be good to read the post about what happened in Discogs in October 2017.

Release statistics

For this blog post I downloaded the latest data dump (the "December dump"), containing data from November 1 to November 30 (inclusive). The previous dump file had 9,107,428 releases; the new dump file has 9,217,123 releases. That means there are 109,695 more releases in the database.

  • 3,868,975 releases stayed the same
  • 5,235,697 releases were changed
  • 112,451 releases were added
  • 2,756 releases were removed from the database
  • 222 releases had status Draft, Deleted or Rejected
  • 11 releases that were not Accepted were in both the November dump and the December dump 
  • 2 releases were moved from Draft to Accepted
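
The categories above can be sketched in code. This is only a minimal sketch, assuming each dump has already been reduced to a mapping from release ID to a SHA256 digest of that release's XML; the names `digest` and `classify` are hypothetical and not taken from my actual scripts.

```python
import hashlib
from typing import Dict, Set


def digest(release_xml: str) -> str:
    """Return the SHA256 hex digest of one release's serialized XML."""
    return hashlib.sha256(release_xml.encode("utf-8")).hexdigest()


def classify(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Set[str]]:
    """Compare two {release_id: digest} mappings (hypothetical layout)."""
    old_ids, new_ids = set(old), set(new)
    common = old_ids & new_ids
    return {
        # IDs that only exist in the new dump
        "added": new_ids - old_ids,
        # IDs that only exist in the old dump
        "removed": old_ids - new_ids,
        # IDs in both dumps whose digests differ
        "changed": {rid for rid in common if old[rid] != new[rid]},
        # IDs in both dumps with identical digests
        "unchanged": {rid for rid in common if old[rid] == new[rid]},
    }
```

With this layout the net growth of the database is simply the number of added releases minus the number of removed releases, which matches the numbers above: 112,451 - 2,756 = 109,695.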

What immediately stands out is the enormous number of changed releases: in previous months that number was about ten times lower. As I don't believe that Discogs users have become ten times more active in changing and fixing releases, there has to be another reason why this number is so high.

Looking at the XML explains a lot: as it turns out, the part of the XML that records which master ID a release belongs to has been changed.

While before it was something like:

<master_id>29591</master_id>
<data_quality>Needs Vote</data_quality>

now it is:

<master_id is_main_release="false">Needs Vote</master_id>

This change defeats the purpose of my SHA256-based comparison method, so I cannot make a good comparison between what happened in October and November. I need to go back to the drawing board and add more data. Most likely I will write some conversion scripts to convert between the two formats. After that I will say more about the smells in the data that was added in November.
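
A conversion could look roughly like the sketch below. It assumes that the only difference between the two formats is the added `is_main_release` attribute on `master_id`; if the element content changed as well, as the sample above might suggest, a fuller conversion would be needed. The function name `normalized_digest` and the surrounding `release` element in the usage example are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def normalized_digest(release_xml: str) -> str:
    """Hash a release after dropping the attribute that only newer dumps have."""
    root = ET.fromstring(release_xml)
    for master in root.iter("master_id"):
        # Assumption: removing this attribute is enough to get the old format back.
        master.attrib.pop("is_main_release", None)
    # Re-serialize both old and new data the same way before hashing.
    canonical = ET.tostring(root, encoding="unicode")
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


old = '<release id="1"><master_id>29591</master_id></release>'
new = '<release id="1"><master_id is_main_release="false">29591</master_id></release>'
print(normalized_digest(old) == normalized_digest(new))
```

Under that assumption, a release whose only difference is the new attribute hashes to the same value in both dumps, so it would no longer be counted as changed.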
