A new month, a new data dump...and new statistics! Before reading this post it might be good to read the post about what happened in Discogs in October 2017.
Release statistics
For this blogpost I downloaded the latest datadump (the "December dump") containing data from November 1 to November 30 (inclusive). The previous dump file had 9,107,428 releases, the new dump file has 9,217,123 releases. That means 109,695 more releases in the database.
- 3,868,975 releases stayed the same
- 5,235,697 releases were changed
- 112,451 releases were added
- 2,756 releases were removed from the database
- 222 releases had status Draft, Deleted or Rejected
- 11 releases that were not Accepted were in both the November dump and the December dump
- 2 releases were moved from Draft to Accepted
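As a rough illustration of how counts like these can be derived, here is a minimal Python sketch of the kind of SHA256 based comparison mentioned later in this post. It assumes each dump has already been reduced to a mapping from release ID to the raw XML text of that release; the function names `sha256_of` and `compare_dumps` are hypothetical and not taken from my actual scripts.

```python
import hashlib


def sha256_of(release_xml):
    """Return the SHA256 hex digest of one release's XML text."""
    return hashlib.sha256(release_xml.encode("utf-8")).hexdigest()


def compare_dumps(old, new):
    """Classify releases by comparing two {release_id: sha256} mappings."""
    old_ids, new_ids = set(old), set(new)
    common = old_ids & new_ids
    # A release is unchanged only if its hash is identical in both dumps.
    same = sum(1 for release_id in common if old[release_id] == new[release_id])
    return {
        "same": same,
        "changed": len(common) - same,
        "added": len(new_ids - old_ids),
        "removed": len(old_ids - new_ids),
    }
```

The numbers in the list are consistent with this scheme: 112,451 added minus 2,756 removed gives the 109,695 extra releases, and 3,868,975 + 5,235,697 + 2,756 gives the 9,107,428 releases of the previous dump.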
Looking at the XML explains a lot: as it turns out, the XML that records which master a release belongs to has changed.
While before it was something like:
<master_id>29591</master_id> <data_quality>Needs Vote</data_quality>
now it is:
<master_id is_main_release="false">29591</master_id> <data_quality>Needs Vote</data_quality>
This defeats the purpose of my SHA256 based comparison method, so I cannot make a good comparison between what happened in October and November. I need to go back to the drawing board and add more data. Most likely I will write some conversion scripts to convert between the two formats. After that I will say more about the smells of the data that was added in November.
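One possible shape for such a conversion, sketched in Python: strip the new `is_main_release` attribute before hashing, so that a release whose only difference is the new attribute hashes the same in both formats. This assumes each release is available as a standalone XML string and that ignoring the attribute is acceptable for the comparison; `normalized_release_hash` is a hypothetical name, not an existing script.

```python
import hashlib
import xml.etree.ElementTree as ET


def normalized_release_hash(release_xml):
    """Hash a release after dropping the is_main_release attribute."""
    root = ET.fromstring(release_xml)
    for master in root.iter("master_id"):
        # Convert the new format back to the old one: keep the ID, drop the flag.
        master.attrib.pop("is_main_release", None)
    canonical = ET.tostring(root, encoding="utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Note that this throws away the main release information, so it is only useful for comparing dumps across the format change, not as a replacement for the new data.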