Whenever a new data dump from Discogs becomes available I try to find out how the data has improved, and what else in it could still be improved. Yesterday I looked at how many releases that previously did not have errors (as detected by my scripts) were now flagged as having errors.
The results actually weren't that encouraging. To give it a positive twist I wanted to look at the exact opposite: how many releases that were flagged as having an error in the previous dump did not have an error (as detected by my scripts of course) in the new dump?
For this I ignored tracklisting, role and artist errors and focused on the "real" errors (tracklisting errors are very real too, but the other two fall more in the "meh" category). Also, if a release with errors was removed, that is also counted as a fix, because the error is no longer there.
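The comparison described above can be sketched roughly as follows. This is only an illustration, not the actual scripts: the function names, the error category names and the layout of the input (a dict mapping a release id to a list of error categories, one dict per dump) are all assumptions made up for the example.

```python
# Hypothetical category names for the errors left out of the comparison.
IGNORED = frozenset({"tracklisting", "role", "artist"})


def real_errors(errors_by_release):
    """Drop the ignored categories; keep only releases that still have errors."""
    result = {}
    for release_id, errors in errors_by_release.items():
        kept = [error for error in errors if error not in IGNORED]
        if kept:
            result[release_id] = kept
    return result


def fixed_releases(previous, current):
    """Releases with 'real' errors in the previous dump and none in the new one.

    A release that was removed is simply absent from `current`,
    so it is counted as fixed as well.
    """
    previous_real = real_errors(previous)
    current_real = real_errors(current)
    return {
        release_id: errors
        for release_id, errors in previous_real.items()
        if release_id not in current_real
    }


def summarize(fixed):
    """Return (number of fixed releases, number of errors they had)."""
    return len(fixed), sum(len(errors) for errors in fixed.values())


if __name__ == "__main__":
    # Made-up example data, not real Discogs results.
    previous = {1: ["label"], 2: ["role"], 3: ["isrc", "label"]}
    current = {1: ["label"]}  # release 3 was removed from the new dump
    releases, errors = summarize(fixed_releases(previous, current))
    print(f"{releases} releases (with {errors} errors) were fixed")
```

Note that in this sketch a release only counts as fixed when none of its "real" errors remain; a release that went from three errors to one is not counted.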
In total 2169 releases (with 3283 errors) were fixed.
Of course this doesn't mean that these releases are now completely error free: my scripts only catch a small portion of possible errors and mostly focus on formatting and syntactical correctness. Data that is completely wrong and not corresponding to the release could still be error free according to the scripts used to check. So these numbers should be taken with a big lump of salt.
Still, it shows that the number of releases that were fixed is higher than the number of releases in which an error was introduced. That's a reason for some slight optimism.
Old releases that were fixed in the second half of May 2018