I was gone from Discogs for a while, partly for lack of time but also because I was a bit frustrated about the poor quality of the data in Discogs and the (seeming) unwillingness of Discogs to fix this (although it is also very likely that, like me, they are completely swamped and there are only so many hours in the day).
One thing that keeps bugging me in Discogs is that people add or change information in such a way that releases end up containing wrong information. I looked at this problem once before and I was wondering if it is still as bad as it was back then, or if things have gotten better in Discogs. A few unscientific tests didn't look very promising, so I grabbed the dumps of February and March 2019 and simply counted how many errors in old releases were in the later dump, but not in the earlier one. Ideally the errors found in the older releases are all known errors, so for old releases this number should be 0 in the best case. But that is not what is happening.
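The comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the script used for the numbers in this post: the `Finding` type, the check names and the toy data are all hypothetical, and it assumes that "old releases" means releases whose ID was already present in the earlier dump and that each error can be identified by a release ID plus the name of the check that failed.

```python
from typing import Iterable, NamedTuple, Set


class Finding(NamedTuple):
    """One detected error (hypothetical representation)."""
    release_id: int  # Discogs release ID
    check: str       # made-up name of the check that failed


def new_errors_in_old_releases(
    earlier: Iterable[Finding],
    later: Iterable[Finding],
    earlier_release_ids: Set[int],
) -> Set[Finding]:
    """Return findings that are in the later dump but not in the earlier one,
    limited to releases that already existed in the earlier dump."""
    known = set(earlier)
    return {
        f for f in set(later)
        if f not in known and f.release_id in earlier_release_ids
    }


# Toy data standing in for the results of scanning two monthly dumps.
february = [Finding(1, "wrong_isrc"), Finding(2, "bad_label_code")]
march = [
    Finding(1, "wrong_isrc"),          # known error, already there in February
    Finding(2, "bad_label_code"),      # known error
    Finding(1, "bad_depot_legal"),     # new error in an old release
    Finding(2, "bad_rights_society"),  # new error in an old release
    Finding(2, "bad_spars_code"),      # second new error in the same release
    Finding(3, "wrong_isrc"),          # release 3 is new in March: excluded
]
old_ids = {1, 2}  # release IDs present in the February dump

new = new_errors_in_old_releases(february, march, old_ids)
unique_releases = {f.release_id for f in new}
print(len(new), "new errors in", len(unique_releases), "unique releases")
# prints: 3 new errors in 2 unique releases
```

Counting errors and counting unique releases separately is what gives two different numbers: a single release can pick up more than one new error between dumps.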
As it turned out, quite a few new errors were introduced: 1108 errors in 811 unique releases. This is how these releases are distributed across the dataset (and yes, this graph looks different from the one in the previous article I wrote about this subject, as I excluded the new releases):
New errors introduced in old releases in the Discogs database in March
Sigh.
I have said it before: most of these errors are entirely preventable, and as long as Discogs makes it easy to introduce errors, people will keep making mistakes. The only fix, as I have argued before, is for Discogs to make it more difficult for people to make those errors in the first place.