For people who care about the correctness of data in Discogs, it sometimes seems like an uphill battle. Once an error has been introduced, it gets copied and spreads, and fixing it becomes almost impossible. It very much resembles fixing errors in the waterfall model of software engineering: stopping errors at the beginning is much easier than fixing them later.
One way errors spread is through the "copy to draft" functionality in Discogs, where information from an existing release can be copied and serve as a template when adding a new variant of a release. Although this is extremely useful (it speeds up entering information), people copy not only the correct information but also the errors, or they leave in information that is irrelevant to the new release (example: SID codes for vinyl or other releases). When the new release is in turn used as a template, the wrong information spreads further through the database.
Detecting errors is quite trivial using some simple checks that verify the contents of the data in a release and report if something is wrong.
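To give an idea of what such a simple check could look like, here is a minimal sketch for the SID code example mentioned above. It is not the script I actually run: the `Release` class, the identifier type names and the list of non-optical formats are all assumptions made for illustration, and real data would need a more careful mapping.

```python
from dataclasses import dataclass, field

# Assumed identifier type names for SID codes; adjust to your data source.
SID_TYPES = {"Mastering SID Code", "Mould SID Code"}
# Assumed (incomplete) set of formats on which SID codes make no sense.
NON_OPTICAL_FORMATS = {"Vinyl", "Cassette"}


@dataclass
class Release:
    """Hypothetical, stripped-down view of a release."""
    release_id: int
    formats: list[str]
    # Each identifier is a (type, value) pair.
    identifiers: list[tuple[str, str]] = field(default_factory=list)


def check_sid_on_non_optical(release: Release) -> list[str]:
    """Report SID codes on releases that consist only of non-optical formats."""
    if not release.formats:
        return []
    if not all(fmt in NON_OPTICAL_FORMATS for fmt in release.formats):
        # At least one format could legitimately carry a SID code.
        return []
    return [
        f"release {release.release_id}: {id_type} '{value}' on a non-optical release"
        for id_type, value in release.identifiers
        if id_type in SID_TYPES
    ]
```

A check like this only reports; it deliberately does not try to fix anything, because deciding what the correct value should be still needs a human.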
Fixing errors is another story. People using scripts (like myself) already have a strong motivation to find and fix these errors, but the vast number of errors currently in the data (my scripts already detect close to 400,000 individual errors without trying hard) makes it an impossible task for just a few users, no matter how eager they are.
So you need to try to motivate other users to fix errors as well. In this blog post I am going to explore a solution that I think could be implemented in a fairly non-intrusive and very lightweight way, perhaps in some sort of "janitor mode" for users who want to help with fixing.
Helping users find errors more easily
Guiding users to fix errors is key. Let's look at an error that is quite common in the "Barcode and Other Identifiers" (BaOI) section, namely using wrong information in the Barcode field. The reason there are many errors in this field is that Barcode is the default value when adding a new identifier to this section, and quite a few people simply do not change the default (even though it takes minimal effort). The screenshot below shows what that looks like when visiting a release page on Discogs:
Common error: the default value Barcode has not been changed to Rights Society
As said, detecting the error is quite trivial, but getting people to fix the error is more difficult. The first step is that the user has to recognize the error. Because there are so many different guidelines and rules, I can totally understand that some users (including myself) take a conservative approach when seeing something that doesn't look quite right, think "I'll not touch it, because I am unsure about it" and leave it as is. Or perhaps they do not want to touch other people's submissions (also, some people are quite possessive, which simply does not make sense for a collaborative site), or they think that because it is already in the database it must be correct. Whatever the reason for the error, making certain that the user actually recognizes it as an error is a very important first step in fixing it.
Most humans are very good at picking up visual cues, so adding something to the site indicating that there might be something wrong could be very effective. It could possibly look something like the picture below. I must admit that my graphical skills are very bad, so this can use a lot of improvement:
Mock-up to indicate that there is possibly something wrong with the data
The possible error on the site now has a red rectangle around it, with some text indicating that the value is likely wrong. The text should most probably go somewhere else, or be shown in another way, and would very likely also contain a suggestion like "This is likely a Rights Society" instead of the terse message that I wrote.
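The suggestion text for the Barcode example could come from a check along the following lines. This is only a sketch under some assumptions: the list of rights society names is illustrative and far from complete, and the function name `suggest_identifier_type` is something I made up for this example.

```python
import re

# Illustrative and incomplete list of rights society names (assumption).
RIGHTS_SOCIETIES = {
    "GEMA", "BIEM", "SACEM", "STEMRA", "MCPS", "JASRAC", "SABAM", "ASCAP", "BMI",
}


def suggest_identifier_type(id_type, value):
    """Return a hint if a Barcode field looks like a rights society, else None."""
    if id_type != "Barcode":
        return None
    # Split on whitespace, slashes and commas: "BIEM/STEMRA" -> ["BIEM", "STEMRA"].
    tokens = [t for t in re.split(r"[\s/,]+", value.strip().upper()) if t]
    if tokens and all(token in RIGHTS_SOCIETIES for token in tokens):
        return "This is likely a Rights Society"
    return None
```

Because the check only returns a hint when every token in the field is a known society name, a real barcode such as a string of digits is left alone.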
Possible implementation
Personally, I would likely use a script on the server to generate small bits of JSON (describing errors) for every release that has errors, and use some extra client-side code with CSS to download the JSON and indicate on the page where the errors are, either when users have some sort of "janitor mode" enabled or when they are editing releases. I would generate the JSON either on the fly or every night, invalidate it every time the release is edited, and then regenerate it (either on the fly, or the next night).
Probably there is a lot more that could be done to prevent errors from sneaking into the data. I might talk about that in a future post: I have a few vague ideas and they still need some more time.
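To make the server-side half of the implementation idea a bit more concrete, here is a rough sketch of generating the JSON on the fly and invalidating it when a release is edited. Everything in it is an assumption for illustration: the release is a plain dictionary, the JSON layout (`release_id`, `errors`, `section`, `index`, `message`) is invented, and the in-memory dictionary stands in for whatever storage the site would really use.

```python
import json


class ErrorReportCache:
    """Caches one small JSON document per release that has errors."""

    def __init__(self, checks):
        # checks: callables that take a release dict and return a list of error dicts.
        self._checks = checks
        self._cache = {}  # release id -> JSON string, or None if no errors

    def get_json(self, release):
        """Return the cached JSON for a release, generating it on first use."""
        release_id = release["id"]
        if release_id not in self._cache:
            errors = [err for check in self._checks for err in check(release)]
            # Only releases with errors get a JSON document.
            self._cache[release_id] = (
                json.dumps({"release_id": release_id, "errors": errors})
                if errors else None
            )
        return self._cache[release_id]

    def invalidate(self, release_id):
        """Call this whenever a release is edited; the next request regenerates."""
        self._cache.pop(release_id, None)


def barcode_default_check(release):
    """Hypothetical check: Barcode fields that hold a rights society name."""
    societies = {"GEMA", "BIEM", "SACEM"}  # illustrative, incomplete
    return [
        {"section": "identifiers", "index": index,
         "message": "This is likely a Rights Society"}
        for index, ident in enumerate(release.get("identifiers", []))
        if ident["type"] == "Barcode" and ident["value"].strip().upper() in societies
    ]
```

The client-side code would then only need the `section` and `index` values to know which field to draw the red rectangle around. A nightly batch run would use the same checks, just looping over all releases instead of generating on request.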