I have been digging into the Discogs dataset for over 9 months now and blogging about it since September. During this period I have made a few observations.
In short, I must say that I have mixed feelings about Discogs, because it isn't clear what Discogs actually is and what it wants to achieve. I talked about this earlier, but it basically comes down to this: Is it a catalog? Is it a marketplace? Is it a place to organize your collection?
Discogs is trying to be all three but not succeeding, because there are tensions between the different use cases: enforcing correctness chases sellers away from the marketplace, and selling records is what brings in the money for Discogs and keeps the other fires burning. But the cost of this is that the data in the catalog is sometimes blatantly incorrect, which vastly reduces its value.
Personally I don't care about the marketplace, as I am a collector and I care about the catalog. Discogs has enormous potential, and for collectors with a bit of an obsession with different record pressings (like myself) a correct catalog would be a really cool thing to have and use.
But at the moment Discogs is a very, very long way from that. And, to be honest, I am not sure if it will ever get there. Let me tell you why I think so.
Discogs' data model is incorrect and hard to fix
Discogs is far from perfect. It is very clear that the data model has grown organically, and you can almost see which design choices were made to accommodate new use cases (or at least, I think I can).
Fixing this would be an enormous operation. At the moment there are close to ten million releases in the Discogs database, with a lot of associated metadata, and releases are being added all the time. They cannot just redesign the data model, migrate the database to the new structure and change users' workflows without major disruptions to the operation of the website. So unfortunately I fear that this is something that we have to live with.
Some people don't appreciate help
Some people have responded very negatively to changes to their releases. There is not a single reason: some people are very possessive of their data ("this is my release"), while other people simply do not seem to understand that the pages they see are just a representation of data, and they focus on the layout of the data as if it were in print. Others object to changes and point to old guidelines that were in place when the release was edited, and insist that the release should be kept as it was, or judged according to the guidelines that were in place when the release was added, which to me makes even less sense.
Some people don't want to take ownership of problems
What I noticed is that some people point out problems and describe them in full detail (with references to forum discussions, and so on) and then keep pinging until someone (one of the contributors, or someone else) fixes the problems. To me this is just absurd: in a collaborative system you simply fix the problem yourself if you already have all the necessary information, instead of waiting and pinging for (in some instances) years. It is truly mind-boggling.
Discogs' voting system is too coarse and harsh
Discogs allows people with voting rights to vote on the correctness of releases. In some cases the commits are then automatically reverted. Voting is done for the entire release. I have seen it happen that someone corrected a lot of the information, but not all of it (due to lack of information, and so on), and then someone voted "Entirely Incorrect" because some other information that was already present had not been corrected. The release was then rolled back to its previous state, even though that state was in fact more incorrect.
Cataloguing vinyl releases is hard...
There are so many tiny differences in releases and the way they were pressed that it is almost as if they did it on purpose to confuse collectors: "if we just change this tiny bit of text here, then in 30 years people will be really confused!" or something like that.
Of course, in the earlier days these releases were not made with collectors in mind: they were just a mass product. That meant that if a product was successful, new batches were made in which errors were corrected, or designs were updated, or different labels were used just because there was a pile of them lying around, and so on. A lot of the knowledge, especially from the early days, has been lost and now we just have to guess.
...cataloguing CDs is even harder...
I always thought that distinguishing vinyl records from each other was hard and that with CDs it would actually be easier, as there are fewer pressing plants. But no, this is not the case at all. There are so many represses that can only be distinguished from each other by looking at the releases very accurately and being very precise: pressing plant names, slightly different matrix numbers, all kinds of codes (SID codes), and sometimes no clarity or agreement about what country a release is from, and so on, make it very difficult.
...and the way Discogs works is not exactly helping.
What really pisses me off at times is that Discogs could do a lot more to prevent incorrect releases popping up on Discogs, and basically expects the Discogs community to clean up the mess that others have made (which could be seen as a tragedy of the commons).
For example, it would be fairly trivial to have a few basic checks (which are already in place for some pieces of data) that would flag the most obvious mistakes.
But Discogs is asking for it, by basically dumping people into "expert mode" straightaway. I think that a guided submission system with hints and suggestions (think: a wizard) would give higher quality results.
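To make the "few basic checks" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the record layout and the field names (`year`, `country`, `formats`) are made up for this example and do not mirror the real Discogs data model, and the function name `flag_obvious_mistakes` is hypothetical. The one hard fact it relies on is that the CD was commercially introduced in 1982.

```python
from datetime import date


def flag_obvious_mistakes(release, today=None):
    """Return a list of warnings for a release record.

    `release` is a plain dict with hypothetical keys 'year', 'country'
    and 'formats'; this is an assumed layout for illustration, not the
    actual Discogs schema.
    """
    today = today or date.today()
    warnings = []
    year = release.get("year")
    formats = {f.lower() for f in release.get("formats", [])}

    # An empty or missing country is worth a hint to the submitter.
    if not release.get("country"):
        warnings.append("country is missing")

    if year is not None:
        # A release cannot come from a year that has not happened yet.
        if year > today.year:
            warnings.append(f"release year {year} is in the future")
        # The CD was commercially introduced in 1982.
        if "cd" in formats and year < 1982:
            warnings.append(f"CD release dated {year}, before CDs existed")

    return warnings
```

The function returns warnings instead of rejecting the submission outright, which fits the idea of hints and suggestions: the submitter gets a nudge to double-check, but can still go ahead if the data really is right.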
So why am I bothering?
After all the negativity above you might wonder why I am actually bothering: I could just leave it and focus on something else that is more rewarding or useful.
And, to be honest, you would be right. Quite often I am actually not bothering at all, and Discogs, or errors in Discogs data, are very far from my mind (and increasingly so).
Still, I think it is a really great dataset to dig into. I have learned a lot about music releases in general and have been able to explore a few new technologies that I could also reuse for other purposes, so it has been a good playground.