Time to dive further into the barcode information that is stored in Discogs. As I said in part 1, I didn't look much into this subject before because of the potentially huge number of errors I would uncover. After digging a bit further into the data I can confirm that there are indeed many releases with errors. But along the way I found a few interesting things that I want to share.
People can't enter data properly
It is surprising to see how many people added a barcode in the Barcode field and then added a '.' that is not part of the barcode and cannot even be found in the picture: around 150. I have not even counted things like trailing spaces, soft hyphens (whyyyyy?), and so on.
Text representations of barcodes are not consistent
I looked into some of the descriptions of several barcodes, but they don't seem to describe the barcodes that are in the wild. For example, some EAN-13 barcodes have a '.' in them between the first 12 digits and the check digit, which can be seen on this release (AA side). In total 72 releases with this pattern were found.
Another variation I saw is where the EAN-13 is prefixed with a 'T' (for 'text'?) as can be seen on this ABC release. More than 7800 releases with this pattern were found. That's a bit more than expected.
Yet another variation is where the EAN-13 has 'M' appended, like on this release. There were about 500 releases where I could see this pattern. Another 68 have 'P', like this release.
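To make the variations above concrete, here is a minimal Python sketch of how they could be folded back into a plain 13-digit string and then checked against the standard EAN-13 check digit rule. This is not the code used for the counts in this post. The function names and the regular expression are made up for illustration, and the pattern assumes that the 'T' prefix, the '.' before the check digit, and the 'M' or 'P' suffix are written without any extra spaces. The barcode in the usage example is a generic valid EAN-13, not one taken from Discogs.

```python
import re

# Hypothetical pattern covering the variants described above:
# optional 'T' prefix, 12 digits, optional '.', check digit,
# optional 'M' or 'P' suffix. Assumes no spaces inside the field.
EAN13_TEXT = re.compile(r"T?([0-9]{12})\.?([0-9])[MP]?")


def normalize_ean13_text(field):
    """Return the bare 13 digits if the field matches a known variant, else None."""
    match = EAN13_TEXT.fullmatch(field.strip())
    if match is None:
        return None
    return match.group(1) + match.group(2)


def ean13_check_digit_ok(digits):
    """Check a 13-digit string against the EAN-13 check digit rule.

    The first 12 digits are weighted 1, 3, 1, 3, ... from the left;
    the check digit brings the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


for raw in ("T4006381333931", "400638133393.1", "4006381333931M", "12345"):
    digits = normalize_ean13_text(raw)
    print(repr(raw), digits, digits is not None and ean13_check_digit_ok(digits))
```

Keeping normalization and validation as two separate steps makes it possible to count how many fields merely look like an EAN-13 and how many also pass the check digit test.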
People enter crap data
There are 591 barcode fields that contain only a '.', one or more '-' characters, or just spaces. I have no idea why people did this. The most likely explanation is that it is meant to indicate there is no barcode (for which Discogs really should introduce a better system), or it is just sloppiness.
After filtering (and before checking if these barcodes are actually valid) I still have around 60,000 entries that deserve a closer look. At first sight it is mostly:
- rights societies
- SID codes
- matrix/runout
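The kind of filtering described earlier (dropping fields that hold only a '.', some '-' characters or spaces, and ignoring stray trailing dots, spaces and soft hyphens) could look roughly like the sketch below. The function names are hypothetical, and the exact rules are my reading of the patterns listed in this post rather than the actual filter that produced the numbers.

```python
SOFT_HYPHEN = "\u00ad"


def is_placeholder(raw):
    """True if the field holds only a '.', only '-' characters, or only whitespace."""
    stripped = raw.strip()
    return stripped in ("", ".") or set(stripped) == {"-"}


def clean_field(raw):
    """Drop soft hyphens, surrounding whitespace and a single trailing '.'.

    Call is_placeholder() first: a lone '.' would be cleaned to an empty string.
    """
    cleaned = raw.replace(SOFT_HYPHEN, "").strip()
    if cleaned.endswith("."):
        cleaned = cleaned[:-1]
    return cleaned


for raw in (".", "---", "   ", " 4006\u00ad381333931. "):
    if is_placeholder(raw):
        print(repr(raw), "-> placeholder")
    else:
        print(repr(raw), "->", clean_field(raw))
```

Only a trailing '.' is removed here, so a '.' that sits between the first 12 digits and the check digit is left alone.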
In the next part I am going to dig deeper into the releases that I think have a valid barcode field.