
Posts

Showing posts from June, 2018

How to "hijack" releases in Discogs that were voted correct without getting caught

There are some things in Discogs that really irritate me. One of the things I personally dislike a lot is the voting system, as it is not granular enough. What is worse is that there is a way to (partially) edit a release while bypassing the voting system: pictures. Normally, when a release in Discogs that has received a vote (correct, incorrect, etc.) is edited, its status is set back to "needs vote", except when only the images are changed. In that case the voting status stays as it is. I have seen instances where an entry was voted "complete and correct" 7 years ago, but its original pictures had since been replaced by completely different ones and no one noticed. This so-called "hijacking" of releases is very much frowned upon, and unfortunately it happens far too often. So what Discogs should do is quite simple: make sure that every edit, including an edit that only touches images, causes the status to be set to "needs vote".

How are release formats distributed over Discogs?

In Discogs each release has one or more Format fields, in which a contributor indicates the format of the release, or all of its formats if it has more than one. I looked at all the releases in the database and counted how often each format occurs. This is the list I got, from most releases to fewest:

Vinyl: 4,819,258
CD: 2,913,785
File: 917,569
Cassette: 647,604
CDr: 386,541
Shellac: 146,494
DVD: 85,475
Box Set: 52,634
All Media: 29,217
Flexi-disc: 20,814
VHS: 18,674
8-Track Cartridge: 15,206
Acetate: 9,580
DVDr: 8,581
Lathe Cut: 8,023
SACD: 6,380
Reel-To-Reel: 5,022
Blu-ray: 3,705
Laserdisc: 3,077
Memory Stick: 1,701
Minidisc: 1,613
Edison Disc: 1,308
Cylinder: 1,290
Betacam SP: 1,060
Hybrid: 1,012
Floppy Disk: 1,003
Blu-ray-R: 674
CDV: 601
4-Track Cartridge: 593
DCC: 397
Pathé Disc: 367
U-matic: 362
Betamax: 263
DAT: 209
PlayTape: 144
Microcassette: 135
HD DVD: 65
MiniDV: 57
UMD: 51
VHD: 40
SelectaVision: 37
Tefifon: 3
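For anyone who wants to reproduce a count like this, here is a minimal sketch in Python. It is not necessarily how my own scripts work: it assumes the releases dump is XML in which every `<release>` element holds `<formats><format name="..."/></formats>`, and it counts a release once per distinct format name (so a CD + DVD release adds one to each). The function name `count_formats` is made up for this example. For a real (gzipped) dump you would pass it a file object from `gzip.open(path)`.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_formats(source):
    """Count releases per format name in a Discogs-style releases XML dump.

    Assumed layout (check against the actual dump):
      <release id="..."><formats><format name="Vinyl" .../></formats></release>
    A release is counted once per distinct format name it carries.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "release":
            continue
        names = {fmt.get("name") for fmt in elem.iterfind("formats/format")}
        counts.update(name for name in names if name)
        elem.clear()  # drop the parsed subtree; real dumps hold millions of releases
    return counts


if __name__ == "__main__":
    sample = b"""<releases>
      <release id="1"><formats><format name="Vinyl" qty="1"/></formats></release>
      <release id="2"><formats>
        <format name="CD" qty="2"/><format name="DVD" qty="1"/>
      </formats></release>
    </releases>"""
    for name, n in count_formats(io.BytesIO(sample)).most_common():
        print(f"{name}: {n:,}")
```

Streaming with `iterparse` and clearing each release after it is counted keeps memory use manageable, which matters for a file with close to 10 million releases.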

Digital file releases in Discogs (part 1)

One category of releases in Discogs is the "digital releases". Basically: MP3s or other digital formats from stores, iTunes releases, and so on. Call me old-fashioned, but personally I don't see these as collectables, as to me they are just files on a computer or music player. Apparently many people disagree with that and collect them. In Discogs the digital releases have "File" in the format field, which makes them quite easy to recognize. So I wondered: how many of these "File" releases are there and where are they in the data? Is it mostly the newer releases, or are there also many older file releases? I looked at everything in the latest data dump and found 917,569 releases that have "File" in the format field, which is about 10% of the releases in Discogs and about 40% more than, for example, cassettes.

Distribution of releases tagged as "File" in the Discogs data set

So only in the very early days there were few "
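One way to see where the "File" releases sit in the data is to bucket them by release ID. The sketch below does that, under two assumptions: the dump layout is `<release id="..."><formats><format name="..."/></formats></release>`, and release IDs grow roughly with submission date, so low IDs are old entries and high IDs are recent ones. The function name and the bucket width of 100,000 IDs are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def file_release_histogram(source, bucket=100_000):
    """Count releases with a "File" format per block of `bucket` release IDs.

    Assumes release IDs grow roughly with submission date, so low IDs are
    old submissions and high IDs are recent ones.
    """
    hist = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "release":
            continue
        names = {fmt.get("name") for fmt in elem.iterfind("formats/format")}
        if "File" in names:
            release_id = int(elem.get("id"))
            hist[release_id // bucket * bucket] += 1
        elem.clear()  # clear() also removes attributes, so read "id" first
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    sample = b"""<releases>
      <release id="5"><formats><format name="File"/></formats></release>
      <release id="150000"><formats><format name="Vinyl"/></formats></release>
      <release id="250000"><formats><format name="File"/><format name="CD"/></formats></release>
    </releases>"""
    print(file_release_histogram(io.BytesIO(sample)))  # {0: 1, 200000: 1}
```

Plotting the resulting dictionary gives a rough picture of how the "File" releases are spread from the oldest to the newest entries.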

Looking back on 9 months of digging into Discogs

I have been digging into the Discogs dataset for over 9 months now and blogging about it since September. During this period I have made a few observations. In short, I must say that I have mixed feelings about Discogs, because it isn't clear what Discogs actually is and what they want to achieve. I talked about this earlier, but it basically comes down to this: Is it a catalog? Is it a marketplace? Is it a place to organize your collection? Discogs is trying to be all of these but is not succeeding, because there are tensions between the different use cases: enforcing correctness chases sellers away from the marketplace, while selling records is what brings in the money for Discogs and keeps the other fires burning. But the cost of this is that the data in the catalog is sometimes blatantly incorrect, which vastly reduces its value. Personally I don't care about the marketplace, as I am a collector and I care about the catalog. Discogs has enormous potential and for collecto

How many known errors in Discogs were fixed in May 2018?

Whenever a new data dump from Discogs becomes available I try to find out how the data has improved, and see what else in the data can be improved. Yesterday I looked at how many releases that previously did not have errors (detected using my scripts) were now flagged as having errors. The results weren't that encouraging. To give it a positive twist I wanted to look at the exact opposite: how many releases that were flagged as having an error in the previous dump no longer had an error (as detected by my scripts, of course) in the new dump? For this I ignored tracklisting, role and artist errors and focused on the "real" errors (even though tracklisting errors are very real, the other two fall more in the "meh" category). Also, if a release with errors was removed, that is also considered a fix (because the error is no longer there). In total, 3283 errors in 2169 releases were fixed. Old releases that were fixed in
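The comparison described above boils down to a set difference per release. Below is a minimal sketch in Python, assuming the output of an error checker has been loaded into a dictionary per dump that maps a release ID to a set of `(category, detail)` tuples. That representation, the category names and the function name `fixed_errors` are assumptions for this example, not the format my scripts actually produce.

```python
def fixed_errors(old, new, ignored=frozenset({"tracklisting", "role", "artist"})):
    """Return {release_id: errors flagged in `old` but no longer in `new`}.

    `old` and `new` map a release ID to a set of (category, detail) tuples
    produced by some error checker; this representation and the category
    names are illustrative assumptions. A release absent from `new` (no errors
    left, or removed from the dump) has all its relevant old errors counted
    as fixed.
    """
    fixed = {}
    for release_id, errors in old.items():
        relevant = {e for e in errors if e[0] not in ignored}
        gone = relevant - new.get(release_id, set())
        if gone:
            fixed[release_id] = gone
    return fixed


if __name__ == "__main__":
    old = {
        1: {("format", "missing quantity"), ("artist", "ANV")},
        2: {("date", "invalid year")},
        3: {("label", "bad catno")},
    }
    new = {3: {("label", "bad catno")}}  # release 1 no longer flagged, release 2 removed
    fixed = fixed_errors(old, new)
    print(len(fixed), "releases,", sum(map(len, fixed.values())), "errors fixed")
```

Counting both the releases and the individual errors gives the two numbers reported above (releases fixed, errors fixed).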

Introducing new errors in old releases

One thing I noticed when looking at older releases is that sometimes new errors are introduced in them. My scripts pick these up just fine, but I was wondering how often this happens, as that is something my scripts do not measure by themselves. So I grabbed the dumps released in May and June and simply counted how many errors in old releases were in the later dump, but not in the earlier one. As it turned out, quite a few: 1692 errors in 1046 unique releases, with only about two weeks in between the two dump files. Extrapolating a bit, that means new errors are probably introduced in around 2000 older releases each month, for around 3000 errors in total (and most of these errors are preventable). When ignoring Artist and Tracklisting errors there are still about 630 errors left in 474 releases.

Old releases in which errors were introduced in the second half of May 2018

What is clear is that this time it is mostly the recent releases that are edited. What this also means is th
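Counting introduced errors is the mirror image of counting fixes, with one extra condition: the release must already exist in the earlier dump, otherwise it is a new release and not an old one. A minimal sketch, assuming errors are stored per release as sets of `(category, detail)` tuples (an assumed representation) and that the set of release IDs in the earlier dump is available. The function name `introduced_errors` is hypothetical.

```python
def introduced_errors(old_ids, old_errors, new_errors, ignored=frozenset()):
    """Return {release_id: errors flagged in the new dump but not the old one}.

    Only releases whose ID is in `old_ids` (releases already present in the
    earlier dump) are considered, so brand-new releases are skipped.
    Errors are (category, detail) tuples, an assumed representation.
    """
    introduced = {}
    for release_id, errors in new_errors.items():
        if release_id not in old_ids:
            continue  # a release added between the dumps is not an "old" release
        relevant = {e for e in errors if e[0] not in ignored}
        added = relevant - old_errors.get(release_id, set())
        if added:
            introduced[release_id] = added
    return introduced


if __name__ == "__main__":
    old_ids = {1, 2, 3}
    old_errors = {1: {("format", "missing quantity")}}
    new_errors = {
        1: {("format", "missing quantity"), ("label", "bad catno")},
        2: {("artist", "ANV")},
        4: {("format", "missing quantity")},  # new release, skipped
    }
    everything = introduced_errors(old_ids, old_errors, new_errors)
    no_artist = introduced_errors(old_ids, old_errors, new_errors,
                                  ignored={"artist", "tracklisting"})
    print(sorted(everything), sorted(no_artist))  # [1, 2] [1]
```

Passing the ignored categories as a parameter makes it easy to produce both the full count and the count without Artist and Tracklisting errors from the same data.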

What happened in Discogs in May 2018?

Another month, so another round of statistics. This time Discogs was fairly quick to release a new data dump, unlike last month. If you don't know how this works, it is best to first read that overview, plus perhaps a few older ones.

Release statistics

The latest dump (which I call "the June dump", as it was released then) covers all data added from May 15 to May 31 2018 (inclusive). The previous dump had 9,843,513 releases, the new dump has 9,906,032 releases. That means 62,519 releases more in the database. In detail:

9,520,889 releases stayed the same
320,855 releases were changed
64,288 releases were added
1,769 releases were removed from the database
365 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
1 release was moved from Draft to Accepted

What strikes me is that the number of releases is a lot lower than last time, although last time it was a lot higher. So I guess that last month's dump was actua
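The "stayed the same / changed / added / removed" breakdown can be computed by comparing the two dumps release by release. Here is a minimal sketch, assuming each dump has first been reduced to a dictionary that maps a release ID to a fingerprint of that release's data (for example a hash of its XML). The fingerprint choice and the function name `compare_dumps` are assumptions, and this sketch does not track release status (Draft, Deleted, Rejected).

```python
from collections import Counter


def compare_dumps(old, new):
    """Classify releases as same/changed/added/removed between two dumps.

    `old` and `new` map a release ID to a fingerprint of that release's data,
    for example a hash of its XML (the fingerprint choice is an assumption).
    Release status (Draft, Deleted, Rejected) is not tracked in this sketch.
    """
    stats = Counter()
    for release_id, fingerprint in old.items():
        if release_id not in new:
            stats["removed"] += 1
        elif new[release_id] == fingerprint:
            stats["same"] += 1
        else:
            stats["changed"] += 1
    stats["added"] = sum(1 for release_id in new if release_id not in old)
    return dict(stats)


if __name__ == "__main__":
    old = {1: "a1", 2: "b1", 3: "c1"}
    new = {1: "a1", 2: "b2", 4: "d1", 5: "e1"}
    stats = compare_dumps(old, new)
    print(stats)
    # The net growth always equals added minus removed.
    print(len(new) - len(old) == stats["added"] - stats["removed"])
```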

What happened in Discogs in April 2018?

I had actually wanted to write this post a few weeks ago, but for some reason it took Discogs a lot longer to publish the new data dump. I can only guess why, but I would not be surprised if GDPR had something to do with it. I then got swamped with other tasks, so now, almost a month late, I can finally tell you what happened in Discogs in April 2018. First of all, if this is the first time you read one of these posts, please read the one from last month first.

Release statistics

The latest dump (which I call the "May dump") covers the period of April 1 - May 14 2018 (inclusive). The previous dump had 9,680,263 releases, the new dump has 9,843,513 releases. That means 163,250 releases more in the database. In detail:

8,983,224 releases stayed the same
693,583 releases were changed
166,706 releases were added
3,456 releases were removed from the database
162 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both
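A quick sanity check on numbers like these: every release in the previous dump either stayed the same, changed, or was removed, and every release in the new dump either stayed the same, changed, or was added. The small Python check below plugs in the figures from this post and from the June dump statistics; the function name is made up, and the Draft/Deleted/Rejected counts are left out because they are reported separately from this accounting.

```python
def counts_consistent(prev_total, new_total, same, changed, added, removed):
    """Check that per-category release counts add up to both dump totals."""
    # Every release in the previous dump stayed the same, changed, or was removed;
    # every release in the new dump stayed the same, changed, or was added.
    return (same + changed + removed == prev_total
            and same + changed + added == new_total)


april = dict(prev_total=9_680_263, new_total=9_843_513,
             same=8_983_224, changed=693_583, added=166_706, removed=3_456)
may = dict(prev_total=9_843_513, new_total=9_906_032,
           same=9_520_889, changed=320_855, added=64_288, removed=1_769)

for name, c in (("April", april), ("May", may)):
    net = c["added"] - c["removed"]
    print(name, counts_consistent(**c), f"net growth {net:,}")
# April True net growth 163,250
# May True net growth 62,519
```

Both months add up, and the net growth (added minus removed) matches the difference between the dump totals.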