
Posts

Showing posts from October, 2017

Cleanup efforts are already paying off

Every year Discogs holds its September Pledge INitiative (or S.P.IN) contest to get more data added to the database. A friend (gerjolp on Discogs) won a swag pack (bag, T-shirt, pack of 2 slipmats and some stickers) thanks to the scripts that I wrote: he fixed countless entries in a short period of time and is continuing to do so. Luckily for me he was kind enough to split his prize with me, so I also feel a bit like a winner.

How Discogs can prevent wrong data (part 1)

I just spent half an hour listening to a friend who has been cleaning up entries where a user entered a rights society, but used the wrong field. What he often saw was a value entered as an Amazon identifier (ASIN), even though it very clearly was not one (Amazon identifiers are well defined). Of course, what happened here is that (before the last big BaOI update) the ASIN and Rights Society choices in the BaOI drop down box were directly next to each other and people simply chose the wrong one. Some of these mistakes were in Discogs for several years. They would have been very easy to detect and prevent. A few possible checks that could have been implemented:
check if the value of an ASIN field matches the ASIN specification
check if the value of a field (if not Rights Society) happens to be a known rights society, or even a subset of rights societies (like BIEM, with or without dots)
This is not impossib
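The two checks listed above can be sketched in a few lines of Python. This is only an illustration, not how Discogs works: the ASIN pattern assumes an ASIN is exactly ten letters or digits, the list of rights societies is a small hand-picked sample, and the names check_field, looks_like_asin and looks_like_rights_society are made up for this example.

```python
import re

# Assumption: an ASIN is exactly 10 characters, each an uppercase letter or a digit.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")

# Hypothetical, deliberately short sample of rights societies; a real check
# would need a much longer list.
KNOWN_RIGHTS_SOCIETIES = {"BIEM", "GEMA", "SACEM", "SGAE", "STEMRA", "MCPS", "JASRAC"}


def normalize(value):
    """Uppercase the value and drop dots and whitespace, so 'B.I.E.M.' becomes 'BIEM'."""
    return re.sub(r"[.\s]", "", value.upper())


def looks_like_asin(value):
    """True if the value has the assumed ASIN shape (case is ignored)."""
    return ASIN_PATTERN.fullmatch(value.strip().upper()) is not None


def looks_like_rights_society(value):
    """True if the value, with or without dots, is in the sample list."""
    return normalize(value) in KNOWN_RIGHTS_SOCIETIES


def check_field(field_type, value):
    """Return a list of warnings for a single BaOI field."""
    warnings = []
    if field_type == "ASIN" and not looks_like_asin(value):
        warnings.append("value does not look like an ASIN")
    if field_type != "Rights Society" and looks_like_rights_society(value):
        warnings.append("value looks like a rights society")
    return warnings
```

With these assumptions, check_field("ASIN", "B.I.E.M.") returns both warnings, while check_field("Rights Society", "GEMA") returns an empty list.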

Digging into the Spanish Depósito Legal identifier (part 1)

Every country has its own laws that determine what has to be printed on releases. One of the countries where this is most visible is Spain. Especially on older releases you can find one identifier that is of great help in dating releases, namely the depósito legal identifier. In Spain it is normal (or perhaps even mandatory) to deposit music releases with a library, similar to the Library of Congress in the US, or other deposit libraries around the world. Each release (not just records, but also books) is assigned an identifier that is printed on it, for example in the picture below. If you want to, you can go to a library, search for the item and perhaps listen to it. For example, Queen's Back Chat 7" can be found at the national library in Madrid in Sala Barbieri. It is unfortunately not easy to unlock this information from the website of the national library in Spain (so they still have quite some work to do) and it doesn't seem you can search on the depósito

SPARS codes (part 1)

Let's talk about SPARS codes used on CDs (or CD-like formats). You have most likely seen one, but maybe don't know its name. The SPARS code is a three letter code indicating whether recording, mixing and mastering were analogue or digital. For example they could look like the ones below. There is no fixed format, so there are other variants as well. Personally I do not pay much attention to these codes (I simply do not care), but in the classical music world companies could ask premium prices if something was labeled as DDD (so everything digital). That makes it interesting information to mine and unlock, which is something that Discogs does not allow people to do when searching (yet!) even though it could be a helpful filter. I wanted to see if it can be used as an identifier to tell releases apart (are there similar releases where the only difference is the SPARS code?).

SPARS code in Discogs

For a few months now SPARS has been a separate field in the Discogs
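To make the three letter structure concrete, here is a minimal Python sketch that decodes a SPARS code. Since there is no fixed format, it assumes that variants differ only in case and in separators such as spaces, dots, dashes, pipes or slashes; that set of separators and the function name parse_spars are assumptions made for this example.

```python
import re

# The three positions of a SPARS code, in order.
STAGES = ("recording", "mixing", "mastering")
MEANING = {"A": "analogue", "D": "digital"}


def parse_spars(value):
    """Return a dict mapping each stage to 'analogue' or 'digital',
    or None if the value does not look like a SPARS code."""
    # Assumption: variants only differ in case and in these separators.
    letters = re.sub(r"[\s.\-|/]", "", value.upper())
    if re.fullmatch(r"[AD]{3}", letters) is None:
        return None
    return {stage: MEANING[letter] for stage, letter in zip(STAGES, letters)}
```

For example, parse_spars("A | D | D") reports an analogue recording with digital mixing and mastering, and parse_spars("Stereo") returns None.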

Label Code (part 2)

One thing that I wondered about is: how many people can get the label code wrong? The label code is very simple: the letters LC (uppercase or lowercase), possibly followed by whitespace, a colon, a dash, etc. and then 4 to 6 digits. Label codes with 6 digits are still extremely rare but will start to pop up as soon as there are no more 5 digit label codes (the official website is at just over 78,000 now). Many of the label codes are prefixed with zeroes to pad the number to 4 or 5 digits, for example: LC00123. Since 0 (zero) is easy to confuse with O (capital o) I wondered how many entries that are likely valid label codes use O (capital o) instead of 0 (zero). I took the October 2017 dump and searched for Label Code fields where the value started with LC or lc, and contained O (capital o). I found 131 unique releases with a total of 133 Label Code fields, which is actually surprisingly low. Of these a few were also: non-compliant label codes, for example this release (rev
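The label code format described above translates fairly directly into a regular expression. The sketch below is not the script used for the numbers in this post; it is a minimal Python illustration in which the accepted separators (whitespace, colon, dot, dash) and the function names classify_label_code and repair_label_code are assumptions.

```python
import re

# LC in either case, optional separators, then 4 to 6 digits.
LABEL_CODE = re.compile(r"[Ll][Cc][\s:.\-]*([0-9]{4,6})")
# Same shape, but also accepting capital O where a digit is expected.
LABEL_CODE_WITH_O = re.compile(r"[Ll][Cc][\s:.\-]*([0-9O]{4,6})")


def classify_label_code(value):
    """Classify a Label Code field value."""
    value = value.strip()
    if LABEL_CODE.fullmatch(value):
        return "valid"
    if LABEL_CODE_WITH_O.fullmatch(value):
        return "letter O instead of zero"
    return "invalid"


def repair_label_code(value):
    """Return the value as 'LC ' plus digits, with capital O replaced by zero,
    or None if the value does not have the label code shape at all."""
    match = LABEL_CODE_WITH_O.fullmatch(value.strip())
    if match is None:
        return None
    return "LC " + match.group(1).replace("O", "0")
```

Under these assumptions LC00123 is classified as valid, while "LC OO123" is flagged as using the letter O and can be repaired to "LC 00123".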

Observations about the Discogs marketplace

One feature that Discogs has is that it allows people to sell copies of items that are listed in the catalog. In theory this works very well: people describe an item in the catalog, a seller then picks the right item from the catalog (possibly first adding it to the catalog), describes anything that is special about the particular copy, sells the item and gives and receives feedback about the transaction and all is well. Except in practice this is not what happens. Instead what I see is that sellers just pick some items from the marketplace, then describe the differences from the listed item instead of first adding the right item to the database (which is against the terms of the marketplace) and let the user figure it out. As I said before: I can understand why some sellers are not adding items that are not in the database, as it is a lot of work (and it would help a lot if it could be made easier), but for buyers it can be very frustrating to not get the item they thought they wou

What happened in Discogs in September 2017?

The new monthly datadump is available, so I downloaded it and processed it with my scripts to see what happened in September 2017.

Release statistics

The new dumpfile was published on October 4 2017 and has 8,996,419 releases. The previous dump (published September 4 2017) had 8,878,391 releases. That means 118,028 more releases in the database.

3,158 releases were removed in the new dumpfile
121,186 releases were added in September
8,456,324 releases remained the same
418,909 releases were changed
205 releases had the status Draft, Deleted or Rejected set
11 releases that were not Accepted were present in both the September 2017 and October 2017 data dump
1 release moved from Draft to Accepted

In total there were edits for 540,095 releases (changed, plus new). There were 43,200 minutes in September 2017, meaning that there was a minimum of 12.5 edits (new or changed releases) per minute, meaning a bit more than one edit every five seconds. There were also edits for t
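The release figures above can be cross-checked with a bit of arithmetic. The short Python sketch below only reuses the numbers quoted in this post; the variable names are made up for the example.

```python
# Figures quoted above for the September 2017 and October 2017 dumps.
previous_total = 8_878_391
new_total = 8_996_419
removed = 3_158
added = 121_186
unchanged = 8_456_324
changed = 418_909

# Net growth is the difference between the dumps, and also added minus removed.
net_growth = new_total - previous_total
assert net_growth == added - removed

# Every release in the previous dump was either kept, changed or removed.
assert unchanged + changed + removed == previous_total

# Every release in the new dump was either kept, changed or newly added.
assert unchanged + changed + added == new_total

# Edited releases are the changed ones plus the new ones.
edited = changed + added

# September has 30 days.
minutes_in_september = 30 * 24 * 60

edits_per_minute = edited / minutes_in_september
seconds_per_edit = 60 / edits_per_minute
```

All three consistency checks pass, and edits_per_minute comes out at roughly 12.5, which is a little under five seconds per edit.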

Repairing audio equipment (part 1)

Having a lot of data about music releases is cool, but we should not forget that in the end the most important thing is to actually listen to the music and to enjoy it. That means that you need to have decent equipment to listen on. And, like everything else, at some point it starts to break down. It is easy to just throw out the old broken stuff and to buy new equipment, but it might be worth looking at repairing instead of buying new. This year I repaired an amplifier and a cassette recorder. For me this wasn't easy: I might be good with software, but debugging physical objects is not my forte (my former boss went as far as describing me as someone with two left hands, and ten thumbs). Nevertheless I decided to try as there was nothing to lose: the equipment was already no longer working and we had started to look at replacements (but had not decided yet), so I wanted to give it a shot to see if I could give these devices a second life.

Sony CFS-W303L

The cassette recorder I rep