
Posts

Showing posts from December, 2017

Releases with incorrect tracklistings on Discogs (part 3)

Another thing that I wondered about when looking at tracklistings: in how many of them have people duplicated numbers or letters? I saw at least one, but there had to be many more. Then again, given my earlier experiences with errors found in tracklistings, I thought that this was probably something I should not want to wonder about. Nevertheless, I pushed ahead and found what I feared: there are lots of releases in which tracklist positions are duplicated. I found 101,474 instances in 31,056 releases. (Figure: Releases with possible duplication of tracklist positions.) While most of these seem to be actual errors, there are also a few exceptions, which are not always easy to identify automatically: some vinyl releases (like singles) are a double A side and will have 'A' on both sides of the label; some releases don't have A/B sides, but instead have "This/That" or use something else; some promotional releases have the same song on both side
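
A check for duplicated positions could look roughly like the sketch below. It is not the script used for the numbers above: the function name and the input (a plain list of position strings per release) are assumptions made for illustration, as is the choice to compare positions case-insensitively.

```python
from collections import Counter


def duplicated_positions(positions):
    """Return the tracklist positions that occur more than once.

    Assumption: positions are compared after stripping whitespace and
    upper-casing, so 'a1' and 'A1' count as the same position.
    Empty positions (used for headings or index tracks) are ignored.
    """
    counts = Counter(p.strip().upper() for p in positions if p.strip())
    # Counter keeps first-seen order, so the result follows the tracklist.
    return [pos for pos, n in counts.items() if n > 1]


print(duplicated_positions(["A1", "A2", "A2", "B1"]))
print(duplicated_positions(["A", "A"]))
```

The second call shows why the exceptions are hard to filter automatically: a genuine double A side is reported in exactly the same way as a data entry error.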

Finding spelling errors in Czechoslovak and Czech releases on Discogs

Time for a short but positive post! Today I got validation that what I do is actually useful for people working with Discogs. While sharing the findings of the previous article about Czechoslovak manufacturing date codes, we were asked if we could also search for a particular misspelling in Czech releases that is impossible to find with the Discogs search functionality. Apparently the Czech alphabet has a character (ě) that looks a lot like another character (ĕ), and it is difficult for non-Czechs to spot the difference (it took me some time as well). So there was the suspicion that there would be releases where one was used instead of the other, but Discogs does not allow you to search for these characters (according to one user on the Czechoslovak forum). Adding another check to my scripts was fairly trivial (I only had to take care not to search the YouTube playlists as well, which are probably not that interesting). The result: around 90 releases, the results of which hav
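
The two characters differ only in their diacritic: ě is U+011B (e with caron), which is used in Czech, while ĕ is U+0115 (e with breve), which is not. A minimal sketch of such a check follows; the function name is made up for this example, and normalizing to NFC first (so that an 'e' followed by a combining breve is caught too) is an extra assumption, not something the original scripts are known to do.

```python
import unicodedata

# U+0114 / U+0115: capital and small e with breve (not part of the Czech alphabet)
E_BREVE = {"\u0114", "\u0115"}


def find_e_breve(text):
    """Return the indexes (in the NFC-normalized text) of every e with breve."""
    normalized = unicodedata.normalize("NFC", text)
    return [i for i, ch in enumerate(normalized) if ch in E_BREVE]


print(find_e_breve("N\u011bco"))  # correct Czech e with caron: nothing found
print(find_e_breve("N\u0115co"))  # e with breve: flagged
```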

Manufacturing date codes from Czechoslovakia

Another country that put dates on its releases is Czechoslovakia (I am using the past tense, as the country no longer exists). On most releases from this country from the late 1960s to the early 1990s you can find a so-called "manufacturing date code", or something similar to that. The page for the Opus label on Discogs says: "Most Czechoslovak vinyl records pressed between ca. 1967 and 1992 include a three-digit code on the side A center label. The first two digits represent the year, the third digit the half-year of the pressing. For example, a record with code '75 2' on the label has been manufactured in the 2nd half of 1975." and goes on to explain that this code does not necessarily mean that the record was also released in that year, but that it could have been held back for whatever reason. But it does mean that it can be used for checking the release year to see if it is perhaps wrong (too early). So I did just that. I looked at a few things in t
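
The comparison described above can be sketched as follows. This only illustrates the idea: the function names are invented, and the accepted spelling of the code (two digits, an optional space, then 1 or 2) and the 1967 to 1992 window are assumptions taken from the quoted Opus label text.

```python
import re

# Two-digit year, optional space, half-year (1 or 2), e.g. '75 2'
DATE_CODE_RE = re.compile(r"^(\d{2})\s?([12])$")


def parse_date_code(code):
    """Return (year, half) for a code such as '75 2', or None if it does not fit."""
    match = DATE_CODE_RE.match(code.strip())
    if match is None:
        return None
    year = 1900 + int(match.group(1))
    if not 1967 <= year <= 1992:  # window from the quoted label text
        return None
    return year, int(match.group(2))


def release_year_too_early(release_year, code):
    """True if the release year lies before the year the record was pressed."""
    parsed = parse_date_code(code)
    return parsed is not None and release_year < parsed[0]


print(parse_date_code("75 2"))
print(release_year_too_early(1974, "75 2"))
```

A release year later than the code is not flagged, since a pressing may have been held back before it was released.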

Releases with incorrect tracklistings on Discogs (part 2)

Time to dig a bit deeper into the tracklistings, as I believe there is a bit more to the story than what I wrote about in an earlier post. One thing that I kept wondering about is: is this mostly a problem for vinyl releases or for cassettes? I adapted my scripts to also output the format so I can answer this question. As an extra I also checked for shellac records and 8 track cartridges, which also have sides. My scripts found 944 shellac records that possibly have a wrong tracklist. There are even more 8 track cartridges than shellac records with a possible tracklist issue: 1144. This means that the total number of releases I found with possible tracklist problems is now 148,013. The shellac releases with possibly wrong tracklistings are distributed over the data as follows: (Figure: Distribution of shellac records with a possibly wrong tracklist in the Discogs data.) For 8 track cartridges it looks like this: Distribution of 8 track cartridges with a possibly wrong tracklist

Releases with incorrect tracklistings on Discogs (part 1)

The tracklisting is one of the most essential parts of a release in Discogs, but for quite a few people it is proving difficult to get it right. What I often see with new entries in the database is that they are copied from an already existing release (using the powerful but dangerous "copy to draft" functionality in Discogs) and that information is adapted, but not all of the information. For the tracklist I see for example that for vinyl records or cassettes an already existing entry of a CD is used as a template and that the tracklisting is not adapted. This is important: CDs have a single side, but vinyl records and cassettes have two sides (or are single sided). Yet it is very common to find vinyl releases or cassettes in Discogs with no indication what side the tracks are on. What happens is that some people then ask the submitter to fix it. So I wondered: how often does this really happen? It wouldn't be the first time for me to think "that has to be a hug
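
One way to spot tracklists that were copied from a CD without being adapted is to look for sided formats where every position is a bare number. The sketch below is an assumption about how such a check might work, not the actual script; the function name and the format names are illustrative only.

```python
# Assumption: format names used for this example; formats that have sides.
SIDED_FORMATS = {"Vinyl", "Cassette"}


def lacks_side_indication(release_format, positions):
    """True if a sided format has only purely numeric track positions."""
    if release_format not in SIDED_FORMATS:
        return False
    filled = [p.strip() for p in positions if p.strip()]
    # '1', '2', '3' on vinyl suggests a tracklist copied from a CD;
    # 'A1', 'B2' or 'A', 'B' carry a side and are not flagged.
    return bool(filled) and all(p.isdigit() for p in filled)


print(lacks_side_indication("Vinyl", ["1", "2", "3"]))
print(lacks_side_indication("Vinyl", ["A1", "A2", "B1"]))
```

Single-sided records would show up as false positives with a rule this simple, so results still need a manual look.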

How many releases in Discogs are Christmas related?

Because it is the season I was wondering: how many releases in Discogs are Christmas related? So I decided to run a very simple and crude test and search tracklists for a few keywords (all lower case, for easier searching): christmas, x-mas, kerst (Dutch), weinachten (German). This is, of course, not a very good test and I could also have checked for specific song titles, or different languages, but I didn't have the time to research it in depth. I also did not check if every result is actually a 'real' Christmas song. So you should take this with a big heap of salt. German-language Christmas releases: As it turns out German-speaking people don't really like Christmas: just 31 releases have some reference to Weinachten. Maybe they don't have that many Christmas songs that actually mention it. Dutch-language Christmas releases: The Dutch and Flemish (and the occasional Low German) are doing a lot better: a bit over 1130 releases. I roughly verified that these
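
The crude keyword test can be sketched in a few lines. The keywords are copied verbatim from the list above; the function name and the input (a list of track titles) are assumptions for the example.

```python
# Keywords exactly as listed in the post, all lower case
KEYWORDS = ("christmas", "x-mas", "kerst", "weinachten")


def christmas_keywords(track_titles):
    """Return the set of keywords found as substrings in any track title."""
    found = set()
    for title in track_titles:
        lowered = title.lower()
        found.update(k for k in KEYWORDS if k in lowered)
    return found


print(christmas_keywords(["White Christmas", "Jingle Bells"]))
print(christmas_keywords(["Fröhliche Weihnachten"]))
```

Because this is plain substring matching, the exact spelling of each keyword matters: the second call finds nothing, as "weinachten" does not occur inside the usual German spelling "Weihnachten". That may be worth keeping in mind when reading the German numbers.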

Wrong months in the 'Released' field in Discogs

Time for me to go back to a very boring error in the Discogs database: release dates. In an earlier post I already wrote about weird data in the Released field (this is the date field) but that is not what I want to talk about now (as I have already done that). This time it is about something a lot simpler: wrong months in Discogs. Month equals 0: As it turns out there are quite a few releases in the data where the value of the month is no longer correct. Apparently in the past it was customary (or even mandatory) to have releases in full YYYY-MM-DD format, and if the month and day were not known the value 00 was inserted. This is no longer accepted and if you edit a release and the value of the month is 00 then the following error will be displayed: (Figure: Error message for wrong month.) This happens only for older releases that haven't been updated for years. It can be argued that this is mostly a cosmetic bug as the old dates with month values of 00 will display just fine. But, t
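
Finding these old-style dates is a simple pattern match on the Released value. A minimal sketch, assuming the value is available as a string and that only full YYYY-MM-DD values are of interest (the function name is made up for the example):

```python
import re

# Full YYYY-MM-DD form, as described above
FULL_DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def has_zero_month(released):
    """True if the value is in YYYY-MM-DD form and the month is 00."""
    match = FULL_DATE_RE.match(released.strip())
    return match is not None and match.group(2) == "00"


print(has_zero_month("1985-00-00"))
print(has_zero_month("1985-06-00"))
```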

ISRC codes in Discogs (part 2)

It is time to dive into ISRC codes a bit more. If you don't know what these codes are, then it is probably good to read part 1 first. In this article I will not really look at ISRC codes in Discogs (OK, just a tiny bit), but instead look at how to extract ISRC codes on a Linux machine using Python. For this I will use libdiscid and its Python bindings. What you need for this: a computer with a CD/DVD drive (which is getting surprisingly hard to find in laptops these days); a Linux distribution (in my case Fedora 27, but any recent Linux distribution will do); libdiscid with associated Python bindings (either Python 2 or Python 3). It is actually very simple and can be done from either the Python prompt or from a simple script. It is literally as simple as this (Python 3 code, I stored it in a file called isrc.py):

    import libdiscid
    disc = libdiscid.read()
    try:
        track_isrcs = disc.track_isrcs
        for i in range(0,len(track_isrcs)):
            print("Track %d: %s" %

ISRC codes in Discogs (part 1)

One thing that I had never heard about is the International Standard Recording Code (ISRC). According to the Wikipedia page about ISRC (worth reading) it is actually already almost 30 years old. These codes are interesting for datamining for a few reasons: the year in which the code was assigned is embedded in it (even though there are some exceptions: apparently in the early days the recording year was used), and they are specific to a single recording. These two characteristics make it an ideal piece of data for checks and comparisons. Sometimes the ISRC code is printed on releases, but sometimes it is also embedded in some of the metadata on CDs. I never knew about ISRC, as I don't have releases with the code (or at least, I never paid attention) and I have never had a (standalone) CD player that displayed these codes by default. ISRC in Discogs: The ISRC field is a relatively new addition to the Barcode and Other Identifiers (BaOI) section. In retrospect it might have been better to
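
An ISRC consists of twelve characters: a two-letter country code, a three-character registrant code, two digits for the year of reference and a five-digit designation code, often written with hyphens in between. The sketch below splits a code into these parts. The function name, the example codes and the tolerance for optional hyphens are assumptions for illustration; the year is deliberately left as two digits, because picking a century would be a guess.

```python
import re

# CC-XXX-YY-NNNNN, hyphens optional
ISRC_RE = re.compile(r"^([A-Z]{2})-?([A-Z0-9]{3})-?(\d{2})-?(\d{5})$")


def parse_isrc(code):
    """Split an ISRC into its four parts, or return None if it is malformed."""
    match = ISRC_RE.match(code.strip().upper())
    if match is None:
        return None
    country, registrant, year, designation = match.groups()
    return {
        "country": country,
        "registrant": registrant,
        "year": int(year),  # two-digit year of reference
        "designation": designation,
    }


# Hypothetical example code, not taken from a real release
print(parse_isrc("NL-A12-17-00001"))
```

With the year extracted, a release date in Discogs that is earlier than the ISRC year becomes a candidate for a closer look.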

Fixing wrong credits in Discogs (part 1)

For releases in Discogs it is customary that people add the credits for a release, like who wrote songs, produced the record, mastered the recording, and so on. In some cases this information can be important, as some reissues or versions of records have been mastered by different people (I don't have an example, but I remember bumping into this while browsing through releases). Discogs has published a list of credits and roles that can be used in the database. These are the only ones that should be used and when editing a release that has a different role an error saying that something "does not match the credit list" (or similar to that) will be shown and you are expected to fix it first. Since a friend saw this error very frequently it got me wondering how many releases in Discogs there are with a credit that is not on the official credit list. For my clean up scripts I created a script to extract the credits list (so I wouldn't need to hardcode it), added a
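
Once the official credit list is available as a set of role names, the check itself is a membership test. A minimal sketch under that assumption; the function name and the role names in the example are illustrative, and any extra detail attached to a role in the real data would need handling that is not shown here.

```python
def unknown_roles(release_roles, allowed_roles):
    """Return the roles of a release that are not on the allowed list."""
    allowed = set(allowed_roles)
    # Exact comparison after trimming whitespace; a role with the wrong
    # capitalization is reported as well.
    return [role for role in release_roles if role.strip() not in allowed]


# Illustrative role names only, not the full official list
allowed = ["Producer", "Written-By", "Mastered By"]
print(unknown_roles(["Producer", "Masterd By"], allowed))
```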

Social interactions on Discogs

One fascinating side to Discogs is the interaction with other people. As you are probably aware the whole database is crowdsourced, with people adding and improving (at least, that's the idea) data. At the moment I am writing this there are 388,098 users listed in the contributor ranking, meaning they contributed to one or more releases. That's a lot of people: taken together they would be the fourth largest city in the Netherlands, before Utrecht . This also means that you might meet a wide range of characters, from many different cultures, with different motivations, and different opinions about what is the correct behaviour on Discogs. One of my friends is doing a lot of janitor work on Discogs and he gets to see quite a few comments from friendly to annoyed, very angry, hostile, or even downright absurd. Luckily my friend has been hardened by years of online disputes and, as the saying goes: "If you can't stand the heat, get out of the kitchen!" so if th

Indian PKD codes in Discogs

Some countries require that music releases have certain identifiers on them. A well known example is the depósito legal identifier in Spain that I wrote about several times before. But other countries have similar things as well. One of them is India, which for new releases requires a so-called "PKD". This code contains the month and year in which the release was made, or at least when it was packed (I have seen it being talked about as both a production date and a packaging date). One of my friends at Discogs (gerjolp) looked into these codes a bit, but could not find many references, just a few forum threads, and the exact meaning remains a bit unclear, also because sometimes the date seems to have been stamped instead of printed, and this is not clear from the data in the database. PKD codes in Discogs: There are 132 releases from India in the latest Discogs dump where it is easy to find the PKD code in BaOI. There are likely many more releases in Disco

Releases under a Creative Commons license in Discogs (part 2)

It's time for a very short post. In the last few days I spent some more time digging into Creative Commons licensed releases in the Discogs database. In a previous post I already looked a bit into it, but it was far from complete and I likely missed a few releases that have content under Creative Commons, but that I didn't identify. One thing that I added to my scripts recently was to look for a very specific word that is used in many Creative Commons license statements, namely ShareAlike. I added this to my scripts, grabbed the latest datadump (which is newer than the one I used last time, and also contains data added in November 2017) and reran my scripts. The result is that I found 82 more releases than last time. It should be noted that in one release the Creative Commons statement was actually removed and 31 releases seem to be new. That leaves around 50 releases which have the ShareAlike statement, but which I didn't recognize before as a release licensed und

What happened in Discogs in November 2017? (part 1)

A new month, a new data dump... and new statistics! Before reading this post it might be good to read the post about what happened in Discogs in October 2017. Release statistics: For this blogpost I downloaded the latest datadump (the "December dump") containing data from November 1 to November 30 (inclusive). The previous dump file had 9,107,428 releases, the new dump file has 9,217,123 releases. That means 109,695 more releases in the database. In detail: 3,868,975 releases stayed the same; 5,235,697 releases were changed; 112,451 releases were added; 2,756 releases were removed from the database; 222 releases had status Draft, Deleted or Rejected; 11 releases that were not Accepted were in both the November dump and the December dump; 2 releases were moved from Draft to Accepted. What immediately stands out is that an enormous number of releases has been changed compared to previous months, when it was about 10 times fewer. As I don't believe that Discog
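
The figures above are consistent with each other: the 3,868,975 unchanged, 5,235,697 changed and 2,756 removed releases add up to the 9,107,428 releases in the previous dump, and adding 112,451 new releases and subtracting the 2,756 removed ones gives the 9,217,123 releases in the new dump. A comparison that produces such counts can be sketched with set operations. This is an illustration, not the actual script: the function name is invented, and each dump is assumed to be loaded as a dict mapping a release id to some comparable representation of the release.

```python
def compare_dumps(previous, current):
    """Count unchanged, changed, added and removed releases between two dumps."""
    prev_ids, cur_ids = set(previous), set(current)
    common = prev_ids & cur_ids
    changed = sum(1 for rid in common if previous[rid] != current[rid])
    return {
        "unchanged": len(common) - changed,
        "changed": changed,
        "added": len(cur_ids - prev_ids),
        "removed": len(prev_ids - cur_ids),
    }


# Tiny made-up dumps: release id -> release data
old = {1: "a", 2: "b", 3: "c"}
new = {1: "a", 2: "B", 4: "d"}
print(compare_dumps(old, new))

# Consistency check on the numbers reported in the post
assert 3_868_975 + 5_235_697 + 2_756 == 9_107_428
assert 9_107_428 + 112_451 - 2_756 == 9_217_123
```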

How to better flag non-existent information in Discogs

I know a few collectors of vinyl records who can best be described as "completists" (although others would describe them as "completely nuts") who try to collect every variant of an album by a certain artist, no matter how small the difference. For these people it is very important to know how to tell different releases apart from each other. This is something that Discogs is currently unfortunately quite bad at, especially when it comes to indicating that certain information is not present on a release. Let's look at an example. In the 1980s Metallica released a few picture discs. Some of these picture discs were released with a barcode, and others without. For the collectors, knowing whether or not a barcode is on the release really matters. Another example would be knowing if there are SID codes on a release. In an earlier blogpost about SID codes I wrote that SID codes can sometimes indicate that a CD was released after 1994. The SID codes are

How Discogs can prevent wrong data (part 2)

For people caring about correctness of data in Discogs, it sometimes seems like an uphill battle. Once an error has been introduced, it gets copied and spreads, and fixing it becomes almost impossible. It very much resembles fixing errors in the waterfall model of software engineering: stopping errors at the beginning is much easier than fixing later. One way errors are spreading is because of the "copy to draft" functionality in Discogs, where information from an existing release can be copied and serve as a template when adding a new variant of a release. Although extremely useful (it speeds up entering information) people not only copy the correct information, but also the errors, or they leave information that is irrelevant to a release and don't remove it (example: SID codes for vinyl or other releases ). When the new release in turn is used as a template the wrong information spreads further through the database. Detecting errors is quite trivial using some simp

SID codes (part 4)

I wanted to call this post "Return of the SID", but there is only so much nerdiness you can squeeze into one topic. But, I am going to talk about SID codes once again, so it is best to first read part 1, part 2 and part 3 about SID codes if you are not familiar with them. SID codes are inherently tied to CDs, or CD-like media (DVD, Blu-Ray, and so on) and have not been used anywhere else. One thing I wondered: for how many releases in the Discogs database have SID codes been defined when it actually is a different format for which SID codes do not make any sense at all, such as vinyl, or cassettes? So I got the latest Discogs data dump (releases until November 1, 2017), adapted my scripts, ran some tests and got quite interesting results: 332 vinyl records, 151 cassettes, 24 files (digital music files), 13 shellac discs, 2 DCC releases, 1 VHS release, 1 Memory Stick release and 1 Edison Disc. Especially the Edison Disc made me chuckle, as it is such an ancient format, an
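
A check of this kind boils down to: does the release carry a SID code while none of its formats is an optical disc? The sketch below shows that logic only. The function name, the set of optical format names and the assumption that SID identifiers can be recognized by "SID Code" in their type name are all illustrative and would need to be matched against the real dump.

```python
# Assumption: format names for which SID codes make sense
OPTICAL_FORMATS = {"CD", "CDr", "DVD", "Blu-ray"}


def sid_on_wrong_format(formats, identifier_types):
    """True if a release has a SID code but no optical disc format."""
    has_sid = any("sid code" in t.lower() for t in identifier_types)
    has_optical = any(f in OPTICAL_FORMATS for f in formats)
    return has_sid and not has_optical


print(sid_on_wrong_format(["Vinyl"], ["Barcode", "Mastering SID Code"]))
print(sid_on_wrong_format(["Vinyl", "CD"], ["Mould SID Code"]))
```

Releases that combine a record with a CD are deliberately not flagged, as the SID code may belong to the disc.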