Posts

Showing posts from March, 2019

What happened in Discogs in November 2018?

Making up for lost time: data for November 2018! As always, it is in the same format as for previous months.

Release statistics

I looked at the dump file with data covering November 1 - 30 2018. This dump file has 10,560,935 releases, whereas the previous one had 10,449,676 releases. That means 111,259 releases more. Of those:

9,976,683 releases stayed the same
470,852 releases were changed
113,400 releases were added
2,141 releases were removed from the database
233 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
2 releases were moved from Draft to Accepted

Changes look like this. I find it amazing that every month the graph is almost exactly the same except for the last column: [graph]

Smells

I found 2,199 smells in newly added releases (ignoring tracklist errors) and they look very similar to before: about half are label codes, a quarter are SID codes, and then the rest.
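The four main categories above (stayed the same, changed, added, removed) can be sketched as a comparison of two dumps. This is only a minimal illustration, not the script actually used for these posts: it assumes each dump has already been reduced to a mapping from release id to some fingerprint of the release content, and the names `diff_dumps` and `DumpDiff` are made up for this example.

```python
from typing import Dict, NamedTuple


class DumpDiff(NamedTuple):
    """Counts for one month-to-month comparison (hypothetical helper type)."""
    same: int
    changed: int
    added: int
    removed: int


def diff_dumps(previous: Dict[int, str], current: Dict[int, str]) -> DumpDiff:
    """Classify releases by comparing two {release_id: fingerprint} maps.

    Assumption: the fingerprint (for example a hash of the release record)
    differs whenever the release content differs.
    """
    prev_ids = previous.keys()
    curr_ids = current.keys()
    shared = prev_ids & curr_ids  # releases present in both dumps
    same = sum(1 for rid in shared if previous[rid] == current[rid])
    return DumpDiff(
        same=same,
        changed=len(shared) - same,
        added=len(curr_ids - prev_ids),    # only in the newer dump
        removed=len(prev_ids - curr_ids),  # only in the older dump
    )


# Toy data: release 2 was edited, release 3 was removed, release 4 is new.
previous = {1: "a", 2: "b", 3: "c"}
current = {1: "a", 2: "B", 4: "d"}
result = diff_dumps(previous, current)
print(result)
```

With real dumps the two mappings would have around ten million entries each, so in practice the fingerprints would probably be built while streaming through the XML rather than by loading everything at once.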

What happened in Discogs in October 2018?

Continuing with my catch-up about Discogs: what happened in October 2018? If you haven't read it, please read the post about last month first.

Release statistics

I looked at the dump file with data covering October 1 - 31 2018. This dump file has 10,449,676 releases, whereas the previous one had 10,335,691 releases. That means 113,985 releases more. Of those:

9,845,113 releases stayed the same
488,348 releases were changed
116,215 releases were added
2,230 releases were removed from the database
163 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
0 releases were moved from Draft to Accepted

The changes look like this: [graph]

Smells

It all looks very much like the other months: there were 2,045 errors in the releases added in October, a bit over half of the errors are for label codes, a quarter are SID code related, and the rest is the same as in previous months. Zzz.
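The figures in these posts can be cross-checked with a bit of arithmetic. The sketch below takes the October and November 2018 numbers quoted above and tests two identities: the net growth should equal added minus removed, and (an assumption on my side about how the categories are defined) "stayed the same", "changed" and "removed" together should account for every release in the previous dump. The function name `check_month` is made up for this example.

```python
def check_month(prev_total, new_total, same, changed, added, removed):
    """Return two booleans: (net growth matches, previous dump fully accounted for)."""
    # Net growth between the dumps should equal added minus removed.
    net_ok = (new_total - prev_total) == (added - removed)
    # Assumption: every release in the previous dump is either unchanged,
    # changed, or removed in the new dump.
    prev_ok = (same + changed + removed) == prev_total
    return net_ok, prev_ok


# Figures as quoted in the October and November 2018 posts.
october = check_month(10_335_691, 10_449_676, 9_845_113, 488_348, 116_215, 2_230)
november = check_month(10_449_676, 10_560_935, 9_976_683, 470_852, 113_400, 2_141)
print(october, november)
```

Both identities hold for both months, so the published numbers are at least internally consistent.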

What happened in Discogs in September 2018?

Continuing with data from September 2018: what happened then? If you don't already know the format, please read the previous post first.

Release statistics

So far the data has been very consistent and September 2018 is not an exception. I looked at the dump file with data covering September 1 - 30 2018. This dump file has 10,335,691 releases, whereas the previous one had 10,222,636 releases. That means 113,055 releases more. Of those:

9,727,367 releases stayed the same
491,702 releases were changed
116,622 releases were added
3,567 releases were removed from the database
164 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
2 releases were moved from Draft to Accepted

And, again, the charts look very similar when compared to previous months: [graph]

What I personally find interesting is that September is when Discogs has its SPIN (September pledge initiative) when people are encouraged to add more releases to the databas

Introducing new errors in old releases (part 2)

I was gone from Discogs for a while due to lack of time, but also because I was a bit frustrated about the poor quality of the data in Discogs and the (seeming) unwillingness of Discogs to fix this (although it is also very likely that, like me, they are completely swamped and there are only so many hours in the day). One thing that keeps bugging me in Discogs is that people add or change information in such a way that releases end up containing wrong information. I already looked at this problem once before and I was wondering if it is as bad as it was back then, or if things have gotten better in Discogs. A few unscientific tests didn't seem very promising, so I grabbed the dumps of February and March 2019 and simply counted how many old releases were in the later dump, but not in the earlier one. Ideally the errors found in the older releases are known errors, so for old releases this number should be 0 in the best case. But that is not what is happening. As it turned out, quite a fe

Discogs data is open....mostly

A few times I have already talked about my love/hate relationship with Discogs. I love it because there is so much data, but I dislike the quality of the data. I like that the Discogs blog now finally has frequent posts about how the database evolves, with pretty graphs (as I told some staff members they should consider), but I don't like that they are not looking at errors and increasing quality. Also, I don't like that they are not publishing the actual data sets used to generate those pictures, which brings me to the core of this post: what I love is that (most of) the data is available under a CC0 license (and others share this view), but as I have ranted about before, this is not actually all of the data from the catalog (I am not interested in the sales data). Specifically, all of the historical edit information is missing, which potentially contains very valuable hints about how releases have evolved over time, which would allow people to dig int

What happened in Discogs in August 2018?

Quickly continuing with going through all of the Discogs data (well, not quickly, as it is a lot of data to process): what happened in August 2018? If you haven't already, please read what changed in July 2018 first.

Release statistics

The dump with all the changes from August 1 - August 31 2018 has 10,222,636 releases. The previous dump had 10,116,749 releases, so that means 105,887 releases more. Of those:

9,612,780 releases stayed the same
501,079 releases were changed
108,777 releases were added
2,890 releases were removed from the database
174 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
2 releases were moved from Draft to Accepted

What is noticeable is that there are far fewer 'draft' releases this month. And, once again, the plotted graph of where the changes are made looks very similar to previous months: [graph]

Smells

Again ignoring tracklisting errors the amount of new releases with errors is

What happened in Discogs in July 2018?

I have some catching up to do with the data of last year, so I will try to keep it short and sweet. If you are new to these posts: each month I compare the Discogs data dump with the data dump from the month before and see where things have changed, and I also run a few sanity checks on the data to see if I can spot any obvious (or less obvious) mistakes. As an example, check out the previous post and then work your way back through history.

Release statistics

The dump with all the changes from July 1 - July 31 2018 has 10,116,749 releases. The previous dump had 10,010,587 releases, so that means 106,162 releases more. Of those:

9,510,902 releases stayed the same
497,113 releases were changed
108,734 releases were added
2,572 releases were removed from the database
329 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
3 releases were moved from Draft to Accepted

Changes are, again, very similar as in months

What happened in Discogs in June 2018?

Wow. I just realised that it has been about 9 months since I last analyzed Discogs data. I got swamped by enormous amounts of work so I didn't have time to dive into this subject. Now that things are (temporarily) a bit slower (for a change) it is time to correct that and look into what happened in June 2018. If you don't know how this works, I suggest you read the post about May 2018 and work your way through history.

Release statistics

The latest dump (which I call "the July dump", as it was released then) covers all data added from June 1 - June 30 2018 (inclusive). The previous dump had 9,906,032 releases, the new dump has 10,010,587 releases. That means 104,555 releases more in the database. Of those:

9,350,389 releases stayed the same
553,252 releases were changed
106,946 releases were added
2,391 releases were removed from the database
296 releases had status Draft, Deleted or Rejected
11 releases that were not Accepted were in both dumps
2 rel