In the Discogs database each release has a Released field that can contain a year, a year plus month, a full date (year, month and day), or nothing if the date is unknown. At some point Discogs started to enforce the format of this field, but before then it was more or less a free text field. I looked at how often the Released field contained an invalid year: 1766 times, which is surprisingly low.
Some results that I found:
- different date notations (DD-MM-YYYY, MM-DD-YYYY) with different separators (-, /, .)
- the word "unknown", in various forms, including misspellings
- a question mark
- whitespace
- multiple years
- month names
- combinations of the above
Checks that I made will soon be added to the cleanup scripts on GitHub.
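To give an idea of what such checks could look like, here is a minimal sketch in Python. It is not the code from the cleanup scripts: the function name `classify_released`, the category labels and the assumed valid shapes (YYYY, YYYY-MM, YYYY-MM-DD, separated by hyphens) are all assumptions made for illustration. It only looks at the shape of the value, not at whether the year or month is plausible, and the first matching category wins for combinations.

```python
import re

# Assumed valid shapes for this sketch: YYYY, YYYY-MM or YYYY-MM-DD
VALID = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
# DD-MM-YYYY or MM-DD-YYYY with -, / or . as separator
ALT_DATE = re.compile(r"\d{1,2}[-/.]\d{1,2}[-/.]\d{4}")
# Crude English month name check (will also match words like "maybe")
MONTHS = re.compile(
    r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\b",
    re.IGNORECASE,
)
# Any standalone run of exactly four digits is treated as a year
YEAR = re.compile(r"\b\d{4}\b")


def classify_released(value):
    """Return a rough label for the contents of a Released field."""
    if value == "":
        return "empty"
    text = value.strip()
    if text == "":
        return "whitespace"
    if VALID.fullmatch(text):
        return "valid"
    if "?" in text:
        return "question mark"
    if "unknown" in text.lower():
        # Misspellings of "unknown" are not caught by this simple test
        return "unknown"
    if ALT_DATE.fullmatch(text):
        return "other date notation"
    if MONTHS.search(text):
        return "month name"
    if len(YEAR.findall(text)) > 1:
        return "multiple years"
    return "other"


if __name__ == "__main__":
    for sample in ["1999-03", "12/03/1999", "Unknown", "?", "1999-2001", "March 1999"]:
        print(repr(sample), "->", classify_released(sample))
```

Running a function like this over every Released value and counting the labels would give a breakdown similar to the list above.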