Date Formats

There is just so much wrong with all these date formats out there. Let me tell you what I think is wrong with almost all of them.

For starters, consider 3/1/11. Is that January or March? If you use the M/D/Y format all the time, you would call it March (2011-03-01). On the other hand, I have seen this written as 3.1.11. The average German would consider this to be at the beginning of the year (2011-01-03), but the American who wrote it actually meant March: 2011-03-01.

But actually, nothing is stopping you from reading this as 2003-01-11 or 2001-03-11 or something like that.
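To make the ambiguity concrete, here is a minimal Python sketch that lists the readings mentioned above for one string. The function name `readings` is made up for illustration, and the code simply assumes that every two-digit year belongs to the 2000s.

```python
from datetime import date


def readings(text, sep="/"):
    """Return the plausible readings of a three-number date as ISO strings.

    Assumption for this sketch: two-digit years are in the 2000s.
    """
    a, b, c = (int(part) for part in text.split(sep))
    candidates = {
        "month/day/year": (2000 + c, a, b),
        "day/month/year": (2000 + c, b, a),
        "year/month/day": (2000 + a, b, c),
        "month/year/day": (2000 + b, a, c),
    }
    result = {}
    for name, (year, month, day) in candidates.items():
        try:
            result[name] = date(year, month, day).isoformat()
        except ValueError:
            pass  # this reading is not a real calendar date
    return result


for name, iso in readings("3/1/11").items():
    print(f"{name}: {iso}")
```

All four readings of 3/1/11 turn out to be valid calendar dates, so nothing in the string itself tells you which one was meant.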

Using a four-digit year is a start; then only day and month can be confused. With 1/12/2003, the confusion could still be pretty bad, as one reading (2003-01-12) is at the beginning of the year and the other (2003-12-01) is at the end of it.

One way out of this is to write the month in letters and the year with four digits, like in 01-Mar-2011. This is unambiguous, but it can neither be compared (and therefore sorted) nor parsed easily. And there are people who have different names for their months, so they have to learn the twelve English words as well. And it invites another thing that I will come to shortly: Translation.

And using a two-digit year might seem very clever to save two spaces, but it can really be a problem. You would think that 99 would be clearly 1999 and 03 would be 2003 of course. But what about 70? Well, 1970, right? Okay, what about 30, is that 1930 or 2030 already? You see, "saving" two spaces really opens up a lot here.
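Software has to pick some cutoff for this, and the cutoff is just a convention. As an illustration, Python's `%y` directive follows the POSIX rule: 69 to 99 become 1969 to 1999, and 00 to 68 become 2000 to 2068. The helper name `expand_year` below is my own.

```python
from datetime import datetime


def expand_year(two_digits):
    """Expand a two-digit year string using Python's %y (POSIX pivot at 69)."""
    return datetime.strptime(two_digits, "%y").year


for yy in ("99", "03", "70", "30", "68", "69"):
    print(yy, "->", expand_year(yy))
```

So here 30 is read as 2030, but another program with a different pivot could just as well decide on 1930.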

Another thing people like to do is to omit the year. That is great if you want to specify an event that is within the next few days or months. But imagine you have a website where you tell everybody that the next meeting is on 4/7. Given that your readers can read your mind to know whether it is April or July, they might show up this year. Then you do not alter your website for the next year (say you are too busy). Then next year, people will think that it is this year.

I did not make this up; I had the exact thing on a website I consulted for the dates concerning university enrollment.

A Whole Zoo

Choices are great, right? If you want to learn programming and want to have some hard problem, try to build a date parser. If you use my favorite date format, you just need to match it against (\d{4})-(\d{2})-(\d{2}) and you have year, month and day right away. But considering all those nifty date formats people came up with, this is a true challenge.
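For comparison, this is roughly what the whole parser for that format could look like in Python. It is a sketch built around the regular expression above; the name `parse_iso_date` is made up, and the validity check is simply left to `datetime.date`.

```python
import re
from datetime import date

# The pattern from the text: four-digit year, two-digit month, two-digit day.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_iso_date(text):
    """Parse a YYYY-MM-DD string into a date, or raise ValueError."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() itself rejects impossible values such as month 13 or day 32.
    return date(year, month, day)


print(parse_iso_date("2011-03-01"))
```

There is no guessing step anywhere in it, which is exactly the point.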

I often read both American and German dates, and switching back and forth can be pretty hard on the mind, especially since I use my favorite one for everything else.

This is a list of date formats that I have come across:

  • 01-03-2011
  • 01-Mar-11
  • 01-Mar-2011
  • 01.03.11
  • 01.03.2011
  • 01.03
  • 03/01/11
  • 03/01/2011
  • 03/01
  • 1.3.11
  • 1.3.2011
  • 1.3
  • 11-Mar-01
  • 110301
  • 2011-03-01 (ISO 8601)
  • 2011-Mar-01
  • 2011-Mär-01
  • 20110301
  • 3/1/11
  • 3/1/2011
  • 3/1
  • Tue, 01 Mar 11 (RFC 1036, RFC 822)
  • Tue, 01 Mar 2011 (RFC 1123, RFC 2822)
  • Tue, March 1, 11
  • Tue, March 1, 2011
  • Tuesday, 01-Mar-11 (RFC 850)
  • Tuesday, March 1, 11
  • Tuesday, March 1, 2011

So, tell me the expiration date of this food:

The text in the picture reads: "Best before: see Stamp. 10042014B"

My guess is 2014-04-10, that would match the purchase date of around 2013-05-01 and that it should be good for a year.

Translation

Everybody in the world knows English, right? Between English, German and Dutch, the dates would not be too different, say "May", "Mai" and "mei" or "June", "Juni" and "juni" (which all become Jun in the short form anyway).

So you would say that this does not matter too much? Let's come back to that parser to be written. Would you really want to build all languages into there? And which languages get included in there? I read that there are some 5000 languages around, so you would need to pick some. And what would you do if something means two different months in two languages? Then there is no way to get it straight.

And everybody would have to know every language that the dates are presented in. Sure, you say, you only exchange dates with people within your language group. But why create a different date format for every language when numbers can be universal and need no translation at all?

Comparability (Sorting)

If you ever named files with a time stamp in front and tried to sort them, you might end up with something like this:

  • 1.3.11
  • 13.3.11
  • 2.3.10
  • 2.3.11
  • 2.4.11
  • 3.2.99
  • 4.3.11

That is just awesomely wrong. Sure, there are file managers that recognize dates and sort them the chronological way. But this is merely a fix of the already broken date format.

If you had used any date format where the fields are fixed length and the entities are sorted from big units to small, then it works just fine. Same list, in ISO 8601 format:

  • 1999-02-03
  • 2010-03-02
  • 2011-03-01
  • 2011-03-02
  • 2011-03-04
  • 2011-03-13
  • 2011-04-02

Quite a difference. And this works even in the most primitive C locale.
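You can try this yourself with a plain string sort, which is all that a simple file listing does. The sketch below uses the dates from the first list; the converter `dotted_to_iso` is a made-up helper that assumes D.M.YY input and puts two-digit years from 69 upwards into the 1900s.

```python
dotted = ["2.3.11", "1.3.11", "13.3.11", "4.3.11", "3.2.99", "2.4.11", "2.3.10"]


def dotted_to_iso(text):
    """Convert D.M.YY to YYYY-MM-DD (assumption: yy >= 69 means 19yy)."""
    day, month, yy = (int(part) for part in text.split("."))
    year = 1900 + yy if yy >= 69 else 2000 + yy
    return f"{year:04d}-{month:02d}-{day:02d}"


# Plain string sort of the original notation: not chronological at all.
print(sorted(dotted))
# ['1.3.11', '13.3.11', '2.3.10', '2.3.11', '2.4.11', '3.2.99', '4.3.11']

# The same plain string sort after conversion: chronological order.
print(sorted(dotted_to_iso(d) for d in dotted))
```

No date logic is involved in the second sort. The strings just happen to compare the same way the dates do.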

Comparison With Time Formats

With time within the day, it is canonical to just write hour:minute:second. But the hour is not defined too well, it could be 24- or 12-hour format.

So, is 12AM midnight or noon? I think it is a little unintuitive. During the morning, you count up, 10AM, 11AM. Then, all of a sudden, you go to PM. But you still count to 12. So it is 12PM. The next one is 1, since we cannot go over 12. So it is 1PM.

So basically:

10AM → 11AM → 12PM → 1PM → ... → 10PM → 11PM → 12AM → 1AM

If that does not convince you, then think about what "Let's meet at six." means to you. Is that the morning, or the evening? If you always sleep till 9 (as in 24-hours) then you might pick 10 o'clock as the boundary case. So, how would you interpret it?

Sorting does not really work well with it either, since 3:14 in the afternoon and 15:14 are supposed to mean the same, but end up at two different places.
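The wraparound at 12 is easy to get wrong in code as well. Here is a small Python sketch of the conversion, under the usual convention that 12AM is midnight and 12PM is noon; the function names `to_24_hour` and `normalize` are made up for this example.

```python
def to_24_hour(hour, suffix):
    """Convert a 12-hour clock hour plus AM/PM to the 24-hour clock."""
    suffix = suffix.upper()
    if not 1 <= hour <= 12 or suffix not in ("AM", "PM"):
        raise ValueError(f"not a 12-hour time: {hour}{suffix}")
    # 12 wraps to 0 first, then the afternoon gets 12 hours added.
    return hour % 12 + (12 if suffix == "PM" else 0)


def normalize(text):
    """Turn a string like '3:14 PM' into the zero-padded form '15:14'."""
    clock, suffix = text.split()
    hour, minute = clock.split(":")
    return f"{to_24_hour(int(hour), suffix):02d}:{int(minute):02d}"


for hour, suffix in [(10, "AM"), (11, "AM"), (12, "PM"), (1, "PM"),
                     (11, "PM"), (12, "AM"), (1, "AM")]:
    print(f"{hour}{suffix} -> {to_24_hour(hour, suffix)}")

print(sorted(normalize(t) for t in ["3:14 PM", "12:05 AM", "10:00 AM", "12:30 PM"]))
```

Once everything is in zero-padded 24-hour form, a plain string sort puts the times in the right order again.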

When writing the time, everybody uses hour:minute:second; there is no minute:hour:second or whatever. And the fields are in the correct order for sorting. One could write second:minute:hour like the regular German date, but somehow people think it is pointless. I do not really understand why; it is just as pointless as their date format, and that seems fine to them.