Sunday, February 20, 2011

DNA seen through the eyes of a coder

This has been around a while, but I just found it: an explanation by a gentleman named Bert Hubert of the similarities between genetics and computer systems, with the analogy of DNA to source code as a focal point. A very cool read: DNA seen through the eyes of a coder. It seems to have been written 9 years ago (in 2002), and although I'm no molecular biologist, I think there have been some interesting advances in epigenetics since then. I wonder how that fits into the analogy?

Monday, February 14, 2011

ID numbers are not integers

Here at $work we use a numeric identifier called the UFID. It's an 8-digit string that uniquely identifies an individual related to the University. It's protected information, according to $policy, so we have to be careful how we treat it.

Note I called it an 8-digit string. Unfortunately, I continue to see databases where this identifier is stored as an integer merely because it looks numeric. That is to say, int rather than char. This makes me sad.

While it's possible to get into an involved academic discussion of why this is wrong, I'll just enumerate two simple rules for when to use a numeric type, such as integer:

  • If the data is going to be used for arithmetic or statistical functions such as mean.

  • If the data serves as a counter, including auto-increment primary keys.


Note the second case is really an exception, and in the strictest sense should not be allowed either. But, in the spirit of pragmatism, it is easy enough to permit this very special, well-defined case without problems. What does cause problems is using an integer type for a string field. The most obvious problem is that converting from integer back to string drops the leading zeroes.
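To make the leading-zero problem concrete, here is a minimal Python sketch. The identifier value is made up for illustration and is not a real UFID.

```python
# Made-up 8-digit identifier with leading zeroes (not a real UFID).
ufid = "00123456"

# Treat it as a number, then turn it back into text.
as_int = int(ufid)
round_tripped = str(as_int)

print(round_tripped)           # 123456
print(len(round_tripped))      # 6
print(round_tripped == ufid)   # False
```

The value that comes back is two characters short and no longer matches the identifier it started as.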

Yes, it's possible to instruct most databases to return the data with leading zeroes prepended even though it's an integer. That's an abomination. Not only that, but if your ORM "knows" this is an integer, its internal representation will probably ditch that padding. Now you have to make your code provide padding as well via sprintf or similar. Not very DRY.
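A rough sketch of what this looks like at the database level, using SQLite through Python's sqlite3 module as a stand-in. The table name, column names, and identifier value are all hypothetical, and other databases will differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding the same identifier in both column types.
conn.execute("CREATE TABLE person (ufid_int INTEGER, ufid_str TEXT)")
conn.execute(
    "INSERT INTO person (ufid_int, ufid_str) VALUES (?, ?)",
    ("00123456", "00123456"),  # made-up identifier
)

ufid_int, ufid_str = conn.execute(
    "SELECT ufid_int, ufid_str FROM person"
).fetchone()

# The INTEGER column hands back a number; the padding is gone.
print(repr(ufid_int))   # 123456
# The TEXT column hands back exactly what was stored.
print(repr(ufid_str))   # '00123456'

# With the integer column, every caller has to know the width and re-pad.
padded = f"{ufid_int:08d}"
print(padded == ufid_str)   # True

conn.close()
```

The re-padding line is the part that ends up copied into every piece of code that touches the integer column; the string column needs none of it.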

The data is not an integer to begin with, so you should not have to shoe-horn it into a type to which it does not belong. What happens when one day they run out of IDs and start allowing letters in the ID? ...

Save yourself the worry. Store identifiers as strings.