2016-03-25

Names That Break Databases

Database programmers frequently make false assumptions and compromises about names. Back when disk space was horrendously expensive, it was often for disk consumption reasons: If you allocated 15 characters for a last name, it consumed 15*numberOfRecords amount of bytes of disk space (not even considering non-ASCII names). The average last name in the U.S. is six characters, and almost all are less than 13, so 15 should be enough, right? These early systems also often didn't allow for variable length names. The person who ran the department for one of my first programming jobs had a hyphenated last name with 16 characters in it, and hated having it truncated. We bought more disk space just to handle that name.

You may think this all went away with the fact that disk space has become extremely cheap. (This dates me a little: My first 1 GB drive cost over $1000. Think about how expensive your cell phone, much less a terabyte+ database, would be at that rate!) Cheap disk space and variable length database fields have made things better, but database programmers still make many wrong assumptions and compromises regarding names.

Think about this: What if one of your customer's last name was Null? Would your system break? If so, it probably has other problems and is possibly open to a SQL Injection Attack. One of my favorite XKCD comics (If you don't understand why this is funny, see SQL Injection Attack on Wikipedia):


(In some places, because of this comic, SQL Injection Attacks are called Little Bobby Tables Attacks. I like this comic so much, I got one signed by Randall Monroe for my wife, who is an Oracle Database Administrator, for her desk at work.)

[General rule: Do not concatenate SQL with user supplied data...use parameters instead!]

This article discusses the many problems that people named Null run into: The names that break computer systems.

There are many other assumptions about names that turn out to be false. This is my favorite article about them: Falsehoods programmers believe about names

This is the start of the list from the article:
  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. ...
Take care that your system handles as many of these as possible. Almost all database systems have to make compromises to handle names, but they don't need to break just because they hit a name like 'Null'.

No comments :

Post a Comment

Note: Only a member of this blog may post a comment.