Simple and Composite Candidate Key and Nulls



  • One of the differences between a candidate key and a primary key is that "candidate keys can contain nulls." I have been unable to find a more precise definition of what this means. One explanation I have seen is that in a CK with only one attribute, exactly one value can be NULL. Presumably this is because that NULL can uniquely identify a tuple/row. Is this correct?

    What I have had even more difficulty finding is what happens in a candidate key that contains more than one attribute:

    1. I think I saw one reference state that each attribute can contain only one NULL.
    2. However, by definition of CK, it seems there can be any number of NULLs as long as the set of attributes in the CK still uniquely identifies a single tuple/row.

    Which one is correct, or are they both wrong?



  • It all gets a bit tricky because of three-valued logic. Comparing NULL to anything returns "unknown", not "same" or "different" but "unknown". This applies even when comparing two NULLs.

    How are we supposed to tell if we are about to create duplicates in a candidate key? Well, you have duplicates if the value you're trying to insert matches any value currently in the table. However, if what we are trying to insert is NULL you can only ever get the answer "unknown".

    One way to address this is to define that candidate keys may not contain nullable columns. That's legitimate. Another is to extend the definition of duplicate to "new value equals existing value or new value is null and an existing value is null." This is also legitimate. Neither is right or wrong, they are just two differing definitions which have been given the same name.

    At the start of the relational model Codd said each table must have a primary key and this is special. Later this was relaxed to say each table could have many candidate keys and none were more special than any other. By then, however, the idea of a special primary key was baked into the collective consciousness and several popular RDBMS products. And so we have today the condition you describe.

    For example, in Microsoft SQL Server a primary key constraint may not contain a nullable column. However a unique constraint my contain one or more nullable columns. Both constraints may be referenced by a foreign key definition.

    In conclusion, the distinction between a primary key and any other candidate key is artificial and retained because of the early history of relational databases. Whether or not to allow nullable columns in a key is implementation dependent. It depends on how worked up you want to get about 3-value logic or if you accept that a NULL can be treated as being the same thing as another NULL.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2