16th June, 2008

Your database will never be 3rd normal form.

Databases have incredibly complicated data structures. When working with a large database, the temptation is to over-engineer the layout. A completely normalised database is one particularly useful in production.

The data is an audit log.

Let's imagine you are writing a sales ledger. You have a table of customers and products. You also have a table of invoice headers containing:

Invoice number
Date
Customer ID

You create an invoice line-item table containing

InvoiceHeaderId
ProductId

From this you can calculate the price of the invoice...

Except you can't. What if you change the price of the product? The entire back-history of invoices becomes incorrect. Now you have to include the price in the line item, so basically keying your product item by ID, then DUPLICATING the products attributes in the line item.

InvoiceHeaderId
ProductId
ProductPrice
ProductTax
ProductTotal

Anywhere there is a record in the database that would be required in an audit must in effect snapshot the database state at that point. This means for any normalised database, you are storing double the data.

 

The opinions expressed here are my own and not those of my employer.