There are absolutes that are true of data modeling and architecture, but these are fewer in number than most people think. There is a liberal use of the words “always” and “never” handed out as technical advice, and while it is usually well-meaning, can lead to a design myopia that limits one’s ability to adapt to atypical application needs.
“You should always have a restorable backup for your production databases.” It would be hard to find anyone to argue a counterpoint to that statement. Similarly, a declaration that all source code should be stored in some form of source control is a generally accepted truism for any data project (or any other initiative based on code, for that matter). Most such absolutes are broad and generalized, and are applicable regardless of architecture, operating system, deployment platform (cloud or on-prem), or geographic location.
It’s much more rare to find absolutes that apply to specific design principles. However, that doesn’t keep some folks from incorrectly asserting absolutes. As I wrote in a post last year entitled Technical Dogma, we are creatures of habit and tend to favor tools or solutions we already know. This tendency coupled with repetition leads to a sort of muscle memory in which we become loyal – sometimes to a fault – to the methods we prefer to build things.
Designing in Absolutes
When we assume that a particular way of doing things is the only way to do it, we make assertions such as the following:
- Every dimensional design should be built as a star schema. There are no valid reasons to build a snowflake schema.
- You should never use the T-SQL MERGE statement to load data.
- Anything with more than a terabyte of data belongs on premises, not in the cloud.
- I’ll never use ETL again. Big data tools can do everything ETL can, and more.
- Database triggers should never be used.
These aren’t anecdotal examples. I’ve heard every one of these recently. To be fair, those who declare such preferences to be truisms rarely do so with nefarious intent, but such statements can have negative consequences. Building a solution with the assumption that a particular design pattern must always be used is risky, as it can lead to an inflexible solution that does not account for nuances of the particular application.
When I write about best practices, I am very cautious about speaking in absolutes. Even in my ETL Best Practices series, which represents my experience at having built hundreds of ETL processes over the past decade, I generously use the terms “usually”, “typically”, and “with few exceptions”. I do so not out of a fear to commit, but to be as accurate as possible. As with any other collection of best practices, there will be exceptions and edge cases which may seem to violate one of the principles of a typical design, but are entirely appropriate for some less-common design patterns. Providing the business with the data it needs, not the adherence to a particular set of design patterns, is the ultimate measure of success for any data project.
There are some absolute always-or-never cases in solution design. However, these are few in number and typically vague. Try to focus less on what should always (or never) be true, and more on the needs and nuances of the project at hand.
~~
This post was originally published in my Data Geek Newsletter.