When it comes to behavior, I’m a conservationist and minimalist: I sincerely dislike having to do anything extra or wasted. For example, I dislike taking stairs one at a time because it’s too slow. I never print out anything; it’s a waste of paper that I’ll eventually throw away anyhow. I frequently run across parking lots because I can’t stand the time wasted between walking out a door and arriving at my car. So why would I advocate using a data warehouse as the basis for business intelligence and decision support? Why not get data directly from operational systems and skip the extra system?
One of the biggest misconceptions I encounter when talking to organizations about data warehousing is the idea that a data warehouse is where the operational data is collected. Using this definition, an additional data warehouse would absolutely be a waste of time and resources. Another misconception is that a data warehouse is just a bunch of systems data integrated into one system. While there’s kernel of truth to that idea, it misses the point.
So as a point of contrast, here is a list of what a data warehouse is not. It is
- not the primary operational data store,
- not a complete copy of operational data,
- not a source for in-process, operational decision making, and finally, it is
- not a composite operational system.
Although there are exceptions to every rule, the above list should clear the air of lingering misconceptions. So now that we know what it is not, what is a data warehouse, and what advantages does it provide over using an operational system directly for business intelligence and decision support?
Laboratory data systems are often based on very complex, third normal form relational database schemas. While there are good reasons to model complex systems using this method, these schemas are frequently too complex for the average business intelligence user and ad-hoc consumer. Therefore, a successful data warehouse solution will often remove the complexities of the operational store, relational objects might be flattened, multi-column foreign key indexes might be simplified to a single key column, ambiguous data (for example, textual facts) might be omitted, and column names might be simplified to provide clarity and standardize business vocabulary. Each of these changes helps improve the speed of the data warehouse database, makes the mining and summarization of data more user-friendly, and serves to harmonize data knowledge across the enterprise.
A successful data warehouse solution provides blazing fast access to key operational data that is frequently used as the basis for making business decisions, identifying historical trends, and predicting future performance. Because many laboratory systems are used 24×7, it is often unreasonable to assume that business intelligence queries that consider hundreds of thousands or millions of data points should be able to share the same resources as the operational system. Therefore, reporting against a data warehouse often frees the operational system from what could be an overwhelming additional burden and allows the data warehouse to focus solely on speedily summarizing large volumes of data.
Although I may have stretched the alliterative nature of my section titles, a successful data warehouse solution may create a composite view of conformed data from disparate operational sources. This means more than merely integrating data from several systems into a single database. Frequently a data warehouse will fill in missing fields with calculated values, convert null values to “N/A”, and have data converted into universal formats. Although most laboratories would never consider “synthesizing” data that is missing, this practice is perfectly acceptable – and preferred! – in a fully populated data warehouse. For example, if one source system is marking protocol subjects as Male/Female and another system is marking them as Man/Woman, the data warehouse would need to store all data using only one set of values in order for the data to be useful.
Although I have provided only an abbreviated list of advantages in favor of a data warehouse, it should be clear that implementing one in your organization may offer very real, pragmatic benefits over trying to pull business intelligence directly from operational source systems.