You may remember from your primary school days those puzzles that were used to stimulate the learning process by adding amusement to the teaching of subjects that required a certain degree of memorization, like geography for instance.
I recall a battery-powered interactive map showing marks of a certain geographical area with all names removed and haphazardly displayed in one side, little metal rods to connect each mark on the map to the right name in the list, closing the electrical circuit and switching a lamp or a buzzer on. I actually remember even building one featuring a map of Italy with regions and main cities for a lab assignment.
Architecting a migration of data from one LIMS to another reminded me of that school puzzle in that at the end of the endeavour a weave of wires will connect each and every column in the source data model to its target mate and thousands of imaginary blinking lamps will signal that data are flying unerringly to their final destinations.
Ok, I have completed my obligatory sentimental opening : let’s go back to the real world.
In this first blog post of a short series, I would like to share with you some of the best practices we adopted while architecting the data migration from SQL*LIMS to LABVANTAGE 6.
Semantic, syntactic and relational characterization
We are obviously assuming here that the source and the target data models exhibit a fair degree of kinship as they both belong to the big familiy tree of LIMS software applications. This means that if you compile a list of names of entities that can be defined and manipulated in each respective application, a lot of the terminology should be portable without translation: sample, specification, material, sample type, study are likely to be valid terms in both systems although the meaning of those objects will certainly be quite different in their respective contexts.
To reveal what that meaning really is, each object can be characterized according to three perspectives:
- a semantic perspective where objects are characterized based on their defining properties, e.g. a “sample” is characterized by a unique identifier, an optional description, the date when it was created etc.
- a syntactic perspective where objects are characterized by which operations can be performed on them, possibly in combination with other objects, in order to create valid process flows, e.g. a “sample” can be logged, received, assigned to a location, attached to “tests” etc.
- a relational perspective where objects are characterized by other objects they can relate to, e.g. “sample” is related to a “submission”, “child samples”, “test methods” and “results”. Chances are that these relationships are also implemented in the database by means of explicit (foreign keys as in LABVANTAGE) or implicit (as in SQL*LIMS) constraints between tables.
Characterization can be conveniently represented using affinity diagrams by means of which you can create clusters of properties around objects and rank similarity among them by counting the number of overlapping traits.
As you may expect, in some cases equivalence would emerge quite naturally: for instance an object called “sample” in the source LIMS will probably have its obvious counterpart in the namesake entity belonging to the target LIMS.
In other cases the correspondence is not immediately apparent. For example, SQL*LIMS features an entity called “submission” but this term does not exist in the LABVANTAGE dictionary of object names . A simple characterization of this entity in SQL*LIMS could be:
- semantic: 1) a “submissions” is characterized by a system generated unique identifier and a user-defined description
- syntactic: 1) it is created by means of a “log” operation, 2)”samples” can be added to an existing “request”
- relational: 1) it is related to “samples” in a 1:many relationship
A LABVANTAGE entity that features these properties actually exists and is called “request”. It seems we found our match.
Prioritize the gaps
While establishing cross-LIMS equivalence among entities, gaps will start to emerge.
For one, there might be cases where a correspondence cannot be even established: some objects in the source system may not have any meaningful equivalent in the target. Even when you own the target application and are in complete control of its development strategy, this is obviously a critical gap because in order to bridge it you may need to architect and code an entirely new class of objects and feature sets that don’t exist in the target application.
Also, source objects that do have counterparts in the target system will most likely only partially coincide and discrepancies may appear under any of the above mentioned three perspectives.
The decision of how to prioritize the gaps could be based on considerations around:
- Effort: how expensive it is to build the missing entities or largely overhaul existing ones? What will be the cost of the employed resources and what is the impact of deliverying a feature-rich migration process well beyond the deadline?
- Opportunity: what is the impact of “injecting” new entities into the architecture of the target LIMS? How disruptive or advantageous is it? Can you spot further developments? Are you really building a better LIMS by doing so?
- Usage: how widely spread is the adoption of that particular entity or feature within the user base of the source LIMS? Perhaps it is a fancy piece of functionality that nobody actually used and you can spare yourself the hassle to include it in the scope of your migration. To assess this you may consider conducting interviews or surveys and use the results to support your business case.
In the next article of the series we’ll move to a lower level of abstraction and we’ll talk about how to create database transformation maps.
I’m hit by a wave of nostalgia now: let me go to my parents’ house basement and see if I can find that battery-powered map puzzle.