In this short series of articles on cross LIMS data migrations, we introduced the concept of object characterization and highlighted how this approach helps to reveal gaps and similarities between the two data models (Best Practices in Cross-LIMS Data Migration) and prepares the ground for the subsequent phase when data maps connecting source table columns to target table columns are built (Cross-LIMS Data Migration: Data Maps).
At the upcoming European LIMS Tour 2011 in October, we will showcase most of the topics that I covered during this series of articles. If you are interested in knowing more, seeing live demos and having the chance to provide your feedback, don’t miss this opportunity.
In this third and final article, I will focus on data movement. At this point, you know the final destination of your data to the finest details, and you know the path they will follow to get there; you need a means of transport to get them there reliably, efficiently and fast.
For these types of journeys, our data integration technology partner expressor happens to provide just the right vehicle for various reasons, but three are particularly prominent for LIMS data migration projects.
Chances are that your data migration project team will be composed of several engineers covering different roles (administrator, architect, developer, and etc…). The expressor development environment supports the definition of user- and role-based access to projects. In addition, project deliverables are kept under version control through subversion.
Our LABVANTAGE 6 data migration team spans over nine time-zones and accesses the expressor repository server located in our headquarter’s data center in Somerset, NJ (USA). The source (SQL*LIMS 5.x) and target (LABVANTAGE 6) databases instead are not shared; they run locally on each engineer’s virtual machine. This allows designers and developers to have complete control of their own test environment, avoiding any conflicts and clashes of data when database records start furiously flying from one instance to the other during testing.
Leveraging the DI Workbench
If you are considering using expressor for your LIMS data migration project, then you need the LABVANTAGE Data Integration Workbench. The concept behind this tool is described in the Data Integration page of our website and in blog articles by Jeff Vannest, the friend and colleague from Strategic Solutions who designed and developed this valuable software suite (Jeff is also a certified expressor developer).
One of the strengths of expressor is the ability to separate the data movement routines layer from the underlying data sources. It accomplishes this through a dictionary of terms and definitions that identify uniformly and consistently your source and target database objects. Through the adoption of this common lexicon, the data migration procedures can be made independent from the underlying databases and, for instance, copying the identification number of a sample from the source to the target can be as simple as writing:
Output.SAMPLE_ID = Input.SAMPLE_ID
regardless of how the actual tables and columns storing the id of the sample are named.
If the data models that are involved in your data migration project are of a manageable size, creating the expressor semantic universe for them is something you may consider undertaking through a manual process; expressor provides very convenient features to do so. But if your semantic universe is likely to be made of several thousands of metadata artifacts, you would be forever grateful to whoever allows you to accomplish the task in the fastest way possible.
This is where the DI Workbench comes into play. The tool will inspect the structure of relational tables of your database, extract column names and types, parse and tokenize them through smart comparison with a dictionary of natural language words, and progressively populate the semantic universe.
You will probably agree with me that one (if not the best) sign of quality of a product is a vendor’s adoption for its own business that it sells to the market. Would you change your opinion on SAP (the Product) if you knew that SAP (the Company) chose a different ERP for their corporate organization?
Not only did we at LABVANTAGE (the Company) adopt LABVANTAGE (the Product) as the platform of choice to implement some of our critical systems (as you will certainly know if you are a customer and access VantageCare regularly), but also the LABVANTAGE 6 data migration team happily included the DI Workbench in its migration development workflow, saving weeks of effort in the configuration of the expressor semantic universe for the migration project.
Moreover, while LABVANTAGE 6 is being developed, and small changes are introduced in the data model as new iterations are released, rather than implementing those changes into our shared expressor semantic universe, we found it much more convenient and less error prone to run the DI Workbench suite and have the new version of the universe created in not more than one hour.
Thanks Jeff Vannest, forever grateful!
If your LIMS system has been in production for several years, the volume of data you’ve been creating must be fairly sizable, especially if you are a 24×7 operation. This means that you may find it highly desirable to be able to move large data sets to the target LIMS instance as quickly as possible.
Expressor can process multiple streams of data (called channels) in parallel, and the number of channels you’ll need to use should be determined based on your requirements for performance, maximum tolerable downtime, and etc.; the higher the number of channels, the faster data will be transferred.
The number of channels also determines how you are going to design your migration procedures, which in expressor language is called “drawings.” A drawing designed for single channel processing is in fact different from its multi-channel equivalent.
The picture below shows a drawing which moves SQL*LIMS sample records to a LABVANTAGE instance, and the “partition-key” shape on the left is the block in charge of routing data through parallel streams.
In general, you will create a drawing for each table that has to be migrated (or a group of logically related tables), and then execute the drawings in a particular sequence. When defining the order of execution, make sure you are taking into account referential integrity among tables, or record inserts may fail due to foreign key constraints violations.
This short series of articles covered some of the best practices we developed while architecting the SQL*LIMS to LABVANTAGE data migration strategy.
I also described some of the tools we adopted to make the migration easy to validate, properly documented, reliable, flexible and scalable. In the attempt to avoid scaring or boring my audience too much, I skipped lots of technical details that might be relevant when planning and implementing your next LIMS data migration project. If you are reading these final lines, then it seems that I have some success in keeping attention span. Please feel free to contact me if you are interested in going deeper into the subject matter.
The first article opened with memories from my childhood. Strange how while blogging about data movement, I’m actually on the verge of moving to a new house, an event that introduces the occasion of disposing old stuff you don’t immediately need anymore but still want to keep stored somewhere for sentimental reasons or just because it may become useful again when your kids have their own children. This “somewhere” in my case is unused space in the basement of my parents’ house, which was rapidly becoming quite crammed and had to be rearranged. While moving dusty boxes around, I finally came across the old battery-powered map puzzle that has been waiting there for more than 25 years.