How to Solve Laboratory Data Challenges

Data is the beating heart of any laboratory operation, whether you have just one site or multiple labs spanning the globe. But a key element of your success is how you store and manage that data. How you bring all that information together can mean the difference between a lab that runs like clockwork and an organizational nightmare.

To illustrate this, let’s take a look at a typical data storage setup, and the common problems it creates.

Three flavors of instruments

Whether you have many labs or just one, there are likely to be multiple spaces where analyses are performed. Each process will involve one or more instruments, which can be broadly grouped into three types:

1. “Dumb” instruments

These devices don’t have their own software or a dedicated PC. They typically connect directly to a LIMS, either through an RS-232 serial connection or a network-based interface. In some cases, a converter box is required to complete the network-based interface. Balances and pH meters are common examples of this type.

2. File-based Instruments

One rung higher on the technological ladder, file-based instruments can be hooked up to a PC that collects the data they create. The PC can also do additional manipulation of that data using software provided by the vendor. The data can then be exported to a local or network drive for further use.

The data generated by file-based instruments comes in two formats:

  • Human readable — Data which doesn’t require any special software for a researcher to interpret it. It can be viewed with common applications found on most PCs, such as Windows Notepad.
  • Not human readable — This format requires the vendor’s proprietary software to view and/or manipulate the data.
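
As a rough illustration of the distinction above, the following Python sketch guesses whether an exported file is human readable. It is a heuristic under stated assumptions, not a vendor-specific check: the function name looks_human_readable is hypothetical, and the rule it applies (no NUL bytes in the first few kilobytes, and those bytes decode as UTF-8) will misjudge some files, such as text exported in other encodings.

```python
import codecs
from pathlib import Path


def looks_human_readable(path: Path, sample_size: int = 4096) -> bool:
    """Guess whether a file is plain text by inspecting its first bytes.

    Assumption: text exports are UTF-8 (or ASCII) and contain no NUL bytes.
    """
    with path.open("rb") as handle:
        sample = handle.read(sample_size)

    # NUL bytes almost never appear in plain-text exports.
    if b"\x00" in sample:
        return False

    # An incremental decoder tolerates a multi-byte character that was
    # cut in half at the end of the sample, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check is a candidate for the "not human readable" group and would most likely need the vendor's software to open.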

3. “Smart” Instruments

These modern instruments can share data via multiple technologies, including streaming, various file-based formats, web services, or a vendor’s own proprietary integration solution. Larger enterprise systems, in which one software package handles multiple instruments at the same time, also fall into this category.

Examples of this instrument type include chromatography data systems (CDS), such as Empower or Chromeleon.

Multiple Instruments & Raw Data Storage Challenges

In a common lab setup, “dumb” instruments are linked directly to the LIMS. The results from those instruments can be stored directly against the right records, but their raw data doesn’t get stored in the Scientific Data Lake.

The other two categories can store their results as files. Storage locations for those files might include:

  • A local file directory
  • A shared network folder
  • A cloud-based solution, such as AWS or Azure cloud storage

The trouble with managing multiple instruments in this way is that raw data ends up getting stored in many different locations. This can create quite a few challenges:

  • Maintaining multiple locations: IT needs to maintain the locations separately, with different solutions for backing up and restoring data.
  • Length of storage: Many labs need to maintain their data for 30 years or more.
  • Data loss: Instruments will be replaced over time, which can cause data loss. Local PCs containing raw data can also break down, which is another common source of data loss.
  • Accessibility: Other locations that need certain raw data might not be able to access it.

The Scientific Data Lake

A central storage location, such as a Scientific Data Lake, creates a single home for all raw data, making locally stored files that are tied to the instruments themselves obsolete. It becomes the primary source from which data can be retrieved anywhere within the laboratory or the wider company.

The Scientific Data Lake can monitor each instrument’s local storage folder and acquire the data that’s stored there. During acquisition, key information can be extracted from the raw data and stored separately in a database, creating key-value pairs that can be used to search for data in the Scientific Data Lake.
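
The acquisition step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the function names, the two-table SQLite layout, and the file format (a header of "Key: Value" lines that ends at the first blank line, in .txt files) are all hypothetical. A real system would typically also watch the folder continuously and handle many file formats.

```python
import shutil
import sqlite3
from pathlib import Path


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the search index: one row per file, one row per key-value pair."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (file_name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (file_name TEXT, key TEXT, value TEXT)"
    )
    return conn


def extract_metadata(text: str) -> dict[str, str]:
    """Read 'Key: Value' header lines (assumed layout), stopping at a blank line."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the header is over; the rest is measurement data
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def acquire_new_files(
    instrument_dir: Path, lake_dir: Path, conn: sqlite3.Connection
) -> int:
    """Copy files not seen before into central storage and index their metadata."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    acquired = 0
    for path in sorted(instrument_dir.glob("*.txt")):
        known = conn.execute(
            "SELECT 1 FROM files WHERE file_name = ?", (path.name,)
        ).fetchone()
        if known:
            continue
        shutil.copy2(path, lake_dir / path.name)  # keep the raw file itself
        pairs = extract_metadata(path.read_text(encoding="utf-8"))
        conn.execute("INSERT INTO files (file_name) VALUES (?)", (path.name,))
        conn.executemany(
            "INSERT INTO metadata (file_name, key, value) VALUES (?, ?, ?)",
            [(path.name, key, value) for key, value in pairs.items()],
        )
        acquired += 1
    conn.commit()
    return acquired


def find_files(conn: sqlite3.Connection, key: str, value: str) -> list[str]:
    """Search the index for files carrying a given key-value pair."""
    rows = conn.execute(
        "SELECT file_name FROM metadata WHERE key = ? AND value = ? "
        "ORDER BY file_name",
        (key, value),
    )
    return [row[0] for row in rows]
```

Calling acquire_new_files on a schedule approximates monitoring: files already recorded in the index are skipped, so each run only picks up what is new. A search such as find_files(conn, "Sample ID", "S-001") then returns the matching raw files without opening any of them.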

The benefits of a Scientific Data Lake include:

  • Simple maintenance: IT needs to maintain only one location when it comes to backing up and restoring data.
  • Accessibility: All raw data can be accessed by different laboratories anywhere in the company.
  • Minimize data loss: Because the Scientific Data Lake becomes the primary location for all data, raw data no longer depends on individual instruments or local PCs staying in service.
  • Avoid test duplication: Making the data available to everyone in your operation limits the duplication of analyses.

More than just storage

A Scientific Data Lake can deliver huge benefits simply by streamlining data storage. However, an additional step can make it even more efficient. Why not connect the raw data directly to the samples created in the LIMS?

Here’s how this works:

  • Raw data generated during analyses will contain sample IDs, instrument IDs and other unique identifiers of objects currently in LIMS.
  • Whenever data is generated and stored in the Scientific Data Lake, the key-value pairs can be used to make sure raw data is directly connected to the records in the LIMS. Thus, whenever a sample is viewed in LIMS, the corresponding raw data (available from the Data Lake) can be accessed from that record.
  • Raw data generated within the laboratory no longer sits in one large, undifferentiated store (i.e., the Scientific Data Lake on its own). Instead, LIMS creates a hierarchy linking all data to the correct records. Examples of what this makes possible include the following:
    • Show all raw data associated with a certain sample
    • Show all raw data acquired by a certain instrument
    • Summarize a complete stability study, showing all of its raw data in an output from the LIMS
    • And many other possibilities…
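
The linking described above can be sketched as a simple lookup built from the key-value pairs. This is an illustration only: the RawDataFile class, the function names, and the metadata keys "Sample ID" and "Instrument ID" are hypothetical stand-ins, and a real LIMS would hold these links in its own database rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RawDataFile:
    """A raw file in the data lake plus the key-value pairs extracted from it."""
    file_name: str
    metadata: dict[str, str] = field(default_factory=dict)


def build_links(
    files: list[RawDataFile],
    link_keys: tuple[str, ...] = ("Sample ID", "Instrument ID"),
) -> dict[tuple[str, str], list[str]]:
    """Map each (key, value) identifier to the raw files that carry it."""
    links = defaultdict(list)
    for raw in files:
        for key in link_keys:
            value = raw.metadata.get(key)
            if value is not None:
                links[(key, value)].append(raw.file_name)
    return dict(links)


def raw_data_for_samples(
    sample_ids: list[str], links: dict[tuple[str, str], list[str]]
) -> list[str]:
    """Collect raw files for a group of samples, e.g. all samples in one study."""
    found = []
    for sample_id in sample_ids:
        for name in links.get(("Sample ID", sample_id), []):
            if name not in found:  # avoid listing the same file twice
                found.append(name)
    return found
```

Looking up ("Sample ID", "S-001") in the result of build_links answers "show all raw data associated with a certain sample", and ("Instrument ID", "HPLC-01") does the same for an instrument. Passing the sample IDs that belong to one study to raw_data_for_samples gives the kind of study-wide summary mentioned in the list.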

The LabVantage LIMS advantage

LabVantage LIMS can provide a fully integrated Scientific Data Management System (SDMS) out of the box, creating a convenient central storage location. The SDMS will monitor the locations to which instruments send their data, interpreting and storing the data they provide.

The following storage options are available:

  • Network file location
  • Database storage
  • Cloud storage, such as AWS and Azure

All these locations are accessible only through the LIMS, ensuring the link between the data and the captures in LIMS is not broken. If additional analysis is needed, data can be downloaded from the LIMS again.

Because the LIMS is already the central location for lab data, the integrated SDMS automatically becomes the central location for raw data as well.

For more details on this process, and how an SDMS can transform your lab operation, contact us today.