Capturing and storing raw data is the first critical step in any research process.
Paper is a Thing of the Past
Technology for managing lab data has changed considerably — evolving from handwritten notes and electrophoresis gel photos into digital servers fed by connected instruments. It’s no longer a question of whether your lab will digitally transform, but how soon it will happen.
The basic concept of data management, however, has essentially remained unchanged. Everything from original data and images to researcher observations and annotations goes into some form of lab ‘notebook,’ which serves as a single central repository of facts, statistics, results, and other project information. This is a record of experimental work and results, often containing experimental methods, original data and even metadata required to contextualize results.
Formal metadata, such as experimental test parameters or control conditions, is often supplemented by researcher observations and annotations that facilitate unique data analysis and interpretation. This combined, multi-source data often leads to the genesis of new ideas and discoveries.
Effectively capturing and cataloguing this information is critical to the researcher and others attempting to replicate the work and extract knowledge from the results.
The Electronic Notebook Future
Today’s researchers rely on Electronic Laboratory Notebooks (ELNs) rather than spiral-bound sheets of paper. In fact, digital practices are now so common in the industry that many instruments are more likely to have fiber optic cables, USB ports and complex graphics than printers or simple numeric readouts. ELNs have become essential merely to keep up with the vast amounts of data generated and collected by modern labs. These data sets are exponentially larger than in the past, move faster, and are exchanged across global networks.
As a result, powerful computing resources, often automated, are required to manage and analyze the data and to turn findings into practical insights, to say nothing of marketable products. This process is often compared to finding a needle in a haystack, except the “haystack” could potentially be the size of one or more galaxies.
The challenges of multi-source data
New ideas and discoveries often result when data from different sources is captured and compared. But not every source handles data in the same way, and different types of data require different cataloging and digitizing strategies before they can be analyzed by today’s most sophisticated tools.
A typical ELN combines user interfaces with a centralized database and file storage. Done right, this system allows for seamless storage of lab observations, results of experiments, execution workflows, metadata and more in a streamlined digital format. The goal is to make analysis of the data in your ELN as simple as possible – ideally from the moment it is first captured in the system.
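As a rough sketch, the kind of entry such a system stores might be modeled as follows. The field names here are purely illustrative and do not reflect any particular product's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of an ELN entry: free-text observations,
# structured results, contextual metadata, and file attachments
# all live in one record.
@dataclass
class NotebookEntry:
    title: str
    observations: str                                # researcher notes
    results: dict                                    # structured experimental results
    metadata: dict = field(default_factory=dict)     # test parameters, controls
    attachments: list = field(default_factory=list)  # references to stored files
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

entry = NotebookEntry(
    title="Buffer titration run 7",
    observations="Color change observed earlier than in run 6.",
    results={"endpoint_ph": 8.4},
    metadata={"temperature_c": 22.5, "operator": "jdoe"},
)
```

Because results and metadata are captured as structured fields rather than free text, they are queryable from the moment the entry is saved.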
One of the biggest challenges to this goal is that data now comes in more forms than ever before. Depending on its source, not all data may immediately be usable in an analytics engine, or, for that matter, even readable by the human eye.
Interpreting such data so it can be successfully captured and retained may require one or both of the following strategies:
- Data transformation — Digitizing the data, or converting it into a different digital format, to make it comprehensible to whatever people, software or other systems will be working with it.
- Data cleansing — Editing or removing some of the data to refine it down to the relevant points needed for analysis.
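To make the two strategies concrete, here is a minimal sketch in Python. The semicolon-delimited export format, comma decimal separators, and field names are all hypothetical, standing in for whatever a real instrument might produce:

```python
import csv
import io

# Hypothetical raw instrument export: semicolon-delimited text with
# comma decimal separators and an occasional failed reading.
RAW_EXPORT = """sample_id;absorbance;status
S-001;0,412;OK
S-002;;ERROR
S-003;0,387;OK
"""

def transform(raw_text):
    """Data transformation: convert the export into records with
    normalized numeric values (comma -> dot decimal separator)."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    rows = []
    for row in reader:
        value = row["absorbance"].replace(",", ".") if row["absorbance"] else None
        rows.append({
            "sample_id": row["sample_id"],
            "absorbance": float(value) if value else None,
            "status": row["status"],
        })
    return rows

def cleanse(rows):
    """Data cleansing: keep only valid readings relevant to analysis."""
    return [r for r in rows if r["status"] == "OK" and r["absorbance"] is not None]

records = cleanse(transform(RAW_EXPORT))  # two usable readings remain
```

In practice the two steps are often chained exactly like this: transformation makes the data machine-readable, cleansing trims it to what the analysis actually needs.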
Another complication that comes with Big Data is, of course, its sheer volume, which makes human data entry impractical for many tasks. The need to process vast amounts of data, often from sources outside the controlled conditions of your lab, typically makes some form of automation essential, both to handle the quantity of information and to minimize opportunities for human error when data is entered into the ELN.
Three types of data
ELN data sources are classified into three categories based on their structure:
· Structured data
This category refers to data that is already set up in standardized, machine-readable formats such as spreadsheets or databases. Structured data is the easiest form for an ELN to understand and work with. Some data cleansing might be desirable depending on the source, but in general it requires little or no processing to become usable when entered into the system.
· Unstructured data
At the other end of the spectrum is the most difficult type of data for an ELN environment. Unstructured data comes in formats that require some form of conversion or processing before they can be used.
Common examples include photographs and other images, maps, graphs and sequence-level data. Any data not yet in a digital format, such as printed reports or hand-written notes, also falls into this category.
· Semi-structured data
This classification combines elements of structured and unstructured data. In many cases this is some form of unstructured data with metadata, such as researcher observations or instrument data, attached to it. Although semi-structured data may be usable on its own, its value can be significantly improved by using analytical software to process the unstructured portions of the data when it is entered in the ELN.
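A side-by-side sketch may help illustrate the three categories. The field names and file references below are illustrative assumptions, not any specific ELN's data model:

```python
# Structured: already tabular and machine-readable; usable as-is.
structured = [
    {"sample_id": "S-001", "ph": 7.2},
    {"sample_id": "S-002", "ph": 6.9},
]

# Unstructured: an opaque payload (e.g. a gel image) that the system
# cannot interpret without further processing.
unstructured = b"\x89PNG...raw image bytes..."

# Semi-structured: the same kind of opaque payload wrapped with
# machine-readable metadata (instrument, timestamp, annotation).
semi_structured = {
    "metadata": {
        "instrument": "gel-imager-01",
        "captured_at": "2024-05-14T09:30:00Z",
        "annotation": "Band intensity weaker than expected in lane 3",
    },
    "payload_ref": "attachments/gel_run_42.png",  # pointer to the stored blob
}
```

The metadata portion of the semi-structured record can be searched and filtered like structured data, even though the payload it points to still needs image analysis before its contents can be used.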
Reaping the data harvest
Whatever category of data you’re working with, it will need to go through a process called extract, transform, load (ETL) before it can be analyzed. Using ETL, the data is collected, formatted to be readable by an application, and then stored for use.
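The three ETL steps can be sketched as three small functions. The source format and the in-memory SQLite target are assumptions chosen to keep the example self-contained:

```python
import sqlite3

def extract():
    """Extract: collect raw lines, e.g. as read from an instrument file."""
    return ["S-001,0.412", "S-002,0.387"]

def transform(lines):
    """Transform: parse each line into an application-readable record."""
    records = []
    for line in lines:
        sample_id, value = line.split(",")
        records.append((sample_id, float(value)))
    return records

def load(records):
    """Load: store the formatted records for later analysis."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (sample_id TEXT, absorbance REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", records)
    return conn

conn = load(transform(extract()))
```

Once loaded, the data can be queried by any downstream analysis tool that understands the target store.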
A leading-edge ELN should have the capability to transform unstructured data resulting from experiments into semi-structured data. Such a transformation enables analytical methods, inferential analysis, and even predictive trending to take place in real time.
Although fully structured data can be directly added to any LIMS database, LabVantage ELN provides robust flexibility when dealing with semi-structured or unstructured data.
The LabVantage ELN is unique because it has the foundational support of an enterprise LIMS application. Semi-structured metadata and structured data are captured directly and seamlessly from instrument sources using LabVantage’s own Scientific Data Management System (SDMS).
Once the data is in a usable form inside LabVantage ELN, users can either manipulate it further or feed it directly into an analytics engine such as LabVantage Analytics. This helps not only to streamline daily lab activities, but also provides a single unified source of truth for patents, publications, and Investigational New Drug Applications to the FDA.
For example, LabVantage not only allows you to add attachments related to a particular worksheet for reference within the ELN, but (depending on the data type) you can open and capture attachment information directly within the worksheet itself – making information from the attachment far more accessible than in most other ELNs.
Working with big data is difficult
Companies are faced with trying to combine and successfully integrate multiple data types and extrapolate meaningful results. An ELN, as part of a broader informatics solution, must be efficient and deliver streamlined information extraction. Critically, in today’s complex data environment, it must be able to store, access, and analyze large amounts of data at once in order to reduce costs and speed decision-making.