Chemicals & Materials Now!

From basic to specialty, and everything in between


Beyond The Hype: Big Data In Drug Development And Healthcare

Posted on August 30th, 2016 by in Chemical R&D


The futurist Alvin Toffler published an article titled “The Future as a Way of Life” in the summer 1965 issue of Horizon magazine. The article grew into the best-selling book Future Shock, published in 1970. Toffler’s short definition of “future shock” was a personal perception of “too much change in too short a period of time”. Little did he know what was to come in the 21st century. New phrases are coined every day to capture real and perceived fast-approaching technological changes, and many of them amount to hype. So much so that Gartner Inc. has gained fame for the “Gartner Hype Cycle” it publishes annually. Of course, ‘Big Data’ was once on the hype cycle, although it is not entirely clear who invented the term [Ref: Steve Lohr, The Origins of ‘Big Data’: An Etymological Detective Story, February 1, 2013].

Figure 1 shows Gartner’s August 2015 Hype Cycle. To survive, i.e. to become a commercial reality, a hyped technology must pass through the “peak of inflated expectations” and, not surprisingly, the “trough of disillusionment”. The few survivors face a slow march up the “slope of enlightenment”, hoping eventually to reach the “plateau of productivity”. Google Flu Trends, Autonomous Vehicles, Internet of Things, Wearables and Consumer 3D Printing are headed toward the Trough of Disillusionment. Autonomous Vehicles fit Toffler’s definition of Future Shock well. Forgotten are the once-abuzz Augmented Reality (a la Google Glass), Mobile Wallets, Siri and Fingerprint Identification, all devoured by the cruel Trough of Disillusionment. But not Big Data!

Gartner dropped Big Data from its hype cycle in 2015. According to Betsy Burton, the Gartner analyst, “Big Data has quickly moved over the Peak of Inflated Expectations.” She continues, “…and has become prevalent in our lives across many hype cycles. So Big Data has become a part of many hype cycles.” [Ref: A. Woodie, “Why Gartner Dropped Big Data Off the Hype Curve,” August 26, 2015]. Perhaps, but Big Data has actually found legitimate uses in a few fields, especially drug development [Ref: Improving Pharmaceutical & Life Sciences Performance with Big Data, Oracle Enterprise Architecture White Paper, Feb 2015]. Big Data is hardly just another hype.



Figure 1 Hype Cycle for Emerging Technologies, 2015

What is Big Data?

Even though the word “big” implies very large, Big Data is not defined simply by volume. Rather, it is about complexity. Many small data sets do not consume much physical space but are complex enough in nature to be considered Big Data. At the same time, large data sets that require significant physical space may not be complex enough to be considered Big Data [Ref: D. Alemayehu and M. L. Berger, Big Data: transforming drug development and health policy decision making, Health Serv Outcomes Res Method, Mar 5, 2016].

In addition to Volume, the Big Data label also covers Variety and Velocity, which together make up the three V’s of Big Data: Volume, Variety and Velocity. Variety refers to the different types of structured and unstructured data that organizations can collect, such as transaction-level data, video, audio, text and log files. Velocity indicates how quickly the data can be made available for analysis.

In addition to the three V’s, some add a fourth to the Big Data definition: Veracity. Veracity indicates the integrity of the data and an organization’s ability to trust the data and use it confidently to make crucial decisions [Ref: D. Alemayehu and M. L. Berger, Big Data: transforming drug development and health policy decision making, Health Serv Outcomes Res Method, Mar 5, 2016].

Organizations such as drug companies are increasingly turning to Big Data to discover new drugs and to find new ways of improving decision-making, opportunities, and overall performance. Excellent systems exist to analyze different data types in isolation, but the real value comes from integrating the data into one harmonized, unified knowledge base. This is where the issues begin. Different data types are stored in different data sources, and these data sources are not necessarily compatible. Data can be structured (as in clinical trial management systems or electronic data capture systems) or completely unstructured (such as free-text documents or patient-reported outcomes posted on social media like Twitter). Even if all the data are structured, the structure of one data source may not be compatible with that of another [Ref: P. Tormay, Big Data in Pharmaceutical R&D: Creating a Sustainable R&D Engine, Pharm Med 29:87–92, 2015].
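To make the compatibility problem concrete, here is a minimal sketch in Python of harmonizing two structured sources into one record format. Everything in it is an assumption for illustration: the field names (`subject_id`, `PatientID`, `WeightLb`), the `UnifiedRecord` type, and the idea that one source stores weight in kilograms and the other in pounds are hypothetical and do not come from any real trial or health-record system.

```python
from dataclasses import dataclass

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms


@dataclass(frozen=True)
class UnifiedRecord:
    """Hypothetical harmonized record shared by all sources."""
    patient_id: str
    weight_kg: float
    source: str


def from_trial_system(row: dict) -> UnifiedRecord:
    # Hypothetical clinical-trial export: key "subject_id", weight already in kg.
    return UnifiedRecord(
        patient_id=str(row["subject_id"]),
        weight_kg=float(row["weight_kg"]),
        source="trial",
    )


def from_health_record(row: dict) -> UnifiedRecord:
    # Hypothetical health-record export: different key names, weight in pounds.
    return UnifiedRecord(
        patient_id=str(row["PatientID"]),
        weight_kg=round(float(row["WeightLb"]) * LB_TO_KG, 1),
        source="ehr",
    )


def harmonize(trial_rows, ehr_rows):
    """Map both sources onto the unified schema and return one combined list."""
    return [from_trial_system(r) for r in trial_rows] + [
        from_health_record(r) for r in ehr_rows
    ]
```

Even in this toy case, each source needs its own mapping for names, types, and units. Unstructured sources such as free text would need far more than a field-by-field mapping.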

What is Big Data Analytics?

Big Data analytics is the process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information. Its primary goal is to help companies make more informed business decisions by enabling data scientists, predictive modelers and other analytics professionals to analyze large volumes of transaction data, as well as other forms of data that conventional business intelligence programs may leave untapped.

Why Big Data and Drug Development?

In contrast to the existing paradigm of drug development, which relies on systematically collected numeric data, the new reality involves information that comes in diverse forms and shapes. There are specific reasons for the adoption of the Big Data approach by drug companies. A key reason is the cost of developing a new drug. According to a March 2016 report by the Tufts Center for the Study of Drug Development, a single approved compound costs an astounding $2.87 billion to develop and maintain over its life. Only blockbuster drugs can be expected to generate attractive returns. Indeed, the poor expected return on investment has discouraged development of new drugs for diseases and conditions affecting small patient populations. Besides the whopping price tag, the pace of drug development has slowed. Alternative approaches to drug development are therefore being sought, driven by the high cost, the continuing need for new drugs, and the desire to enhance the benefits and precision of drugs.

In the context of drug development, Big Data means not only electronic health records and claims data but also data captured through every conceivable medium, including social media, Internet search, wearable devices, video streams, and personal genomic services. It may also include data collected from randomized controlled clinical trials, particularly high-dimensional data such as genomic, laboratory, or imaging data.

Big Data is poised to play a critical role, providing the opportunity to analyze diverse digitized health data to make decisions about the relative benefits of drugs and their use in the real-world setting. Notably, evidence from observational studies may help to fill information gaps where data from clinical trials are inadequate [Ref: D. Alemayehu and M. L. Berger, Big Data: transforming drug development and health policy decision making, Health Serv Outcomes Res Method, Mar 5, 2016].

The Future

The McKinsey Global Institute has offered a glimpse of a future in which the following might be possible:

  1. Predictive modeling of biological processes and drugs becomes significantly more sophisticated and widespread. By leveraging the diversity of available molecular and clinical data, predictive modeling could help identify new potential-candidate molecules with a high probability of being successfully developed into drugs that act on biological targets safely and effectively.
  2. Patients are identified to enroll in clinical trials based on more sources (for example, social media) than doctors’ visits. Furthermore, the criteria for including patients in a trial could take significantly more factors (for instance, genetic information) into account to target specific populations, thereby enabling trials that are smaller, shorter, less expensive, and more powerful.
  3. Trials are monitored in real time to rapidly identify safety or operational signals requiring action to avoid significant and potentially costly issues such as adverse events and unnecessary delays.
  4. Instead of rigid data silos that are difficult to exploit, data are captured electronically and flow easily between functions, for example, discovery and clinical development, as well as to external partners, for instance, physicians and contract research organizations (CROs). This easy flow is essential for powering the real-time and predictive analytics that generate business value.

In the future, one can imagine a practitioner having real-time access to a database that allows her to rapidly analyze how subsets of patients similar to the one in the examination room (e.g., in demographics, principal health problem, or general health status) have been treated and what their outcomes were. This will inform shared decision making by patients and their providers. The use of rapid-cycle analytics at the bedside remains controversial today. Nonetheless, in the absence of good insights from published clinical trials, providers will increasingly seek them from accessible real-world databases [Ref: D. Alemayehu and M. L. Berger, Big Data: transforming drug development and health policy decision making, Health Serv Outcomes Res Method, Mar 5, 2016].
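As a rough illustration of the kind of query such a practitioner might run, the Python sketch below selects patients similar to the one in the room and tallies outcomes per treatment. It is a toy under stated assumptions: the record fields (`age`, `condition`, `treatment`, `outcome`), the five-year age window, and both function names are hypothetical. Matching on age and condition alone is far cruder than anything a real system would need, and the counts say nothing about causation.

```python
from collections import Counter


def similar_patients(records, age, condition, age_window=5):
    """Return records with the same condition and an age within age_window years.

    The similarity rule is a deliberately simple, hypothetical stand-in.
    """
    return [
        r
        for r in records
        if r["condition"] == condition and abs(r["age"] - age) <= age_window
    ]


def outcomes_by_treatment(cohort):
    """Count how often each outcome was recorded for each treatment."""
    summary = {}
    for r in cohort:
        summary.setdefault(r["treatment"], Counter())[r["outcome"]] += 1
    return summary
```

Calling `outcomes_by_treatment(similar_patients(records, age=62, condition="hypertension"))` would return a dictionary keyed by treatment, each value holding outcome counts for the matched cohort.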

Will Big Data have a significant beneficial impact on reducing drug development cost? The best that can be said is: “There is as yet insufficient data for a meaningful answer.” (Isaac Asimov, “The Last Question”)


All opinions shared in this post are the author’s own.
