Curating Knowledge in the Digital Age: Mining Massive Data Sets for Healthcare Insights

In Healthcare’s Age of Liquid Data, we explored why the healthcare industry must move aggressively to make healthcare data extremely connected. We identified three steps to make this happen – the industry must aggregate data, curate data and use the data to engage consumers in a personalized manner to harness the power of the interconnected data. This article focuses on the importance of curating massive data sets to generate actionable data insights.

3 steps to make healthcare data extremely connected




As the volume of data expands, organizations must collect it systematically and securely while enabling its accessibility. These three steps can transform your data.

Data and Value

Value-based care is a data-driven enterprise. As healthcare organizations expand their payment and care delivery models to embrace value through population health strategies, they increasingly rely on data to optimize organizational performance, engage members/patients and enhance care quality and outcomes.

Once massive data sets from multiple sources have been aggregated and standardized into a single source of truth, hospitals and health systems can start to pull valuable insights and trends from the data. Many healthcare organizations, however, may lack the data curation capabilities needed to derive powerful and actionable insights. By framing data curation as a three-stage process, organizations can blueprint the workflow, capabilities and technologies necessary to mine data insights consistently and effectively.

The three-stage process consists of the following components:

3 Stages of Data Curation

  • Grouping Data into Meaningful Units of Analysis
  • Determining Current/Future Care Utilization and Financial Risk
  • Targeting Opportunities to Improve Care Quality and Organizational Performance

Stage 1: Grouping Data into Meaningful Units of Analysis

Health systems can collect multiple terabytes of clinical, claims, environmental and social data on individual consumers. These massive, complex and highly detailed data sets are not useful until they are grouped into meaningful units of analysis.

Consider a patient with diabetes, Jane Smith. In her consultation, Jane will likely undergo diagnostic testing and receive a diagnosis and a care plan that requires therapy and prescription drugs. These activities generate data from physicians, labs, pharmacists and therapists, as well as payment claims. These data sources then combine with Jane’s historical data in her electronic health record (EHR), as well as with disease-specific, payment and personal data related to her condition and the broader population.

By segmenting and grouping that data appropriately, analysts can dig deeper and ask the right questions: How has she responded to specific treatments? What does a year in the life of her chronic condition look like? How does she compare to thousands of other patients with a similar condition? Each question requires its own analysis.
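As a rough illustration of what grouping into units of analysis can look like, the sketch below collects raw records into (patient, condition, year) groups and summarizes each one. The record fields, identifiers and dollar amounts are all invented for this example; real clinical and claims data would be far richer and would need proper privacy safeguards.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str   # hypothetical identifier
    source: str       # e.g. "lab", "pharmacy", "claim"
    condition: str    # condition the record has been tagged with
    year: int
    cost: float       # illustrative amount only


def group_records(records):
    """Group raw records into (patient, condition, year) units of analysis."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.patient_id, rec.condition, rec.year)].append(rec)
    return dict(groups)


def summarize(groups):
    """Summarize each unit: record count, contributing sources, total cost."""
    return {
        key: {
            "records": len(recs),
            "sources": sorted({r.source for r in recs}),
            "total_cost": round(sum(r.cost for r in recs), 2),
        }
        for key, recs in groups.items()
    }


# Made-up sample data for illustration.
SAMPLE_RECORDS = [
    Record("jane-smith", "lab", "diabetes", 2023, 120.0),
    Record("jane-smith", "pharmacy", "diabetes", 2023, 45.5),
    Record("jane-smith", "claim", "diabetes", 2023, 300.0),
    Record("jane-smith", "claim", "hypertension", 2023, 80.0),
    Record("john-doe", "lab", "diabetes", 2023, 110.0),
]
```

Calling `summarize(group_records(SAMPLE_RECORDS))` yields one summary per unit, so "a year in the life" of one patient's condition becomes a single entry that can be compared with the same unit for other patients.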

To assist in these grouping efforts, healthcare organizations may deploy:

Analytic Tools
Analytic tools enable the organization to establish the level of disease burden by assessing individual disease markers. This informs the predictive models and business intelligence tools that analysts apply to produce actionable insights.

Artificial Intelligence (AI)
Data aggregation and standardization can be replete with errors: the claims system may not recognize a pharmacy or procedure code, a diagnosis might be missing, or treatment codes may be incompatible with the condition or diagnosis. Augmented intelligence helps improve data quality. Clinical teams review application-generated feedback to identify which codes have created errors, then adjust the application to increase accuracy for both treatment and billing.
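The kinds of errors described above can be sketched as simple rule checks that feed a review queue for clinical teams. This is only a minimal illustration: the code values and lookup tables below are placeholders invented for the example, not real diagnosis or procedure codes, and a production system would rely on maintained code sets.

```python
# Hypothetical reference tables (placeholders, not real medical codes).
KNOWN_PROCEDURE_CODES = {"PROC-A1C", "PROC-EYE", "PROC-XRAY"}
COMPATIBLE_PROCEDURES = {
    "DX-DIABETES": {"PROC-A1C", "PROC-EYE"},
    "DX-FRACTURE": {"PROC-XRAY"},
}


def validate_claim(claim):
    """Return a list of data-quality issues found in one claim dict."""
    issues = []
    diagnosis = claim.get("diagnosis")
    procedure = claim.get("procedure")

    if not diagnosis:
        issues.append("missing diagnosis")
    if procedure not in KNOWN_PROCEDURE_CODES:
        issues.append("unrecognized procedure code")
    elif (diagnosis in COMPATIBLE_PROCEDURES
          and procedure not in COMPATIBLE_PROCEDURES[diagnosis]):
        issues.append("procedure incompatible with diagnosis")
    return issues


def review_queue(claims):
    """Collect claims that need clinical review, keyed by claim id."""
    flagged = {}
    for claim in claims:
        issues = validate_claim(claim)
        if issues:
            flagged[claim["id"]] = issues
    return flagged


# Made-up sample claims for illustration.
SAMPLE_CLAIMS = [
    {"id": "c1", "diagnosis": "DX-DIABETES", "procedure": "PROC-A1C"},
    {"id": "c2", "diagnosis": None, "procedure": "PROC-EYE"},
    {"id": "c3", "diagnosis": "DX-DIABETES", "procedure": "PROC-XRAY"},
    {"id": "c4", "diagnosis": "DX-FRACTURE", "procedure": "PROC-???"},
]
```

Here `review_queue(SAMPLE_CLAIMS)` flags claims c2, c3 and c4, each with the reason it was flagged, which is the kind of feedback a clinical team could review before adjusting the rules.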

Context Natural Language Processing (NLP)
Context NLP is increasingly relevant with the rise in computing power and the growing reliance on voice interaction with computers. Advances in Context NLP also enable analysts to extract data from large bodies of unstructured text, written documents, scans and images.

Stage 2: Determining Current/Future Care Utilization and Financial Risk

Grouping tools sort and mark all data related to Jane Smith’s diabetes. To assess Jane’s total disease burden, analysts combine this disease-specific information with information related to comorbidities. Jane is not alone – there are likely many other patients with a similar health status and care needs.

Effective value-based care and population health require understanding current care utilization and costs and developing the ability to predict and manage future care utilization and costs. What resources do patients like Jane actually require? What is the likelihood that those resource needs will rise sharply next year?

An understanding of baseline and projected needs is the foundation for lowering costs and improving outcomes within risk-based contracts. This type of analysis can be generated through:

Business Intelligence
Business intelligence tools convert meaningful units of analysis into actionable insights. By assessing disease markers, these tools identify the prevalence of a disease and calculate the associated care utilization and costs.
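A simplified view of that calculation: given a member population, count how many carry a condition and average their annual cost. The member structure and figures are assumptions made for this sketch, and the cost used is each member's total annual cost rather than cost attributed to the disease alone.

```python
def prevalence_and_cost(members, condition):
    """Estimate prevalence and mean annual cost for one condition.

    members: list of dicts with 'conditions' (a set of condition names)
    and 'annual_cost' (total annual cost for that member).
    """
    if not members:
        return {"affected": 0, "prevalence": 0.0, "mean_cost": 0.0}

    affected = [m for m in members if condition in m["conditions"]]
    mean_cost = (
        sum(m["annual_cost"] for m in affected) / len(affected)
        if affected else 0.0
    )
    return {
        "affected": len(affected),
        "prevalence": len(affected) / len(members),
        "mean_cost": mean_cost,
    }


# Made-up sample population for illustration.
SAMPLE_MEMBERS = [
    {"id": "m1", "conditions": {"diabetes", "hypertension"}, "annual_cost": 11000.0},
    {"id": "m2", "conditions": {"diabetes"}, "annual_cost": 9000.0},
    {"id": "m3", "conditions": {"asthma"}, "annual_cost": 3000.0},
    {"id": "m4", "conditions": set(), "annual_cost": 500.0},
]
```

With this sample, `prevalence_and_cost(SAMPLE_MEMBERS, "diabetes")` reports two affected members, a prevalence of 0.5 and a mean annual cost of 10000.0, the baseline figures that predictive models would then build on.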

Predictive Analytics
Based on business intelligence reports, predictive models determine which patients/members will have the heaviest future disease burden, and which would benefit most from active care management and/or disease management interventions.

Stage 3: Targeting Opportunities to Improve Care Quality and Performance

With a solid understanding of the projected disease burden, treatments and aligned costs for Jane and her population cohort, healthcare organizations can determine the appropriate targeted interventions to advance care quality and outcomes while reducing costs. The overall goal is to optimize care delivery, resource utilization and patient/employee experience. Analytic tools, machine learning and bots aid care professionals in delivering these targeted interventions.

Targeting interventions begins with identifying gaps in care. Potential gaps include:

  • Care inconsistent with clinical guidelines
  • Patients’ nonadherence to treatment and medication plans
  • Lab reports indicating poorly controlled or managed disease
  • Inappropriately monitored treatments

Gap analysis can help determine which patients and treatments are likely to consume unnecessary care and resources. These patients may benefit from enhanced engagement, direct interventions and/or enrollment in disease management programs. Not all gaps are equal. It is important to address care gaps where targeted interventions offer the highest potential for better care outcomes.
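One way to picture gap analysis is as a set of rules that flag gaps per patient, followed by a ranking step so that the highest-impact gaps are addressed first. The thresholds, weights and patient fields below are illustrative assumptions chosen for this sketch, not clinical guidance or any organization's actual method.

```python
# Illustrative thresholds and weights; assumptions, not clinical guidance.
A1C_POOR_CONTROL = 9.0           # lab value (percent) treated as poorly controlled
ADHERENCE_MINIMUM = 0.8          # share of days with medication on hand
MONITORING_INTERVAL_MONTHS = 12  # maximum months between lab checks
GAP_WEIGHTS = {"poor_control": 3, "nonadherence": 2, "overdue_monitoring": 1}


def find_gaps(patient):
    """Return the list of care gaps flagged for one patient dict."""
    gaps = []
    if patient["last_a1c"] is not None and patient["last_a1c"] > A1C_POOR_CONTROL:
        gaps.append("poor_control")
    if patient["adherence"] < ADHERENCE_MINIMUM:
        gaps.append("nonadherence")
    if patient["months_since_a1c"] > MONITORING_INTERVAL_MONTHS:
        gaps.append("overdue_monitoring")
    return gaps


def prioritize(patients):
    """Rank patients with gaps by total gap weight, highest first."""
    scored = []
    for p in patients:
        gaps = find_gaps(p)
        if gaps:
            score = sum(GAP_WEIGHTS[g] for g in gaps)
            scored.append((score, p["id"], gaps))
    # Sort by descending score, then by id for a stable, readable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Made-up sample patients for illustration.
SAMPLE_PATIENTS = [
    {"id": "p1", "last_a1c": 9.6, "adherence": 0.60, "months_since_a1c": 3},
    {"id": "p2", "last_a1c": 7.0, "adherence": 0.90, "months_since_a1c": 14},
    {"id": "p3", "last_a1c": 6.8, "adherence": 0.95, "months_since_a1c": 2},
    {"id": "p4", "last_a1c": 10.1, "adherence": 0.85, "months_since_a1c": 8},
]
```

In this sample, `prioritize(SAMPLE_PATIENTS)` ranks p1 first, then p4, then p2, and omits p3, who has no flagged gaps. The weights are where an organization would encode its own judgment about which gaps matter most.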

With these insights, healthcare organizations can make strategic decisions regarding the treatment of specific populations that absorb disproportionate care resources. Depending on overall care utilization and financial risk predictions, the organization might focus more directly on populations with specific chronic diseases or on individuals who are frequent emergency department visitors, for example.

Organizations can also use gap analysis tools to enhance clinician performance. After identifying that certain physicians drive disproportionate utilization and costs, organizations can use analytic tools to recommend specific changes that lead to better processes and outcomes.

Machine Learning
Machine learning tools can help healthcare organizations improve clinical and business processes, drive efficiencies and optimize performance. However, these tools require careful oversight since they can also reinforce inherent errors and biases.

As organizations move increasingly to automated systems for triaging and directing routine care, bots help process data through machine learning software programs. Bots can also direct members/patients to the most appropriate administrative or care delivery resources.


To effectively transition to value-based care delivery and population health management, healthcare organizations must engage with data in new and powerful ways. Success at risk-based contracting requires constant adjustments to deliver better outcomes, reduce costs and improve performance. Those capabilities can only be achieved by leveraging massive data to drive actionable knowledge and insight.

The curation process is an essential step, enabling the organization to continuously monitor and apply data in ways that inform and enhance ongoing analysis. The right technology solutions and tools can support healthcare professionals in making the decisions that improve processes, care outcomes and consumer experience. In this age of liquid data, human-machine collaboration offers a path to better care and more robust financial performance.
