Biomedical pattern analysis

Electronic health records contain a wealth of information that has not been fully exploited. In the UK, clinical practices have been computerised since the 1990s, whereas hospital episode data have been available since about 2005. Clinical informatics has advanced significantly: it is now possible to retrieve millions of patient records over time and across vendors and clinical practices, e.g., about a million patient records in the Quality Improvement in Chronic Kidney Disease (QICKD) data set that I work on. Biomedical pattern analysis seeks to understand the underlying trends in this massive amount of data. It requires knowledge in both pattern recognition and medicine.

This webpage summarizes some of my work on biomedical pattern analysis.
  1. Calibrating Biomedical Signal Stored in Patient Records
  2. Estimating Glomerular Filtration Rate
  3. Visualizing Patient Records
  4. Healthcare Process Modelling
  5. Reading list

Calibrating Biomedical Signal Stored in Patient Records

Problem: Estimated Glomerular Filtration Rate (eGFR) is the pillar of CKD diagnosis, staging, and management. Since 2006, general practices in the UK have reported eGFR in a standardised way, closely following the guidelines proposed by the UK’s National Institute for Health and Clinical Excellence (NICE): eGFR is reported using the Modification of Diet in Renal Disease (MDRD) equation, based on serum creatinine aligned to isotope dilution mass spectrometry (IDMS). Although the newly introduced standardised reporting is more accurate, it has an unintended adverse consequence: eGFR measurements recorded before and after the standardised reporting are now incompatible with each other. Unfortunately, to date, it has not been possible to systematically distinguish the different serum creatinine assay methods and eGFR reporting conventions from health records.
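As an illustration, the IDMS-aligned MDRD equation referred to above can be sketched in a few lines. The coefficients are the published MDRD values; the example inputs and the function name are hypothetical.

```python
# Minimal sketch of the IDMS-aligned MDRD equation used in the
# standardised eGFR reporting. Serum creatinine (scr) is in mg/dL.

def mdrd_egfr(scr_mg_dl, age_years, female, black):
    """Estimated GFR (mL/min/1.73 m^2) via the IDMS-aligned MDRD equation."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # published sex adjustment
    if black:
        egfr *= 1.212   # published ethnicity adjustment
    return egfr

# Hypothetical example: 60-year-old white male, serum creatinine 1.2 mg/dL
print(round(mdrd_egfr(1.2, 60, female=False, black=False), 1))  # → 61.8
```

Note that the pre-2006 (non-IDMS) MDRD formula used a different leading coefficient, which is exactly why values recorded before and after the change cannot be compared directly.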

Our proposal: We developed an algorithm capable of identifying the different eGFR assays. The algorithm was tested on a subset of the QICKD data set from CKD patients in England.

Findings: Our study shows that the developed algorithm can (1) identify the different eGFR reporting methods, and (2) “back-calibrate” the eGFR time-series in such a way that after calibration, the eGFR measurements stored in a patient’s health record prior to the standardised eGFR reporting become compatible with those reported using the standardised reporting.

Implications: The eGFR values of female patients calculated using MDRD alone would overestimate the values conforming to the NICE guidelines, whereas those of male patients would be underestimated. We therefore recommend that our proposed algorithm be used to back-calibrate eGFR measurements each time a new eGFR reporting method is adopted.


Estimating Glomerular Filtration Rate

The objective of this study is to make sense of the data. One question the QICKD investigators are interested in is estimating the change in renal function (say g') given the current state of renal function (g) and the age of the patient (t). Estimating the change in renal function is harder than it appears, because each measurement, known as the estimated Glomerular Filtration Rate (eGFR), is influenced by daily fluctuations such as the body's biological clock (circadian rhythm), as well as food intake (particularly protein) and activities performed before the measurement is taken. The objective is therefore to estimate the long-term trend amidst the fluctuation.

The conventional method calculates the rate of change from two consecutive eGFR values. I propose instead to fit a regression line to the eGFR sequence of each patient, so that each patient has a single model - the essence of personalized medicine. The rate of change of eGFR is then obtained by calculating the first derivative of the fitted function. The main advantages of this method are its robustness to instantaneous fluctuation (since we report the expected g) and that the rate of change of the expected eGFR trend can be derived analytically at any given point in time. See the figure below.
Figure: eGFR over time.
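A minimal sketch of this per-patient trend model, assuming NumPy and illustrative data (not from the QICKD data set): fit a least-squares line to one patient's eGFR sequence and obtain the rate of change analytically as the derivative of the fit.

```python
import numpy as np

# One (synthetic) patient's eGFR readings against age.
age = np.array([60.0, 61.2, 62.5, 63.1, 64.8, 65.9])    # t, years
egfr = np.array([72.0, 70.5, 71.8, 68.9, 67.2, 66.1])   # g, mL/min/1.73 m^2

coeffs = np.polyfit(age, egfr, deg=1)   # least-squares regression line g(t)
trend = np.poly1d(coeffs)               # expected eGFR at any age
rate = np.polyder(trend)                # g'(t): here a constant slope

print(trend(63.0))   # smoothed eGFR at age 63, robust to fluctuation
print(rate(63.0))    # rate of change at age 63 (negative = declining)
```

Because the fit is an analytic function, g' is available at any age, not just at the measurement times; a higher-order polynomial would work the same way via `np.polyder`.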
The next step consists of sampling g', g, and t and then estimating p(g'|g,t). This function gives the likelihood of the rate of change of eGFR given the current renal stage (that is, the current eGFR value) and the age of the patient (t). We frame this as a conditional density estimation problem in which g', g, and t are continuous. The result is a likelihood graph, shown below. The same graph can be represented as a likelihood table.
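One way to sketch this conditional density estimation step, assuming SciPy's Gaussian kernel density estimator and synthetic samples in place of the real (g', g, t) triples: estimate p(g'|g,t) as the ratio of a joint density over (g', g, t) to a marginal density over (g, t).

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-ins for the (rate, eGFR, age) triples sampled
# from the fitted patient trends; not QICKD data.
rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(40, 80, n)                       # age
g = 100 - 0.8 * (t - 40) + rng.normal(0, 5, n)   # current eGFR
gp = -0.8 + rng.normal(0, 0.3, n)                # rate of change g'

joint = gaussian_kde(np.vstack([gp, g, t]))      # p(g', g, t)
marg = gaussian_kde(np.vstack([g, t]))           # p(g, t)

def cond_density(gp_val, g_val, t_val):
    """Likelihood of rate g' given current eGFR g and age t."""
    return joint([gp_val, g_val, t_val])[0] / marg([g_val, t_val])[0]

# A rate near the synthetic mean (-0.8) should be far more likely
# than an implausibly large positive rate.
print(cond_density(-0.8, 80.0, 60.0), cond_density(2.0, 80.0, 60.0))
```

Evaluating `cond_density` over a grid of (g', g, t) values yields the likelihood graph, and discretising that grid gives the equivalent likelihood table.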



Visualizing Patient Records

Personalized medicine involves customising management to meet patients’ needs. In CKD, at the population level, there is a steady decline in renal function with increasing age; progressive CKD has been defined as marked variation from this rate of decline.

Objective: To create visualisations of individual patients’ renal function and display smoothed trend lines and confidence intervals for their renal function and other important covariates.

Methods: We applied advanced pattern recognition techniques developed in biometrics to routinely collected primary care data from the Quality Improvement in Chronic Kidney Disease (QICKD) trial. We plotted trend lines, using regression, and confidence intervals for individual patients. We also created a visualisation which allows renal function to be compared with six other covariates, including glycated haemoglobin (HbA1c), body mass index (BMI), BP, and therapy. The outputs were reviewed by an expert panel.
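The trend-line-and-confidence-band display could be sketched as follows, assuming Matplotlib and illustrative data; the residual-based band here is a simplification of a proper regression interval, and the file name is arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")              # render off-screen
import matplotlib.pyplot as plt

# One (synthetic) patient's eGFR readings against age.
age = np.array([58, 59, 60, 61, 62, 63, 64], dtype=float)
egfr = np.array([74, 71, 73, 69, 68, 66, 65], dtype=float)

coeffs = np.polyfit(age, egfr, 1)          # regression trend line
fit = np.polyval(coeffs, age)
resid_sd = np.std(egfr - fit, ddof=2)      # residual spread around the trend
lo, hi = fit - 1.96 * resid_sd, fit + 1.96 * resid_sd

plt.plot(age, egfr, "o", label="eGFR readings")
plt.plot(age, fit, "-", label="trend (regression)")
plt.fill_between(age, lo, hi, alpha=0.2, label="~95% band")
plt.xlabel("Age (years)")
plt.ylabel("eGFR (mL/min/1.73 m$^2$)")
plt.legend()
plt.savefig("egfr_trend.png")
```

The full display repeats this panel for each covariate (HbA1c, BMI, BP, therapy) so that renal function can be read alongside them.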

Results: We successfully extracted and displayed the data. We demonstrated that estimated glomerular filtration rate (eGFR) is a noisy variable, and showed that a large number of people would exceed the “progressive CKD” criteria. We created a data display that could readily be automated. This display was well received by our expert panel but requires extensive development before testing in a clinical setting.

Conclusions: It is feasible to apply data visualisation methods developed in biometrics to CKD data. The criteria for defining “progressive CKD” need revisiting, as many patients exceed them. Further development and testing are needed to explore whether this type of data modelling and visualisation might improve patient care.


Healthcare Process Modelling

Background: Medical research increasingly requires the linkage of data from different sources. Conducting a requirements analysis for a new application is an established part of software engineering, but is rarely reported in the biomedical literature, and no generic approaches to linking heterogeneous health data have been published.

Methods: A literature review, followed by a consensus process to define how requirements for research using multiple data sources might be modeled.

Results: We have developed a requirements analysis process: i-ScheDULEs. The first components of the modeling process are indexing and creating a rich picture of the research study. Secondly, we developed a series of reference models of progressive complexity: data flow diagrams (DFD) to define data requirements; unified modeling language (UML) use case diagrams to capture study-specific and governance requirements; and finally, business process models, using business process modeling notation (BPMN).

Discussion: These requirements and their associated models should become part of research study protocols.


Reading list