Biomedical research

Biomedical pattern analysis

Electronic health records contain a wealth of information that has not been fully exploited. In the UK, clinical practices have been computerised since the 1990s, whereas hospital episode data have been available since about 2005. Clinical informatics has advanced significantly: it is now possible to retrieve millions of patient records over time and across vendors and clinical practices, e.g., about a million patient records in the Quality Improvement in Chronic Kidney Disease (QICKD) data set that I work on. Biomedical pattern analysis seeks to understand the underlying trends in this massive amount of data. It requires knowledge of both pattern recognition and medicine.

This webpage summarizes some of my work on biomedical pattern analysis.
  1. Biomedical pattern analysis
  2. Calibrating Biomedical Data
  3. Estimating Glomerular Filtration Rate
  4. Visualizing Patient Records
  5. Healthcare Process Modelling
  6. Agile Exploration of Computerised Medical Records
  7. Reading list

Calibrating Biomedical Data

Problem: The estimated glomerular filtration rate (eGFR) is the pillar of chronic kidney disease (CKD) diagnosis, staging, and management. Since 2006, general practices in the UK have been reporting eGFR in a standardised way, closely following the guidelines proposed by the UK’s National Institute for Health and Clinical Excellence (NICE): eGFR is reported using the Modification of Diet in Renal Disease (MDRD) equation, which is based on serum creatinine that has been aligned to isotope dilution mass spectrometry (IDMS). Although the newly introduced standardised reporting is more accurate, it has an unintended adverse consequence: the eGFR measurements recorded before and after its introduction are now incompatible with each other. Unfortunately, to date, it has not been possible to systematically distinguish the different serum creatinine assay methods and eGFR reporting methods from health records.
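For orientation, the four-variable MDRD equation referred to above can be sketched as follows. This is a minimal illustration, not code from the study. The function and argument names are assumptions, the coefficients are those of the commonly published form (175 for IDMS-aligned creatinine, 186 for the earlier form), and the ethnicity multiplier of the published equation is omitted for brevity.

```python
def mdrd_egfr(scr_umol_l, age_years, female, idms_aligned=True):
    """Four-variable MDRD estimate of GFR in mL/min/1.73 m^2.

    scr_umol_l: serum creatinine in micromol/L.
    The ethnicity multiplier of the published equation is omitted here.
    """
    if scr_umol_l <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    scr_mg_dl = scr_umol_l / 88.4  # convert micromol/L to mg/dL
    constant = 175.0 if idms_aligned else 186.0
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    return egfr * 0.742 if female else egfr


if __name__ == "__main__":
    # Hypothetical patient: creatinine 100 micromol/L, aged 60, female.
    print(round(mdrd_egfr(100.0, 60, female=True), 1))
```

The two constants are intended for differently calibrated creatinine values, so the sketch shows the formula only; it is not a conversion between the two reporting eras.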

Our proposal: We developed an algorithm that is capable of distinguishing the different eGFR reporting methods. The algorithm was tested on a subset of the QICKD data set from CKD patients in England.

Findings: Our study shows that the developed algorithm can (1) identify the different eGFR reporting methods, and (2) “back-calibrate” the eGFR time-series in such a way that after calibration, the eGFR measurements stored in a patient’s health record prior to the standardised eGFR reporting become compatible with those reported using the standardised reporting.

Implications: For female patients, eGFR values calculated using MDRD alone would overestimate those conforming to the NICE guidelines, whereas for male patients they would underestimate them. We therefore recommend that our proposed algorithm be used to back-calibrate eGFR measurements each time a new eGFR reporting method is adopted.
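The idea of identifying a reporting regime and then back-calibrating can be illustrated with a toy example. The sketch below is not the algorithm from the papers listed in this section. It is a simplified, hard-assignment variant of a mixture of two linear regressions through the origin, and the function names, the two-regime assumption and the synthetic numbers are all assumptions made for illustration.

```python
def fit_two_slopes(x, y, n_iter=50):
    """Hard-assignment fit of two regression lines through the origin.

    x: reference eGFR values recomputed with one fixed formula (all > 0).
    y: eGFR values as recorded, each assumed to follow y = a_k * x for one
       of two unknown slopes.
    Returns (slopes, labels), where labels[i] is the regime of record i.
    """
    ratios = [yi / xi for xi, yi in zip(x, y)]
    slopes = [min(ratios), max(ratios)]  # initialise at the extreme ratios
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record joins the line with the smaller residual.
        new_labels = [
            min((0, 1), key=lambda k: abs(yi - slopes[k] * xi))
            for xi, yi in zip(x, y)
        ]
        # Update step: least-squares slope through the origin for each regime.
        for k in (0, 1):
            members = [(xi, yi) for xi, yi, lab in zip(x, y, new_labels) if lab == k]
            sxx = sum(xi * xi for xi, _ in members)
            if sxx > 0:
                slopes[k] = sum(xi * yi for xi, yi in members) / sxx
        if new_labels == labels:
            break
        labels = new_labels
    return slopes, labels


def back_calibrate(y, labels, slopes, reference):
    """Rescale every recorded value onto the regime indexed by `reference`."""
    return [yi * slopes[reference] / slopes[lab] for yi, lab in zip(y, labels)]


if __name__ == "__main__":
    # Synthetic data: four records on one scale, four on a scale 6% higher.
    x = [30.0, 45.0, 60.0, 75.0, 40.0, 55.0, 70.0, 90.0]
    y = x[:4] + [1.06 * xi for xi in x[4:]]
    slopes, labels = fit_two_slopes(x, y)
    print(labels, back_calibrate(y, labels, slopes, reference=0))
```

A soft-assignment (expectation-maximisation) version with intercepts and noise variances would be closer to a full mixture model; the hard-assignment form is used here only because it is short.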


  • N. Poh and S. de Lusignan, Calibrating Longitudinal eGFR in Patience Records Stored in Clinical Practices Using a Mixture of Linear Regressions, Workshop on Pattern Recognition for Healthcare Analytics, ICPR 2012, Tsukuba Science City, Japan, 2012. [pdf]
  • Norman Poh, Andrew McGovern, and Simon de Lusignan, Towards automated identification of changes in laboratory measurement of renal function: implications for longitudinal research and observing trends in glomerular filtration rate (GFR), Dept of Computing Technical Report, TR-14-03, 2014. [pdf]

Estimating Glomerular Filtration Rate

The objective of the study is to make sense of the data. One question that the QICKD investigators are interested in is how to estimate the change in renal function (say g') given the current state of renal function (g) and the age of a patient (t). Estimating the change in renal function is harder than it appears. This is because each measurement, known as the estimated Glomerular Filtration Rate (eGFR), is influenced by daily fluctuations such as the body's biological clock (circadian rhythm), as well as food intake (particularly protein) and activities performed before the measurement is taken. The objective is therefore to estimate the long-term trend amidst the fluctuation. The conventional method calculates the rate of change from two consecutive eGFR values. I propose instead to fit a regression line to the eGFR sequence of each patient. Each patient thus has a single model, which is the essence of personalized medicine. The rate of change of eGFR is then obtained by calculating the first derivative of the fitted function. This method has two main advantages: first, it is robust to instantaneous fluctuation (since it gives the expected g); second, the rate of change of the expected eGFR trend can be derived analytically at any given point in time. See the figure below.
[Figure: eGFR over time]
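The contrast between the conventional consecutive-difference method and a per-patient regression fit can be sketched as follows. This is a minimal illustration assuming a straight-line fit, for which the first derivative is simply the slope; the function names and the synthetic measurements are assumptions, and the fitted function used in the actual study may differ.

```python
def fit_line(t, g):
    """Ordinary least-squares line g = intercept + slope * t for one patient."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two measurements")
    t_mean = sum(t) / n
    g_mean = sum(g) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all measurements share the same time point")
    slope = sum((ti - t_mean) * (gi - g_mean) for ti, gi in zip(t, g)) / sxx
    return slope, g_mean - slope * t_mean


def consecutive_rates(t, g):
    """Conventional estimate: rate of change between consecutive measurements."""
    return [(g[i + 1] - g[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]


if __name__ == "__main__":
    # Synthetic patient: age in years and noisy eGFR readings.
    age = [60.0, 61.0, 62.0, 63.0, 64.0]
    egfr = [71.0, 66.0, 68.0, 62.0, 63.0]
    slope, intercept = fit_line(age, egfr)
    print("fitted rate of change (g'):", slope)
    print("consecutive-difference rates:", consecutive_rates(age, egfr))
```

For a straight line the derivative is constant, so g' is the same at every age; a higher-order or smoothed fit would give a rate that varies with t.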
The next step consists of sampling g, g' and t and then estimating p(g'|g,t). This function gives the likelihood of the rate of change of eGFR given the current renal stage (that is, the current eGFR value) and the age of the patient (t). We frame this as a conditional density estimation problem in which g', g, and t are continuous. The result is a likelihood graph, shown below. The same graph can be represented as a likelihood table.
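One generic way to estimate a conditional density such as p(g'|g,t) from samples is a kernel estimator. The sketch below is not necessarily the estimator used in the paper. It assumes Gaussian kernels, and the bandwidths, function names and sample values are hypothetical.

```python
import math


def gauss(u, h):
    """Gaussian kernel with bandwidth h evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def conditional_density(gp, g, t, samples, h_gp=1.0, h_g=5.0, h_t=5.0):
    """Kernel estimate of p(g' = gp | g, t).

    samples: iterable of (g'_i, g_i, t_i) triples drawn from the fitted models.
    Each sample is weighted by its closeness to the conditioning point (g, t).
    """
    num = 0.0
    den = 0.0
    for gp_i, g_i, t_i in samples:
        w = gauss(g - g_i, h_g) * gauss(t - t_i, h_t)
        num += w * gauss(gp - gp_i, h_gp)
        den += w
    if den == 0.0:
        raise ValueError("no samples near the conditioning point")
    return num / den


if __name__ == "__main__":
    # Hypothetical samples: (rate of change, current eGFR, age).
    samples = [(-1.0, 60.0, 70.0), (-2.0, 58.0, 72.0),
               (-0.5, 62.0, 69.0), (-3.0, 30.0, 80.0)]
    print(conditional_density(-1.0, 60.0, 70.0, samples))
```

Evaluating the function over a grid of g' values for a fixed (g, t) gives one column of a likelihood table.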


  • Poh, N. and S. de Lusignan (2011). Modeling Rate of Change in Renal Function for Individual Patients: A Longitudinal Model Based on Routinely Collected Data. Neural Information Processing Systems (NIPS) Personalized Medicine Workshop 2011 (NIPS PM 2011), Sierra Nevada. [pdf] [Download the likelihood table] [spotlight presentation]

Visualizing Patient Records

Personalized medicine involves customising management to meet patients’ needs. In CKD, at the population level, there is a steady decline in renal function with increasing age, and progressive CKD has been defined as marked variation from this rate of decline.

The objective was to create visualisations of individual patients’ renal function and to display smoothed trend lines and confidence intervals for their renal function and other important covariates.

We applied advanced pattern recognition techniques developed in biometrics to routinely collected primary care data gathered as part of the Quality Improvement in Chronic Kidney Disease (QICKD) trial. We plotted trend lines, using regression, and confidence intervals for individual patients. We also created a visualisation which allowed renal function to be compared with six other covariates, including glycated haemoglobin (HbA1c), body mass index (BMI), blood pressure (BP), and therapy. The outputs were reviewed by an expert panel.
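A trend line with a pointwise confidence band for one patient can be sketched as below. This is an illustrative assumption rather than the published method. It uses a straight-line least-squares fit and a normal approximation (z = 1.96) for a roughly 95% band, whereas a Student-t quantile would be more appropriate for the short series typical of individual records. All names and numbers are hypothetical.

```python
import math


def trend_with_band(t, g, t_new, z=1.96):
    """Return (lower, fit, upper) for the fitted mean at each point in t_new."""
    n = len(t)
    if n < 3:
        raise ValueError("need at least three measurements")
    t_mean = sum(t) / n
    g_mean = sum(g) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all measurements share the same time point")
    slope = sum((ti - t_mean) * (gi - g_mean) for ti, gi in zip(t, g)) / sxx
    intercept = g_mean - slope * t_mean
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((gi - (intercept + slope * ti)) ** 2 for ti, gi in zip(t, g))
    s = math.sqrt(sse / (n - 2))
    band = []
    for t0 in t_new:
        fit = intercept + slope * t0
        half = z * s * math.sqrt(1.0 / n + (t0 - t_mean) ** 2 / sxx)
        band.append((fit - half, fit, fit + half))
    return band


if __name__ == "__main__":
    age = [60.0, 61.0, 62.0, 63.0, 64.0]
    egfr = [71.0, 66.0, 68.0, 62.0, 63.0]
    for lower, fit, upper in trend_with_band(age, egfr, [60.0, 62.0, 64.0]):
        print(round(lower, 1), round(fit, 1), round(upper, 1))
```

The band is narrowest at the mean age of the measurements and widens towards either end, which is why sparse records give wide intervals when extrapolated.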

We successfully extracted and displayed data. We demonstrated that the estimated glomerular filtration rate (eGFR) is a noisy variable, and showed that a large number of people would exceed the “progressive CKD” criteria. We created a data display that could be readily automated. This display was well received by our expert panel but requires extensive development before testing in a clinical setting.

It is feasible to utilise data visualisation methods developed in biometrics to look at CKD data. The criteria for defining “progressive CKD” need revisiting, as many patients exceed them. Further development work and testing are needed to explore whether this type of data modelling and visualisation might improve patient care.


  • Poh, N. and S. de Lusignan (2012). "Data-modelling and visualisation in chronic kidney disease (CKD): a step towards personalized medicine." Informatics in Primary Care 19(2) [pdf]

Healthcare Process Modelling

Background: Medical research increasingly requires the linkage of data from different sources. Conducting a requirements analysis for a new application is an established part of software engineering, but rarely reported in the biomedical literature; and no generic approaches have been published as to how to link heterogeneous health data.

Methods: Literature review, followed by a consensus process to define how requirements for research using multiple data sources might be modeled.

Results: We have developed a requirements analysis, i-ScheDULEs. The first components of the modeling process are indexing and creating a rich picture of the research study. Second, we developed a series of reference models of progressive complexity: data flow diagrams (DFD) to define data requirements; unified modeling language (UML) use case diagrams to capture study-specific and governance requirements; and finally, business process models, using business process modeling notation (BPMN).

Discussion: These requirements and their associated models should become part of research study protocols.


  • Simon de Lusignan, Josephine Cashman, Norman Poh, Georgios Michalakidis, Aaron Mason, Terry Desombre, Paul Kraus, Conducting Requirements Analyses for Research using Routinely Collected Health Data: a Model Driven Approach, Studies in Health Technology and Informatics, Volume 180, pg 1105-1107, 2012 [link]

Agile Exploration of Computerised Medical Records

Abstract: When electronic health records (EHRs) are used for secondary purposes such as service evaluation and epidemiological research, data are increasingly aggregated from different clinics and hospitals, over time, and from different EHR vendors. The sheer size of the data means that they are increasingly difficult to manage, and our experiential learning in diabetes and chronic kidney disease (CKD) suggests that simplistic processing can lead to errors. In this paper we propose an agile data management process that avoids the need to import and process data in a relational database, which reduces the combined processing and analysis time. We carried out a demonstration study to identify how blood pressure varied between patients included in and excluded from quality targets. We describe a novel specification language that allows clinicians to focus on identifying the variables they need to extract useful information from EHRs. Data to answer a research question were available in <1 hour rather than the much longer times previously required to extract, assemble and process data from our SQL database.
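The general idea of answering a question directly from an extract file, without first loading it into a relational database, can be illustrated with a streaming pass over a CSV file. This sketch is not the specification language or the tooling described in the paper. The column names (`target_status`, `systolic_bp`) and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def mean_by_group(rows, group_col, value_col):
    """Stream over rows once and return the mean of value_col per group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        raw = (row.get(value_col) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # skip missing or non-numeric readings
        acc = totals[row.get(group_col)]
        acc[0] += value
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


if __name__ == "__main__":
    # Hypothetical extract; in practice this would be a file on disk.
    extract = io.StringIO(
        "patient_id,target_status,systolic_bp\n"
        "1,included,130\n"
        "2,included,140\n"
        "3,excluded,150\n"
        "4,excluded,\n"
    )
    print(mean_by_group(csv.DictReader(extract), "target_status", "systolic_bp"))
```

Because only running sums are kept in memory, the same pass works on extracts far larger than the sample shown here.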


  • Norman Poh, Simon de Lusignan, Harshana Liyanage, Jeremy van Vlymen, Paul Krause, and Simon Jones, accepted for publication in MEDINFO 2013 [pdf]

Secure transmission of patient data

We have produced technical reports:
  • H. Abdulrahman, N. Poh and J. Burnett, Privacy Preservation, Sharing and Collection of Patient Records using Cryptographic Techniques for Cross-Clinical Secondary Analytics, Dept of Computing Technical Report, TR-14-01, 2014. [pdf]
  • N. Poh and A. Katibi, Addressing Privacy Concerns in Secondary Use of Centralized Clinical Medical Records through Data Protection, System Architecture Design, and Vulnerability Assessment, Dept of Computing Technical Report, TR-14-02, 2014. [pdf]

Reading list