Research topics

I work on pattern recognition with applications to biometrics and healthcare informatics.
If you are interested in any of the topics discussed below, please get in touch with me.


My research on biometrics has focused on improving various aspects of a biometric system, which can be regarded as a pattern recognition problem. Such a system consists of the following modules or tasks: pre-processing, feature representation, classifier design, and information fusion. Some of these aspects are shown in the figure below.

 A summary of research topics on biometrics
  1. Classifier fusion: Combining information from several biometric systems of the same or different modalities can often improve the system performance. Two factors affect the performance of the fused system: the correlation among the classifiers and the strength of each classifier. Correlation can be significantly higher in intramodal fusion (combining two classifiers processing the same modality) than in multimodal fusion. My prior work therefore advances the understanding of fusion by taking both factors into account. This investigation led to the development of a classifier fusion predictive model that gives the Equal Error Rate (EER) as output. EER is a commonly used metric for biometric authentication because it accounts for the imbalanced nature of genuine and impostor matching.
  2. Biometric sample quality: Biometric samples collected using a different sensing device, or under uncontrolled or adverse environments, can compromise the recognition performance of a biometric system. I have developed methods that exploit the signal quality (in the form of "quality measures") to reduce this performance degradation. These methods attempt to adapt the model and/or to calibrate the matching scores.
  3. Client or user adaptation: Because biometrics is a human-centric application, researchers working on biometrics have observed the impact of users on the biometric performance. By changing the underlying demographics of the database but keeping the database size (the number of subjects) the same, one often obtains a slightly different performance every time. I attempted to understand this phenomenon in two research directions: 1) developing metrics to assess the impact of demographics on the performance; and 2) developing methods to reduce the impact. The second investigation led me to develop algorithms that adapt to a specific user and, later, to a group of users sharing some similar characteristics. A client-adaptive system can automatically adjust the decision threshold based on some training data. Interestingly, directly adjusting the threshold is more difficult than finding a projection function that calibrates the matching score. According to hundreds of experiments that I have conducted, a reduction in EER of up to 50% can be observed.
  4. Databases and benchmarking: In pursuit of research excellence (ensuring unbiased experimental reporting and repeatability of experiments), I have produced and published a number of databases and contributed to benchmarking efforts. Check out and download the XM2VTS, BANCA, and Biosecure databases. I have also organised two competitions on biometrics: a benchmark of fusion algorithms at the Biosecure workshop 2007 and a face video competition in conjunction with ICB2009.
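To make the EER metric mentioned in item 1 concrete, here is a minimal sketch in Python of how it can be estimated from two sets of matching scores. It assumes that higher scores indicate a genuine match; the function name `equal_error_rate` and the toy scores are illustrative assumptions only, and this is not the fusion predictive model itself.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores mean "more likely genuine" (an assumption of
    this sketch, not something fixed by the text above).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores, made up purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
print(equal_error_rate(genuine_scores, impostor_scores))
```

Because the false acceptance and false rejection rates are each normalised by their own class size, the estimate is not dominated by the much larger number of impostor comparisons.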
My future research in this area includes:
  1. Spoof-robust biometrics: There have been numerous reports of researchers and hackers who succeeded in fooling a biometric system by presenting a fake biometric sample made from common materials (e.g., "gummy" fingerprints). There is an urgent need to address this security issue. My research will focus on devising algorithms capable of incorporating liveness detection information into an existing biometric system.
  2. Self-calibrating biometrics: A biometric system changes over time, and its performance is highly dependent on the acquisition conditions. A self-calibrating biometric system would adjust its parameters so that it performs as well as possible under all operating conditions. My research will focus on tackling this challenge by combining the various aspects of adaptation that I have developed into a single framework; these include quality adaptation, temporal adaptation, cross-device adaptation, and user adaptation.
  3. Exploitation of cohort information: There are a number of algorithms that rely on a "background" database, or a database of cohort users. For instance, state-of-the-art approaches to face recognition (e.g., sparse representation and the one-shot model) and speaker recognition (GMM) rely on a set of background or cohort users. What is the impact of cohort users when designing a biometric system? How should one choose an optimal set of cohort users? Is there an optimal way of incorporating cohort users? These are some of the questions left unanswered in the literature.
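As one illustration of how cohort users can enter a system (item 3), the sketch below shows a common cohort-based score normalisation, often called T-norm in speaker verification: the raw score of a probe against the claimed identity is standardised using the scores of the same probe against a set of cohort models. This is only one established option, shown under the assumption that cohort scores are already available; the function name `t_norm` and the numbers are hypothetical.

```python
import numpy as np


def t_norm(raw_score, cohort_scores):
    """Standardise a raw matching score using cohort scores.

    cohort_scores: scores of the same probe sample against the models of
    the cohort users (assumed to be computed elsewhere).
    """
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mu = cohort_scores.mean()
    sigma = cohort_scores.std(ddof=1)  # sample standard deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Toy values, made up for illustration.
print(t_norm(3.0, [0.0, 1.0, 2.0]))
```

The choice of cohort set directly changes `mu` and `sigma`, which is one way to see why the selection questions raised above matter.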
The research above was made possible thanks to the following funding agencies:
  • FP6 and FP7 EU projects: Biosecure, MOBIO, and BEAT
  • Swiss National Science Foundation (SNSF)
  • Universiti Sains Malaysia for its RLKA fellowship
and collaborators:

Healthcare informatics

There are a number of potential research topics in healthcare informatics. These research topics are shown in the figure below.
Research topics in healthcare informatics
  1. Patient ID encryption: Health records of the same patient often need to be joined together to form a larger record. They can come from hospitals, primary care clinics (general practices), and disease-specific registries. To work with patient data without knowing the patients' identity, and so protect their privacy, we encrypt any data that can reveal their identity, such as name, address, and any reference from which their identity can be inferred (e.g., the NHS number in the UK). We need to ensure that two encrypted data items ("keys") match each other if they come from the same patient. The challenge here is to produce a unique key even when the identity-related data contain some variation. For instance, even if a patient's name is spelled slightly differently, we require that the algorithm can still match the resulting keys with a high probability.
  2. Ontology: Significant advances have been made in medical ontology. As a result, different health record systems can communicate with each other thanks to a common set of clinical terms. Five-byte Read codes and SNOMED-CT are two prominent examples. Within SNOMED-CT it is now possible to establish relationships among concepts using "is-a" and other entity relationship specifiers. However, this is a tedious and knowledge-driven process. I would like to investigate data-driven methods that can improve upon the existing knowledge-driven one.
  3. Database management and data representation: In order to handle and manage millions of patient records (as is the case for the Quality Improvement in Chronic Kidney Disease study, QICKD), we need to encode data in three dimensions: the concept dimension (on the order of 100K concepts), the patient dimension (millions), and the temporal dimension (up to 20 years of data). Some possible research questions are: 1) How can the data be represented efficiently? 2) How should they be presented to clinicians?
  4. Knowledge discovery and machine learning: Health records are extremely sparse in features and in time. Although there are algorithms capable of dealing with time, such as hidden Markov models (HMMs), they typically assume regularly sampled signals (e.g., 8000 samples every second). In health records, data are not sampled regularly, because patients visit their clinicians as and when required. Conventional temporal models such as HMMs are therefore not suitable for this problem without some modifications. Furthermore, the method has to be augmented to deal with hundreds of thousands of sparse features (where only a few features have data for a given patient). These are some examples of technical challenges to be solved in this research direction.
  5. Human-Computer Interaction (HCI): We need to present meaningful data to clinicians and patients. Both groups of users have very different requirements and purposes. How can information overload be reduced when presenting data? This is as much an art as a science in data engineering. We also need to provide a way for clinicians to manipulate the data while carrying out a typical seven-minute consultation.
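To illustrate the matching requirement in item 1, the following minimal Python sketch derives a pseudonymous linkage key from identity fields. It uses a keyed hash (HMAC-SHA256) as a stand-in for the encryption step, and it assumes that the fields are normalised before hashing; the function names `normalise` and `linkage_key`, the choice of fields, and the sample values are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalise(text):
    """Lower-case, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def linkage_key(secret, name, date_of_birth, identifier):
    """Derive a pseudonymous key from normalised identity fields.

    secret: bytes shared by the data providers (an assumption of this
    sketch); without it the key cannot be recomputed from the fields.
    """
    message = "|".join(normalise(f) for f in (name, date_of_birth, identifier))
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Fictitious records that differ only in formatting yield the same key.
secret = b"example-secret"
key_a = linkage_key(secret, "José  O'Neil", "1970-01-02", "123 456 7890")
key_b = linkage_key(secret, "jose oneil", "19700102", "1234567890")
print(key_a == key_b)
```

Normalisation only absorbs formatting differences such as case, accents, and spacing. A genuine spelling variation still produces a different key, which is exactly the harder problem described in item 1.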
Click here to view my ongoing work in this area.

Below are some research topics in healthcare, together with data sets and possible solutions that I have identified. If you spot any topic that you might be interested in, please get in touch with me.

Research topics in healthcare
CKD stands for Chronic Kidney Disease; HES: Hospital Episode Statistics -- health data obtained from hospitals in the UK