Research

Current Projects

Eye2Gene: My main research focus, which aims to help diagnose inherited retinal diseases using artificial intelligence.

Moorfields Health Informatics: Working closely with clinicians to integrate imaging, text and genetic data at Moorfields to answer research questions and improve clinical practice.

Phenopolis: https://phenopolis.github.io

UCLex: Development of the UCL exome sequencing and analysis pipeline with Dr Vincent Plagnol. In particular, I am working on the integration of phenotypes using Human Phenotype Ontology terms.

Genetics of eye disorders: Bioinformatician for the Institute of Ophthalmology, working with Professor Alison Hardcastle and her group. I am particularly involved with the UK Inherited Retinal Disease Consortium, which focuses on unsolved cases in well-studied pedigrees.

Genetics of Crohn's disease: Identification of genetic variants in large Ashkenazi Jewish pedigrees which could be linked to Crohn's disease (Levine A, Pontikos N et al. 2016). This work is undertaken with Professor Tony Segal in Internal Medicine and his group, notably Dr Adam Levine and Dr Elena Schiff.

Previous Projects

As an MRC-funded PhD student (2011-2014) under the supervision of Dr Chris Wallace from the DIL stats group, I worked on analysing flow cytometry data for immune-phenotype to genotype association studies in the context of Type 1 Diabetes. You can find my thesis here.

Development and application of clustering methods: I was, and still am, interested in normalisation and clustering. Clustering is possibly the most widespread form of unsupervised learning and has applications in all areas of science. In biology, clusters could be populations of cells (e.g. flow cytometry data), different genotypes (e.g. SNP array) or copy number groups (e.g. quantitative PCR data). However, because biological datasets carry high levels of uncertainty due to noisy data and the absence of prior knowledge, probabilistic approaches become useful. I achieved this by assigning probabilistic weights of cluster membership to elements using model-based clustering, such as fitting a mixture of distributions. These probabilistic weights were then used in downstream analysis, with methods such as multiple imputation (where practical), to test the genetic association with disease or other phenotypes of interest such as protein expression or cell frequency.
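To make the idea of probabilistic cluster membership concrete, here is a minimal sketch of model-based clustering in Python. It is an illustration rather than the code used in my thesis: it assumes one-dimensional data and exactly two Gaussian components fitted by expectation-maximisation, and the function names (normal_pdf, membership_weights, fit_two_component_mixture) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_weights(values, means, sds, props):
    """E-step: posterior probability of each component for every value."""
    weights = []
    for v in values:
        dens = [p * normal_pdf(v, m, s) for m, s, p in zip(means, sds, props)]
        total = sum(dens)
        if total == 0.0:
            # Both densities underflowed: fall back to equal weights.
            weights.append([1.0 / len(dens)] * len(dens))
        else:
            weights.append([d / total for d in dens])
    return weights


def fit_two_component_mixture(values, n_iter=100, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    n = len(values)
    grand_mean = sum(values) / n
    grand_sd = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n) or 1.0
    # Assumed starting point: one component at each extreme of the data.
    means = [min(values), max(values)]
    sds = [grand_sd, grand_sd]
    props = [0.5, 0.5]
    for _ in range(n_iter):
        weights = membership_weights(values, means, sds, props)
        # M-step: weighted updates of mean, spread and mixing proportion.
        for k in range(2):
            nk = sum(w[k] for w in weights)
            if nk < 1e-12:
                continue
            means[k] = sum(w[k] * v for w, v in zip(weights, values)) / nk
            var = sum(w[k] * (v - means[k]) ** 2 for w, v in zip(weights, values)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids a collapsing component
            props[k] = nk / n
    return means, sds, props, membership_weights(values, means, sds, props)


# Made-up measurements: two clear groups plus one point in between.
values = [0.8, 0.9, 1.0, 1.1, 1.2, 2.5, 4.8, 4.9, 5.0, 5.1, 5.2]
means, sds, props, weights = fit_two_component_mixture(values)
for v, w in zip(values, weights):
    print(v, [round(p, 3) for p in w])
```

Each row of weights sums to one, so rather than forcing a hard assignment, the per-element probabilities can be carried into downstream analysis.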

Finding dose-responsive cells in flow cytometry: I worked on the datasets described in this paper and this paper. My idea was to use a nearest-neighbour approach (RANN package in R) to merge datasets across flow cytometry dose-response experiments, which allowed me to obtain a per-cell dose-response curve. Using this approach, I identified a group of cells with a high dose-response which had been ignored by manual gating. However, it later surfaced that these cells might simply have been artefacts, as they were not seen consistently across experiments. This work is described in more detail in chapter 3 of my thesis.
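The nearest-neighbour merge can be sketched as follows. This is a simplified Python illustration, not the original R code: it uses a brute-force search instead of the kd-tree search provided by RANN, and it assumes that every cell is described by a tuple of shared markers plus a single response value. The function names and the toy data are hypothetical.

```python
import math


def nearest_index(point, candidates):
    """Index of the candidate closest to point (Euclidean distance, brute force)."""
    best_i, best_d = -1, math.inf
    for i, c in enumerate(candidates):
        d = math.dist(point, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def per_cell_dose_response(reference_cells, samples_by_dose):
    """For each reference cell, read the response of its nearest cell at every dose.

    reference_cells: list of marker tuples
    samples_by_dose: dict mapping dose -> list of (markers, response) pairs
    Returns one curve per reference cell, as a list of (dose, response) pairs.
    """
    curves = []
    for cell in reference_cells:
        curve = []
        for dose in sorted(samples_by_dose):
            sample = samples_by_dose[dose]
            j = nearest_index(cell, [markers for markers, _ in sample])
            curve.append((dose, sample[j][1]))
        curves.append(curve)
    return curves


# Made-up example: two reference cells matched against two doses.
reference = [(1.0, 1.0), (5.0, 5.0)]
samples = {
    0.0: [((1.1, 0.9), 0.1), ((5.2, 4.9), 0.1)],
    10.0: [((4.8, 5.1), 0.9), ((0.9, 1.2), 0.2)],
}
for cell, curve in zip(reference, per_cell_dose_response(reference, samples)):
    print(cell, curve)
```

A brute-force search scales with the product of the dataset sizes, which is why a kd-tree implementation is preferable for real flow cytometry samples containing many thousands of cells.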

Association of CD25 expression with IL2RA genotype: I started my PhD by reanalysing previously published data with computational methods (k-means, mixtures of univariate distributions) to test the correlation between CD25 expression on naive and memory T cells and regulatory SNPs in the IL2RA gene. I found that automatic methods outperformed manual methods, improving both the reproducibility of the results and the strength of the correlation, but did poorly when the data did not fit the prior assumptions (e.g. number of clusters larger than expected). This seeming weakness of the automatic method can nonetheless be useful, as it makes for an excellent outlier detection tool, capable of spotting cases where the data does not fit the expected model. I summarised and presented some of my findings as a poster (18-06-2012) at the Cyto 2012 Conference in Leipzig.
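As a rough illustration of how a fixed-k method can double as an outlier detector, the sketch below runs one-dimensional k-means (Lloyd's algorithm) in Python and reports the within-cluster sum of squares. Using that sum as a misfit score is my simplifying assumption for this example, not the exact criterion from the analysis, and the function names and values are made up.

```python
def kmeans_1d(values, k=2, n_iter=50):
    """Lloyd's algorithm in one dimension; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly through the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(current):
        return [min(range(k), key=lambda j: abs(v - current[j])) for v in values]

    for _ in range(n_iter):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(centres)


def within_cluster_ss(values, centres, labels):
    """Sum of squared distances from each value to its assigned centre."""
    return sum((v - centres[lab]) ** 2 for v, lab in zip(values, labels))


# A sample that matches the expected two populations...
expected = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
# ...and one with an unexpected third population in between.
unexpected = [0.1, 0.2, 0.15, 0.5, 0.55, 0.9, 1.0, 0.95]

for name, sample in (("expected", expected), ("unexpected", unexpected)):
    centres, labels = kmeans_1d(sample, k=2)
    print(name, within_cluster_ss(sample, centres, labels))
```

The sample containing a third population gives a noticeably larger sum of squares under the two-cluster assumption, which is the kind of signal that can flag it for manual inspection.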

Association of KIR3DL1/3DS1 with type 1 diabetes: I devised a technique for imputing the copy number of two KIR genes from SNP signals. Killer Immunoglobulin-like Receptors (KIR) play an important role in the innate immune system. Present on the surface of Natural Killer cells, KIRs mediate the fate of target cells based on the composite inhibiting/activating signal generated by binding to their HLA Class I ligands. Type 1 Diabetes (T1D) is an autoimmune disease known to be strongly associated with the HLA region, primarily with HLA Class II genes, but also with HLA Class I loci, such as the HLA-Bw4/Bw6 epitope (P=6.57E-6 conditional on HLA Class II). Of the 17 known KIR genes, KIR3DL1 is the only one known to interact biologically with HLA-Bw4, which makes it a suitable candidate gene for T1D. Furthermore, its activating counterpart, KIR3DS1, may also interact with HLA-Bw4. We conducted a case-control study (816 cases, 813 controls) to assess whether there is evidence that copy number variation in KIR3DL1/3DS1 is associated with T1D. I presented this work as a poster (07-09-2013) at the Genomics of Common Disease 2013 in Oxford, and it was published in BMC Genomics.