Bioinformatics support at SciLifeLab - past, present and future
1Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
Science for Life Laboratory (SciLifeLab) was established in early 2010 with two nodes, in Stockholm and Uppsala. The venture was enabled by a strategic grant from the Swedish government and organised as a collaboration between the universities in the two cities. With regard to bioinformatics support and competence, SciLifeLab has made it possible to co-organise various initiatives within the same framework.
When SciLifeLab started, the immediate bioinformatics focus was on delivering on the expectations built up by “next-generation sequencing”, which were both great and at times somewhat unrealistic. Today, six years later, the expectations are even greater, but the infrastructure for bioinformatics support has matured into an organisation that can deal not only with data management, but also with advanced downstream bioinformatics support.
I will walk you through our struggles, highlight some success stories, describe our current status and make a personal forecast about the future.
Next generation sequencing in clinical diagnosis of familial hypercholesterolemia
1University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, Slovenia, 2Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia, 3Institute of Oncology, Ljubljana, Slovenia
Introduction: Familial hypercholesterolemia (FH) increases the risk of developing atherosclerosis and cardiovascular disease in early adulthood up to 100-fold, but the risk can be reduced by early diagnosis and disease management. The criteria for clinical recognition of the disease are an elevated total cholesterol level, a family history of premature cardiovascular complications, the presence of xanthomas and corneal arcus, and/or causative variants in genes implicated in FH. The majority of patients are heterozygous carriers of disease-causing variants in the gene encoding the LDL receptor (LDLR). A minority have disease-causing variants in the genes encoding apolipoprotein B (APOB) or proprotein convertase subtilisin/kexin type 9 (PCSK9). Various screening strategies have been proposed to identify children with FH. Slovenia is currently the only country that has implemented universal screening for hypercholesterolemia in 5-year-old children, which enables identification of patients without a known family history (1). Screening began in 1995 and was gradually implemented throughout the country.
Subjects and methods: We aimed to identify individuals with FH from the cohort of children with elevated total cholesterol levels detected in the universal national screening. Children with total cholesterol level of more than 6 mmol/L or more than 5 mmol/L with a positive family history for premature cardiovascular disease were genotyped with next generation sequencing (NGS) for variants in 4 genes associated with FH (LDLR, PCSK9, APOE, APOB) with ADH MASTR v1 (Multiplicon) on the MiSeq (Illumina) platform. The variants were validated using Sanger sequencing.
Results: 38.6% of the patients had disease-causing variants in LDLR, 18.4% in APOB and none in PCSK9 (2). The simulated detection rate of FH in Slovenian universal screening, based on an assumed incidence of 1 in 500, was more than 96%. Analytic sensitivity, specificity and accuracy were 100%; at the 95% confidence level, the probability of a false negative was 5% and the sensitivity 95%.
Conclusions: Universal national screening for hypercholesterolemia at the age of 5 years genetically confirmed FH in more than half of the referred subjects, and was thus predicted to detect almost all assumed patients in the population. This shows that the NGS-based strategy is an effective tool for early recognition of FH.
1. Sedej K et al. Decreased prevalence of hypercholesterolaemia and stabilisation of obesity trends in 5-year-old children: possible effects of changed public health policies. Eur J Endocrinol 2014;170:293-300.
2. Klančar G et al. Universal screening for familial hypercholesterolemia in children. J Am Coll Cardiol, 2015;66(11):1250-7.
Slovenian genome variability project
Aleš Maver1, Borut Peterlin1, Alenka Hodžić1
1University Medical Centre Ljubljana, Ljubljana, Slovenia
Knowledge of natural and morbid genetic variability in human populations is a key foundation for understanding the relationship between genotype and phenotype, especially as we progress into the genome-wide sequencing era. Most information on genetic variability in human populations has been gathered either from individuals in the larger populations of developed western countries or from very specific populations of interest. On the other hand, systematic characterisation of genetic variability in several smaller and fragmented populations, including the Slovenian population, is currently lacking or virtually non-existent. As a result, genetic diagnostics and research of human diseases are lagging due to this paucity of information.
We have been systematically collecting and organising the population and disease-associated genetic variation obtained using exome and genome sequencing data within the Centre for Mendelian Genomics. Since the time of establishment, we have assembled a database of regional genetic variation that includes information on over 10 million variants in populations of Slovenia and neighbouring regions. We have also established the morbid database with the information on over 600 disease-associated variants we identified and reported to date. In addition to small genetic variation in the nuclear genome, we have also systematically analysed and collected the information on mitochondrial variation, copy-number variation and more complex structural variants.
In our presentation, we will illustrate the significance of capturing regional genetic variability for improved diagnostics and research of human diseases, as defined in the Slovenian genome variability project. We will show that a progressively richer population resource results in significant improvement and facilitation of genomic data interpretation. Finally, we will present the key role of such a population resource in several aspects of personalised medicine, including presymptomatic identification of medically actionable genetic variants based on population data.
Genetic variability of microRNA regulome and its potential for biomarker discovery
1Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Slovenia
MicroRNAs (miRNAs) are a class of short non-coding RNAs involved in the regulation of gene expression, and it has been estimated that they fine-tune the expression of 30% of protein-coding genes. On average, each miRNA is predicted to regulate approximately 200 targets. MicroRNAs are part of a complex regulatory network and are associated with several epigenetic concepts. For example, miRNA silencing is one of the classes of epigenetic mechanisms; in addition, miRNA genes themselves can be epigenetically regulated, just like protein-coding genes. MicroRNAs have been shown to be involved in numerous physiological processes as well as in disease development. They have been shown to have potential as diagnostic and prognostic biomarkers as well as treatment targets. However, prioritization of a miRNA candidate for functional studies still presents a challenge because understanding of complex miRNA-related interactions is not yet complete. Additionally, the field lacks a central miRNA genomics repository, and the data are fragmented across various databases and publications. Several bioinformatics tools are also missing, and many of the existing tools are not regularly updated to keep pace with constant updates of the source databases. There are several possible directions for miRNA-based biomarker prioritization for functional studies. One possible strategy is integrated analysis of heterogeneous gene expression profiles for the development of robust disease-specific transcriptional fingerprints. Next, potential biomarkers could be located within miRNA regulatory regions (miR-rSNPs), for example within binding sites for transcription factors, within mature miRNA regions (miR-SNPs) and within miRNA target sites (miR-TS-SNPs).
Potential biomarkers also include polymorphisms associated with the miRNA silencing machinery (miR-SM-SNPs), which are located either within genes encoding components of miRNA biogenesis (Drosha, Dicer) or within miRNA genes overlapping Drosha/Dicer cleavage sites. Epigenetic silencing of some miRNAs is cancer specific; therefore this mechanism could also be used for biomarker development. One strategy is to first develop an integrated atlas of miRNA gene regulatory elements, consisting of known upstream regulators, downstream targets and overlapping genomic elements, followed by selection of potential biomarkers. Understanding of the complex interplay between miRNAs, other classes of non-coding RNAs and protein-coding genes is not yet complete, but is important for the development of novel biomarkers and for the design of novel therapeutic strategies.
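The location-based classes named above (miR-rSNPs, miR-SNPs, miR-TS-SNPs) can be illustrated with a minimal Python sketch that assigns a variant to a class by checking which annotated interval it falls in. The coordinates, the `Region` type and the `classify_snp` function are hypothetical and exist only for illustration; a real atlas would draw its intervals from curated annotation sources.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    category: str   # class assigned to SNPs that fall inside this interval


# Hypothetical annotation; the coordinates are invented for illustration.
REGIONS = [
    Region("chr1", 1000, 1499, "miR-rSNP"),    # regulatory region upstream of a miRNA gene
    Region("chr1", 1600, 1621, "miR-SNP"),     # mature miRNA sequence
    Region("chr2", 5000, 5021, "miR-TS-SNP"),  # miRNA target site in a target gene
]


def classify_snp(chrom: str, pos: int, regions: List[Region] = REGIONS) -> List[str]:
    """Return the sorted list of classes whose intervals contain the SNP position."""
    return sorted({r.category for r in regions
                   if r.chrom == chrom and r.start <= pos <= r.end})


print(classify_snp("chr1", 1610))  # SNP inside the mature miRNA interval
print(classify_snp("chr1", 1550))  # SNP outside all annotated intervals
```

A linear scan is enough for a handful of intervals; for genome-scale annotation an interval index would be the more usual choice.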
RNA-binding proteins in neurodegeneration
1Jozef Stefan Institute, Ljubljana, Slovenia, 2Biomedical Research Institute BRIS, Ljubljana, Slovenia, 3Faculty of Chemistry and Chemical Technology, University of Ljubljana, Slovenia
Amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) are devastating neurodegenerative diseases that form two ends of a complex disease spectrum. Cytoplasmic aggregation of otherwise nuclear RNA-binding proteins, such as TDP-43 or FUS, is one of the hallmark pathological features of ALS and FTLD, and suggests perturbation of RNA metabolism and/or nuclear transport in their aetiology. In 95% of all ALS and 60% of FTLD patients the aggregating protein is TDP-43, thus defining the major part of the disease spectrum as TDP-43 proteinopathies. However, only a very small percentage of aggregation cases is caused by TDP-43 mutations. Therefore, the main questions in the field are what makes wild-type TDP-43 or FUS mislocalize and aggregate in ALS and FTLD, and what role perturbed RNA metabolism plays in these diseases. The recent identification of disease-associated expansions of the intronic hexanucleotide repeat GGGGCC (G4C2) in the C9orf72 gene further substantiates the case for RNA involvement. This hexanucleotide repeat expansion mutation (HREM) has turned out to be the single most common genetic cause of ALS and FTLD, and also presents as a TDP-43 proteinopathy. HREM may enable the formation of complex DNA and RNA structures, changes in RNA transcription and processing, and the formation of toxic RNA foci, which may sequester and inactivate RNA-binding proteins. This complexity is further increased by the fact that the expanded repeat is also transcribed in the antisense direction, forming the CCCCGG (C4G2) repeat RNA. Additionally, the transcribed expanded repeats from both directions can undergo repeat-associated non-ATG-initiated (RAN) translation, resulting in accumulation and aggregation of a series of dipeptide repeat proteins.
Newborn screening for inherited metabolic disorders
1University Children's Hospital, Ljubljana, Slovenia, 2Faculty of Medicine, Ljubljana, Slovenia
Screening methods are very important in preventive medicine; their goal is to detect a disease before clinical signs develop. Tandem mass spectrometry plays a major role in newborn screening for inborn errors of metabolism, as it allows screening for more than one disease simultaneously and offers a short analysis time and high sensitivity. In a single test, tandem mass spectrometry allows quantification of many amino acids and acylcarnitines from dried blood spots, which allows detection of disorders of amino acid metabolism, organic acidurias and fatty acid oxidation disorders.
Slovenia currently screens newborns only for phenylketonuria and congenital hypothyroidism. Last year a pilot study of expanded newborn screening for inborn errors of metabolism using tandem mass spectrometry was started. It will contribute to the development of an optimal strategy for newborn screening for inherited errors of metabolism in Slovenia and to the determination of analyte cut-off values. In total, 10,000 dried blood spots from newborns were analysed retrospectively for the following disorders, which are included in most screening programmes worldwide: MCAD (medium-chain acyl-CoA dehydrogenase deficiency), GA 1 (glutaric aciduria type 1), GA 2 (glutaric aciduria type 2), 3-MCC (3-methylcrotonyl-CoA carboxylase deficiency), MSUD (maple syrup urine disease), VLCAD (very long-chain acyl-CoA dehydrogenase deficiency), LCHAD, IVA (isovaleric aciduria), PA/MMA (propionic aciduria / methylmalonic aciduria), CUD (carnitine uptake deficiency), CPT 1, CPT 2. We also included phenylketonuria, so that we could compare our results with those of the current method for phenylketonuria screening (fluorimetric detection of phenylalanine).
The study is still ongoing; final confirmation tests are in progress. Five cases of inborn errors of metabolism have been identified so far. The first case was a VLCAD deficiency, which is a fatty acid oxidation disorder (the case was confirmed with sequencing, enzymatic activity analysis and a palmitate loading test). Four cases of organic acidurias were found: three cases of 3-MCC deficiency (confirmed with organic acid analysis in urine) and one of GA 1 (confirmed with enzymatic activity analysis). In our study we also detected two already known patients with phenylketonuria.
Based on the preliminary results from the pilot study, the cumulative incidence of inborn errors of metabolism (7 cases in 10,000 newborns) is high in Slovenia. We are currently carrying out follow-up tests on selected newborns with the highest probability of disease in order to set the cut-off values for the chosen disorders.
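To make the reported figure concrete, the following minimal Python sketch converts the 7 cases in 10,000 newborns into a proportion and a "1 in N" rate. The Wilson score interval is an addition made here to indicate how uncertain an estimate from 7 cases is; it is not part of the reported analysis, and the function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(cases: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = cases / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


cases, newborns = 7, 10_000          # figures reported for the pilot study
incidence = cases / newborns
low, high = wilson_interval(cases, newborns)

print(f"cumulative incidence: {incidence:.2%} (about 1 in {round(1 / incidence)})")
print(f"approximate 95% interval: {low:.3%} to {high:.3%}")
```

Because the estimate rests on only seven cases, the interval is wide relative to the point estimate, which is consistent with the results being described as preliminary.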
Thiopurine S-methyltransferase (TPMT) pharmacogenomics and beyond
1University of Ljubljana, Faculty of Pharmacy
The enzyme thiopurine S-methyltransferase (TPMT) plays a major role in the deactivation of thiopurines and is to a large extent responsible for inter-individual differences in response to treatment. Although polymorphisms in the TPMT gene are the major cause of reduced enzyme activity, the correlation between genotype and enzyme activity is incomplete.
The challenge in this field is thus to identify novel biochemical and genetic factors that influence the efficacy and safety of acute lymphoblastic leukemia (ALL) treatment with thiopurines, either by influencing TPMT activity or independently of TPMT.
We have shown that a mutated MTHFR gene augmented the effect of a mutated TPMT gene on 6-mercaptopurine (6-MP)-related toxicities in childhood ALL patients.
We have reported on the impact of the cofactor and methyl donor S-adenosyl methionine (SAM) on TPMT activity and on the cytotoxic effects of 6-MP. SAM may modulate TPMT activity in the intracellular setting, possibly by post-translational stabilization. Consequently, the endogenous availability of SAM may influence TPMT activity, the formation of 6-MP metabolites, and the toxicity of thiopurine drugs.
Furthermore, we showed a protective role of inosine triphosphate pyrophosphatase (ITPA) polymorphisms in relation to event-free survival and relapse rate in pediatric ALL.
We also demonstrated the influence of PACSIN2 (rs2413739) on the appearance of side effects in pediatric acute leukemia patients.
The genome-wide association study (GWAS) approach has also been used to comprehensively investigate the relationship between constitutional genotype and TPMT activity.
Evaluating the kinetic parameters for quantitative models in systems biology
1Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Modelling of biological systems has become an indispensable tool in the design of novel biological systems and the analysis of existing ones. It can significantly reduce the time and cost of experimental work and is an integral part of interdisciplinary fields such as computational biology, bioinformatics, systems and synthetic biology, and systems medicine, each with its own goals and each tackling different aspects of understanding life as it is [1,2]. The choice of modelling technique depends on the complexity of the system under observation, the desired accuracy of the simulation results or the topology, and the availability of the data required for model construction. Existing quantitative methods are mostly based on numerical simulation of a system of ordinary differential equations or of the chemical master equation. To model processes quantitatively with these approaches, we require accurate kinetic data, which govern the system’s dynamics. However, these data are often unavailable and are in most cases hard or even impossible to measure experimentally.
We present the possibility of using expert knowledge to construct a model which produces quantitatively relevant results, even when some kinetic data are missing or only vaguely defined. This approach is based on fuzzy logic, which uses linguistic terms to describe biological processes. In addition, deep mathematical knowledge is not needed to construct a model using fuzzy logic. Using this approach, we constructed a model of the circadian rhythm of Neurospora. Simulation results show that in most cases the linguistic approach retains the quantitative aspect of the conventional model.
Processes in a model based on the proposed approach are described linguistically. An expert from any field of life science with a minimal mathematical background could therefore use an existing fuzzy logic framework to define a model of the biological system they wish to simulate and still obtain quantitatively relevant simulation results.
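As an illustration of how a linguistic description can yield a number, the following minimal Python sketch evaluates three rules of the form "if the repressor level is low, then production is high". It is not the model presented here: the normalised 0 to 1 scale, the triangular membership functions, the rule outputs and the weighted-average defuzzification are all assumptions chosen for brevity.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Linguistic terms for a repressor level normalised to the range 0..1 (assumed shapes).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Rule base: "IF repressor is <term> THEN production is <value>" (assumed values, 0..1).
RULES = {"low": 1.0, "medium": 0.5, "high": 0.0}


def production_rate(repressor_level: float) -> float:
    """Evaluate all rules and combine their outputs by a membership-weighted average."""
    x = min(max(repressor_level, 0.0), 1.0)  # clamp to the normalised range
    weights = {term: triangular(x, *TERMS[term]) for term in TERMS}
    total = sum(weights.values())
    return sum(weights[term] * RULES[term] for term in RULES) / total


for level in (0.0, 0.25, 0.5, 1.0):
    print(level, production_rate(level))
```

The only inputs are the shapes of the linguistic terms and the rule table, which is what allows such a model to be specified without kinetic constants.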
[1] U. Alon, "An Introduction to Systems Biology – Design Principles of Biological Circuits.", Chapman & Hall/CRC (2007), vol. 10, no. 10.
[2] R. Kitney and P. Freemont, "Synthetic biology – The state of play.", FEBS Letters (jul 2012), vol. 586, no. 15, pp. 2029-2036.
[3] J.-C. Leloup, D. Gonze, and A. Goldbeter, "Limit Cycle Models for Circadian Rhythms Based on Transcriptional Regulation in Drosophila and Neurospora.", Journal of Biological Rhythms (dec 1999), vol. 14, no. 6, pp. 433-448.
Actinoporin-like proteins through powerful bioinformatics optics
1National Institute of Chemistry, Ljubljana, Slovenia
Besides the need to understand the functioning of the human body in detail, it is also necessary to study (fungal) pathogens if we want to treat emerging diseases successfully. When mining fungal genomes, several challenges limit the reliability of comparisons between data sets (Meyer et al 2016). The superfamily of actinoporin-like proteins (ALP) comprises diverse protein families that share structural similarity (a rigid β-sandwich flanked by two α-helices) but low sequence similarity. Aegerolysins have been described as having pleiotropic functions; some of them are hemolytic in the presence of another, MACPF-domain-containing protein, or are expressed during formation of fungal primordia and fruiting bodies. The mushroom lectin XCL, after sugar binding, induces changes in the target cytoskeleton and has insecticidal activity. Infiltration of oomycetal NPP1 into plant leaves results in accumulation of PR gene transcripts, production of ROS and ethylene, callose apposition and cell death.
We performed genome mining in the fungal kingdom and a comparison of lifestyles, together with analysis of loci, promoters, transcription, secretion and protein signatures. The genome datasets analyzed are hosted at different resources, each offering benefits and limitations: NCBI - limited fungal hits, several Blast options; Mycocosm - lots of fungi, limited tools; AspGD - Aspergilli, lots of tools & data; E-Fungi - limited in fungi, useful search tools. Other tools applied were: PFAM - search by domain, limited in fungal proteins; SecretomeP - prediction of secretion, mammals based; PHYRE2 - protein structure prediction; and CLUSTALW - multiple sequence alignment, identity threshold is empirically set. The distribution of fungal fruit body (FB) lectins, necrosis-inducing proteins and aegerolysins was heterogeneous; FB lectins were rare, whereas NPP1 and aegerolysins were overrepresented. Because not every species has at least one member of the protein family, we consider ALP to be non-core proteins. Some aegerolysins co-distribute with MACPF. No correlation with taxonomy or pathogenic lifestyle was observed. We classify some ALP as small secreted proteins, even though they lack a recognizable signal peptide.
The quality of early genome sequences is limited by the sequencing technology, but the data were often manually curated. Recent genomes suffer from propagation of errors due to automated annotation. Accurate gene calling is complicated by the number of introns. If a genome is included in several resources, the version of the sequence or annotation is not always the same. Only a single genome sequence is available for most species, usually from a lab pet. Variation in experimental conditions between datasets makes comparison difficult. Omics data are deposited in raw formats, making them usable only by researchers with bioinformatics skills. When collecting proteins from several sources, it is difficult to define the pool of screened species. Fungal lifestyle is not unambiguously described.
ARRS P1-0391, J1-7515, J4-7162, EUROFUNG
Small RNAs regulatory networks - linking developmental and immune signaling in potato
Maja Križnik1, Marko Petek1, David Dobnik1, Špela Baebler1, Stephan Pollmann2, Jan Kreuze3, Jana Žel1, Kristina Gruden1
1Department of Biotechnology and Systems Biology, National Institute of Biology, 1000 Ljubljana, Slovenia, 2Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, 3International Potato Center, Peru
Plants respond to pathogen infections by activating a variety of defence mechanisms, which can be detected as a broad spectrum of physiological and histological changes. One such change is an alteration in small RNA (sRNA) levels. The major classes of plant sRNAs include microRNAs (miRNAs) and small interfering RNAs (siRNAs); one specific type of the latter are secondary phased siRNAs (phasiRNAs). sRNAs have emerged as key post-transcriptional regulators of genes. They can directly regulate gene expression by targeting mRNAs through complementary base pairing. In plants, sRNAs are involved in various biological processes, including growth and development, hormone signalling and defense responses against several pathogens. A growing body of evidence suggests that sRNAs are important in plant immunity, although none of the studies performed so far has investigated the sRNA regulatory level in potato. Therefore, we aimed to unravel the role of sRNAs in the complex immune signaling network controlling the defense responses that render the plant tolerant to viral infection. We have identified and quantified miRNAs as well as phasiRNAs and viral sRNAs in the PVYNTN tolerant cv. Désirée and its susceptible transgenic counterpart impaired in accumulation of salicylic acid (NahG-Désirée). We have identified more than 93 differentially expressed miRNAs/phasiRNAs at the onset of viral replication in cv. Désirée. The miRNA response was, however, strongly attenuated in salicylic acid deficient plants, suggesting that salicylic acid plays an important role in enhancing the miRNA regulatory network response to PVYNTN infection. Next, a miRNA regulatory network was constructed using in silico prediction as well as degradome sequencing and gene expression data to link the response at the sRNA level to the physiological response. As already described for some other pathosystems, regulation of immune receptor transcripts is under the control of this network. In cv. Désirée, however, the NBS-LRR-targeting miRNAs were not down-regulated but up-regulated, resembling the regulation of these genes in symbiotic interactions. We have additionally discovered an interesting novel connection between sRNAs and gibberellin biosynthesis. Increased levels of miRNA167 and several phasiRNAs were reflected in decreased levels of the target transcripts involved in gibberellin biosynthesis in the genotype Désirée only. We have functionally confirmed this interaction, as a reduced level of biologically active gibberellin was measured in cv. Désirée. The intertwining of sRNA and hormonal networks revealed here sheds new light on the regulation of developmental signalling, symptom development and stress signalling.
 M. Zhao, C. Cai, J. Zhai, F. Lin, L. Li, J. Shreve, J. Thimmapuram, T. J. Hughes, B. C. Meyers, and J. Ma, “Coordination of MicroRNAs, PhasiRNAs, and NB-LRR Genes in Response to a Plant Pathogen: Insights from Analyses of a Set of Soybean Rps Gene Near-Isogenic Lines,” Plant Genome, vol. 8, no. 1, p. 0, 2015.
 P. Peláez and F. Sanchez, “Small RNAs in plant defense responses during viral and bacterial interactions: similarities and differences.,” Front. Plant Sci., vol. 4, no. September, p. 343, 2013.
Acknowledgments: This work was financially supported by the Slovenian Research Agency (contract No. J1-4268 and 1000-15-0105).
Mouse genotypes drive the liver and adrenal gland clocks
Uršula Prosenc Zmrzljak1, Rok Košir1, Anja Korenčič1, Peter Juvan1, Jure Ačimović2, Damjana Rozman1,2
1Center for Functional Genomics and Bio-Chips, Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia, 2Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
Circadian rhythms regulate a plethora of physiological processes. Perturbations of the rhythm can result in pathologies, which are frequently studied in inbred mouse strains. We show that the genotype of mouse lines defines the circadian gene expression patterns. Expression of the majority of core clock and output metabolic genes is phase-delayed in the C57BL/6J line compared to 129S2 in the adrenal glands and the liver. Circadian amplitudes are generally higher in the 129S2 line. Experiments in dark-dark (DD) and light-dark (LD) conditions, exome sequencing and data mining suggested that the mouse lines differ in single nucleotide variants in the binding regions of clock-related transcription factors in open chromatin regions. One possible mechanism of differential circadian expression could be the entrainment and transmission of the light signal to the peripheral organs. This is supported by the genotype effect in the adrenal glands, which is largest under LD, and by the high number of single nucleotide variants in the Receptor, Kinase and G-protein coupled receptor Panther molecular function categories. The different phenotypes of these two mouse strains and the changed amino acid sequence of the Period 2 protein possibly contribute further to the measured differences in circadian gene expression.
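Phase delay and amplitude, the two quantities compared between the mouse lines above, can be estimated from a time series in several ways. The following minimal Python sketch uses a single-component cosinor fit, which is an assumption made here and not necessarily the method used in this study. The two time series are synthetic, generated inside the code, and `line_a` and `line_b` are hypothetical names. The closed-form estimates used are exact only for evenly spaced samples that cover a whole number of periods.

```python
import math


def cosinor_fit(times_h, values, period_h=24.0):
    """Least-squares fit of y = mesor + amplitude * cos(omega * (t - peak_time)).

    The closed-form sums below assume evenly spaced samples that cover a whole
    number of periods (for example every 4 h over 24 h).
    """
    n = len(values)
    omega = 2.0 * math.pi / period_h
    mesor = sum(values) / n
    beta = 2.0 / n * sum(v * math.cos(omega * t) for t, v in zip(times_h, values))
    gamma = 2.0 / n * sum(v * math.sin(omega * t) for t, v in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period_h
    return mesor, amplitude, peak_time


def synthetic_rhythm(times_h, mesor, amplitude, peak_time, period_h=24.0):
    """Generate a noise-free cosine rhythm (synthetic data, for illustration only)."""
    omega = 2.0 * math.pi / period_h
    return [mesor + amplitude * math.cos(omega * (t - peak_time)) for t in times_h]


times = list(range(0, 24, 4))                       # sampling every 4 h
line_a = synthetic_rhythm(times, 10.0, 3.0, 6.0)    # hypothetical line, peak at 6 h
line_b = synthetic_rhythm(times, 10.0, 2.0, 9.0)    # hypothetical line, peak at 9 h

_, amp_a, peak_a = cosinor_fit(times, line_a)
_, amp_b, peak_b = cosinor_fit(times, line_b)
phase_delay = (peak_b - peak_a) % 24.0

print(f"amplitudes: {amp_a:.2f} vs {amp_b:.2f}; phase delay of line_b: {phase_delay:.2f} h")
```

With real expression data, which are noisy and often unevenly sampled, a general least-squares or nonlinear fit would replace the closed-form sums.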
Influence of selected UV-filters on ABCB5 gene expression in melanoma cells
Tanja Prunk Zdravković1,2, Bogdan Zdravković2, Marko Zdravković3, Jan Schmidt2, Borut Strukelj4, Polonca Ferk2
1Celje General and Teaching Hospital, Celje, Slovenia, 2Medical faculty, University of Maribor, Maribor, Slovenia, 3University Medical Centre Maribor, Maribor, Slovenia, 4Faculty of Pharmacy, University of Ljubljana, Ljubljana, Slovenia
There is no clear evidence on whether sunscreens and personal care products containing UV filters such as octocrylene (OCT) and titanium dioxide (TiO2) protect against melanoma or may contribute to its development. The transmembrane protein ABCB5 is involved in tumor progression, disease recurrence and clinical drug resistance in melanoma. The aim of the present study was to investigate the influence of OCT and TiO2 on the proliferation activity of melanoma cells and on their ABCB5 mRNA expression.
A metastatic melanoma cell line was treated with different concentrations (from 1 to 250 µg/mL) of OCT or TiO2 (in the form of nanoparticles, nano-TiO2, or with an average particle size of ≤ 5 µm, micro-TiO2) and incubated for up to 144 hours. We used the MTT and LDH assays to measure the cells' proliferation activity and cytotoxicity, respectively. Quantitative real-time PCR using TaqMan chemistry was performed, and relative gene expression ratios were calculated for the target gene (ABCB5) and the reference (endogenous control) gene (LDHA). We used ANOVA and post-hoc Bonferroni tests for statistical analysis.
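A relative expression ratio of a target gene against a reference gene is commonly computed from quantification cycle (Ct) values with the 2^-ΔΔCt method. The following minimal Python sketch shows that calculation; the abstract does not state which formula was used, so the method is an assumption, and the Ct values are invented for illustration. The formula also assumes close to 100% amplification efficiency for both genes.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene in treated vs control cells (2^-ddCt)."""
    delta_treated = ct_target_treated - ct_ref_treated    # normalise to the reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control           # compare treated with control
    return 2.0 ** (-delta_delta)


# Invented Ct values: target = ABCB5, reference = LDHA.
fold_change = relative_expression(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=27.0, ct_ref_control=18.0,
)
print(fold_change)  # one cycle earlier for the target at equal reference Ct: 2-fold increase
```

When amplification efficiencies differ noticeably between the two genes, an efficiency-corrected ratio is generally preferred over this simplified form.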
Results and conclusions
OCT treatment resulted in increased ABCB5 mRNA expression at 24 and 48 h of exposure compared to 2 h (p<0.01). The increase was 2-fold at 250 µg/mL and 5-6 fold at lower OCT concentrations after 48 h. Concomitantly, we observed a reduction in cell number of 1.3% to 11.6% at 48 h, proliferation activity that increased at 8 h and decreased thereafter, and morphological changes (including cannibalistic activity).
On the other hand, our results suggest that TiO2 might open a new window in the treatment modalities of melanoma. Micro-TiO2 progressively decreased ABCB5 mRNA expression, whereas nano-TiO2 caused a rebound increase at 48 h of exposure at all but one concentration (p<0.05), followed by a significant decrease after 120 hours of exposure (p<0.01). This increase raises questions that should be answered before any potential use in medicine.
Exploring Human Variation and its impact on proteins and in the Clinic
Janet Thornton1, Roman Laskowski1, David Marcus1
Genome sequencing has opened up the possibility of exploring the genetic basis of human evolution and the differences between individuals. Using computational methods and 3D protein structures, we compared disease-associated variants with ‘natural’ variants observed in 1000 human genomes, showing significant differences in their distributions. Many of the mutations that most often cause diseases were the least frequently observed ‘natural’ variants. Recently we have been exploring how protein domain information can help in the interpretation of the effects of non-synonymous mutations, especially from a structural perspective. We observe that the ‘equivalent’ mutation in the same domain family but in different proteins can have very different consequences, depending on the context in which it occurs.
The 100,000 genomes project, funded by the UK government through the NHS, provides a stimulus to bring this new technology into the clinic. This brings both challenges and great opportunities. One of the major challenges is handling the scale and interpreting the complexity of genomic data. This will require a close collaboration between the basic biological and clinical sciences if it is to make a major impact. These challenges will be discussed.
Figure 1: Inherited diseases and causative variants. Structure can help explain a variant’s disruption of protein function (by Roman Laskowski)
2006 - University of Ljubljana, Faculty of Medicine, Center for Functional Genomics and Bio-chips.