DNA and RNA sequencing has become an invaluable tool for fundamental and applied research in areas as diverse as cancer genetics, rare disorders, host-pathogen interactions, preservation of endangered species, evolutionary studies and the improvement of species of agricultural or farming interest.
The OmicsTech Genomics Area is formed by the Centro Nacional de Análisis Genómico (CNAG) sequencing platform and the Genomics and Transcriptomics facility of the Center for Omic Sciences (COS). The CNAG is one of the top European centres in terms of sequencing capacity and diversity of capabilities. It focuses its activities on 2nd and 3rd generation sequencing technologies and the corresponding analysis tools. The CNAG expertise is based on extensive experience in sample quality control, library construction methods, sequencing, lab automation, data banking and data analysis. The service portfolio is constantly evolving and includes state-of-the-art applications such as long read nanopore sequencing, single-cell RNA sequencing and ATAC-sequencing. Processes at the CNAG run under ISO 9001:2015 certification and ISO 17025:2017 and ISO 27001 accreditations. The services offered by the COS through its Genomics and Transcriptomics facility are focused on small genome sequencing and metagenomic studies.
The Genomics area of the OmicsTech can guide researchers at any project phase from the design of the experiments to the data analysis.
For further information and discussion of possible ways of collaboration with the CNAG or the COS facilities, please contact projectmanager@cnag.eu
Applications
Whole genome sequencing (WG-Seq) is the process of determining the complete DNA sequence of an organism's genome at a single time. It delivers a base-by-base view of all genomic alterations, including single nucleotide variants (SNVs), small insertions and deletions (indels), copy number changes (CNVs), and structural variations (SVs). It can also be used in metagenomic studies for accurate identification of species from environmental samples.
Whole genome sequencing with ONT technology produces reads that are several kilobases long. They are extremely useful to improve genome assemblies, to identify large structural variations and to phase alleles to their respective parental homolog.
Whole genome Enzymatic Methyl sequencing is a new method for identification of 5-mC and 5-hmC, in contrast to the traditional bisulfite sequencing (WGBS-Seq), which has been the gold standard for DNA methylation analysis at single-base resolution. Whole genome Enzymatic Methyl sequencing produces high quality libraries that enable superior detection of 5-mC and 5-hmC from fewer sequencing reads.
Whole exome sequencing is used to investigate all protein-coding regions of the genome with enhanced coverage for disease-associated genes. It is suitable for identifying nucleotide variants across coding regions and is a cost-effective alternative to whole-genome sequencing that produces smaller and more manageable data sets.
Targeted sequencing is a highly directed approach that enables the analysis of genetic variation in specific genomic regions, using pre-designed gene panels, custom gene panels or amplicon sequencing. Potential applications are the discovery of rare mutations in complex samples (such as highly heterogeneous tumour samples) or sequencing the bacterial 16S rRNA gene across multiple species, a widely used method for phylogeny and taxonomy studies.
RNA sequencing is a sensitive and accurate method for determining the primary sequence and relative abundance of all RNA molecules. It provides strand-specific information that allows assigning transcripts to the corresponding DNA strand. Most common applications are differential gene expression, allele-specific gene expression, alternative splicing, fusion transcripts, de novo transcriptome assembly and genome annotation, and single nucleotide variant identification.
RNA sequencing with long reads allows full-length characterization of native RNA or cDNA, as well as the identification of complex transcript isoforms, and chimeric or gene fusion transcripts.
The three-dimensional configuration of the genome is complex, dynamic and crucial for gene regulation. Hi-C sequencing is a chromosome conformation capture technique that reveals the interactions between different pieces of DNA.
Single cell sequencing allows the analysis of individual cells. By examining complex systems, such as tissues or organs, at single-cell level, it reveals the cellular heterogeneity and the state of each cell type. The most common applications are Single Cell RNA sequencing, Single Cell ATAC sequencing, Single Cell immune profiling (BCR/TCR), CITE sequencing and Single Cell Multiome ATAC + Gene Expression.
Spatial transcriptomics allows the study of gene expression within an unperturbed tissue microenvironment, with the tissue architecture maintained. It allows processes that happen in whole organismal systems to be studied under conditions that are as realistic as possible.
The assay for transposase-accessible chromatin sequencing (ATAC-Seq) is a rapid and sensitive technique to assess genome-wide chromatin accessibility. It uses the Tn5 transposome to detect nucleosome-free regions of the genome. It is widely used for nucleosome mapping and to determine transcription factor occupancy.
Next-generation sequencing (NGS) has dramatically changed the molecular diagnostic area. Whole exome sequencing by NGS and confirmation of the reported candidate variants by Sanger Sequencing is a common practice in many clinical labs.
DNA fingerprinting enables identification of individuals using hair, blood, semen, or other biological samples, based on unique patterns (polymorphisms) in their DNA. It can be done by whole genome sequencing in the discovery phase and/or by amplified fragment length polymorphism (AFLP, detection of multiple DNA restriction fragments by means of PCR amplification) for larger numbers of samples.
Metagenomics is the study of genetic material recovered directly from environmental samples. Accurate identification of species is a major challenge and can be done by two complementary approaches: whole genome sequencing (WG-Seq) and 16S, 18S or ITS amplicon sequencing. The latter is a cost-effective approach for studying large numbers of samples, although it is limited to one specific gene, while the WG-Seq strategy can be applied to smaller numbers of samples but works well for all organisms found in the same sample, both prokaryotes and eukaryotes.
The Sanger method is based on sequentially synthesizing a strand of DNA complementary to a single strand (used as a template), in the presence of DNA polymerase, the four 2'-deoxynucleotides that make up the DNA sequence (dATP, dGTP, dCTP and dTTP) and four dideoxynucleotides. Using specific primers for known genes, it can detect different polymorphisms in the sequence.
Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), have been popular markers due to their high polymorphism. The PCR reaction is performed with fluorescent dye-labelled primers; the PCR fragments are then analysed on a capillary DNA sequencing machine, and the data is analysed using GeneMapper™ software.
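To illustrate why microsatellites are polymorphic markers, the minimal Python sketch below counts the longest uninterrupted run of a repeat motif in two alleles. The sequences, the CA motif and the function name longest_repeat_run are hypothetical and chosen only for this example; the service itself sizes fluorescently labelled PCR fragments rather than running this code.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical alleles that differ only in the number of CA repeats.
allele_a = "GGTT" + "CA" * 12 + "TTAG"
allele_b = "GGTT" + "CA" * 15 + "TTAG"

repeats_a = longest_repeat_run(allele_a, "CA")
repeats_b = longest_repeat_run(allele_b, "CA")

# Each extra dinucleotide repeat adds 2 bp to the amplified fragment.
size_difference_bp = (repeats_b - repeats_a) * len("CA")
print(repeats_a, repeats_b, size_difference_bp)
```

In this example the two alleles differ by three repeat units, that is 6 bp of fragment length, which is the kind of size difference that capillary electrophoresis resolves.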
Quantification of allelic expression of specific genes using TaqMan™ assays or SYBR Green intercalator assays can be performed using a real-time PCR (RT-PCR) system.
TaqMan is a mature, validated and widely used SNP genotyping method developed by Life Technologies that runs on a real-time PCR (RT-PCR) system. Each TaqMan genotyping assay contains two primers for amplifying the sequence of interest and two allele-specific, differently labelled TaqMan probes for allele detection. Each allele-specific MGB probe is labelled with a fluorescent reporter dye (either a FAM or a VIC reporter molecule) at the 5' end and carries a fluorescence quencher at the 3' end.
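The allele-discrimination idea behind two differently labelled probes can be sketched in a few lines of Python. This is a simplified illustration and not the algorithm of any instrument software: the function name call_genotype, the signal values, the noise floor and the ratio cut-off are assumptions made for the example, and the assignment of FAM to allele 1 and VIC to allele 2 is arbitrary.

```python
def call_genotype(fam: float, vic: float,
                  min_signal: float = 0.5, ratio_cutoff: float = 3.0) -> str:
    """Call a biallelic genotype from end-point reporter signals.

    Assumes non-negative, background-corrected signals. The thresholds are
    illustrative values, not validated settings.
    """
    if max(fam, vic) < min_signal:
        return "no call"  # neither probe produced a usable signal
    if fam >= ratio_cutoff * vic:
        return "homozygous allele 1"  # FAM reporter dominates
    if vic >= ratio_cutoff * fam:
        return "homozygous allele 2"  # VIC reporter dominates
    return "heterozygous"  # both reporters give comparable signal


# Hypothetical wells: (FAM signal, VIC signal)
wells = {"A1": (2.0, 0.1), "A2": (0.1, 1.8), "A3": (1.5, 1.4), "A4": (0.05, 0.02)}
calls = {well: call_genotype(fam, vic) for well, (fam, vic) in wells.items()}
print(calls)
```

In practice, genotype clusters are usually assigned from a scatter plot of the two reporter signals across many samples rather than from fixed cut-offs; the fixed ratio here only keeps the example short.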
DNA microarrays are microscope slides that are printed with thousands of tiny spots in defined positions, with each spot containing a known DNA sequence or gene. To perform a microarray analysis, RNA molecules (mRNA for gene expression and miRNA for miRNA analysis) are typically collected from both an experimental sample and a reference sample. The samples are then converted into complementary DNA (cDNA), and each sample is labelled with a fluorescent probe. The data gathered through microarrays can be used to create gene expression profiles, which show simultaneous changes in the expression of many genes in response to a particular condition or treatment.
Small organism whole genome sequencing can be performed using a next generation sequencing platform.
The small RNA sequencing service covers sequencing of known small RNA molecules, discovery of novel small RNAs, mutation characterization and expression profiling of small RNAs, using advanced NGS technologies and the data analysis pipeline.
Mitochondria play a key role in essential cellular functions. Mitochondrial DNA sequencing is a useful tool for researchers studying human diseases, and can also be used in population genetics and biodiversity assessments.
Imaging systems are used for the detection, quantitation, and analysis of proteins and nucleic acids in gels and on membranes. They can be used for detection and automated data analysis with all common modes of protein and nucleic acid staining and labelling: colorimetric, fluorescent and chemiluminescent.
Bioinformatic Applications
Next-generation sequencing is extensively used to test for inherited disorders and to identify germline variants associated with complex disorders. Our extensively benchmarked pipeline identifies germline single nucleotide variants and small insertions and deletions from whole genome sequencing, whole exome sequencing or targeted sequencing data. The standard pipeline can be customised to take into account specific challenges such as polyploidy or distant reference genomes. A specific analysis pipeline enables the analysis of organisms with or without reference genome.
Genomic characterization of tumours is increasingly being used to guide treatment decisions. Our extensively benchmarked pipeline can identify somatic single nucleotide variants and small insertions and deletions from whole genome sequencing, whole exome sequencing or targeted sequencing data. The standard pipeline can be customised to take into account specific challenges such as the absence of paired control samples.
RNA sequencing analysis is widely used to functionally characterize organisms, tissues or cells. Our RNA sequencing analysis pipeline includes transcript quantification, differential gene expression analysis, differential alternative splicing analysis, detection of gene fusion events and single nucleotide variant identification from transcripts.
De novo sequence assembly is challenging, not only because of the sheer size of the data and computational requirements, but also due to repetitive elements, polyploidy and variation (single-nucleotide, insertions/deletions, and larger structural variants). We aim to meet these challenges by optimizing and tuning our analysis strategy as each project demands.
Epigenetic changes, such as cytosine methylation, are known to play an important role in the regulation of gene expression. Our pipeline allows large-scale, high-performance analysis of DNA methylation sequencing datasets, as well as single nucleotide variant identification.
The three-dimensional organization of the genome plays important, yet poorly understood roles in gene regulation. We have pioneered the development of hybrid methods for determining the structures of genomes and genomic domains from HiC sequencing data.
Single cell sequencing data can be very useful to elucidate cellular heterogeneity and related dynamics in organs and organisms, in health and disease, in humans and model systems. We have sophisticated computational pipelines that allow a variety of analyses, including distance between single cells, de-convolution, clustering, differential expression and hierarchical markers.
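As a minimal illustration of what a distance between single cells means, the Python sketch below compares three hypothetical cells described by raw counts for four genes. The counts, the function names, the log-normalisation with a scale factor of 10,000 and the choice of Euclidean distance are all assumptions made for this example; they do not describe the actual pipelines.

```python
import math


def log_normalise(counts, scale=10_000):
    """Scale counts to a common library size, then apply log(1 + x).

    Assumes the cell has at least one count (total > 0).
    """
    total = sum(counts)
    return [math.log1p(c / total * scale) for c in counts]


def euclidean(a, b):
    """Euclidean distance between two equally long expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical raw counts for four genes in three cells.
cells = {
    "cell_1": [120, 0, 30, 5],
    "cell_2": [110, 2, 28, 4],
    "cell_3": [3, 90, 1, 60],
}
profiles = {name: log_normalise(counts) for name, counts in cells.items()}

d12 = euclidean(profiles["cell_1"], profiles["cell_2"])
d13 = euclidean(profiles["cell_1"], profiles["cell_3"])
print(f"cell_1 vs cell_2: {d12:.2f}")
print(f"cell_1 vs cell_3: {d13:.2f}")
```

Cells with similar expression profiles end up close together, and clustering methods group cells on the basis of such pairwise distances.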
Spatial transcriptomics data is used to accelerate discoveries and advance treatments, harnessing the prognostic value of tissue resections and biopsies. Our team strives to identify diagnostic markers, targets for prevention and intervention and predictive biomarkers for immunotherapy response.
The microarray data is analysed for gene expression (mRNA and miRNA) using the Gene Expression and miRNA workflow in GeneSpring GX 13.1. Pathway analysis can also be done using the same software.
Equipment
Illumina short read sequencing instruments with diverse capabilities, that allow processing various standard and custom sequencing applications.
Oxford Nanopore Technologies (ONT) long read sequencing instruments with medium and high throughput capabilities. They produce ultra-long reads of several kilobases, only limited by the length of the molecules to be sequenced.
Advanced microfluidics platform where single cells or DNA molecules are encapsulated in nanolitre microreactor droplets. It combines large partition numbers with a massively diverse barcode library to generate >100,000 barcode-containing partitions.
In situ sequencing platforms that enable molecular profiling and high-multiplex multiomics at cellular and subcellular resolution in intact tissues.
Automated liquid handling systems for processing up to 96 samples or sequencing libraries simultaneously, in pre-PCR, semi-pre-PCR and post-PCR restricted areas.
Systems for automated high throughput ultrasonication that support the fragmentation of up to 96 DNA samples simultaneously.
Fluorometers, microvolume spectrophotometer and parallel capillary electrophoresis systems for quantification and integrity evaluation of DNA/RNA samples and sequencing libraries. Imaging systems for the detection and quantitation of nucleic acids.
Medium- to high-throughput real-time PCR platforms that support mono- or multicolor applications, as well as multiplex protocols.
In-house developed LIMS for tracking projects, samples, libraries, sequencing runs and results.
CNAG computing cluster, with 10,000 computing cores and 21 petabytes of storage capacity (14 PB on disk and 7 PB on tape).
Next-generation sequencing (NGS) is a high-throughput methodology that allows rapid sequencing of base pairs in DNA or RNA samples. It can be used in a wide range of applications, for example, in gene expression profiling, for the detection of epigenetic changes and for molecular analysis.
A capillary electrophoresis system for Sanger sequencing, suitable for processing up to two 96-well sample plates at a time.
The core principle behind microarrays is the hybridization between two DNA strands, the property of complementary nucleic acid sequences to specifically pair with each other by forming hydrogen bonds between complementary nucleotide base pairs. Fluorescently labelled target sequences that bind to a probe sequence generate a signal. Microarrays use relative quantitation in which the intensity of a feature (signal) is compared to the intensity of the same feature under a different condition, and the identity of the feature is known by its position.
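The relative quantitation described above can be sketched in Python as a log2 ratio of the intensity of each feature under two conditions. The spot names and intensity values are hypothetical, and the sketch assumes strictly positive intensities that have already been background-corrected and normalised; it does not reproduce the GeneSpring workflow.

```python
import math


def log2_ratios(treated, reference):
    """Return log2(treated / reference) for every feature present in both arrays."""
    return {
        feature: math.log2(treated[feature] / reference[feature])
        for feature in treated
        if feature in reference
    }


# Hypothetical intensities; each feature is identified by its position (spot).
reference = {"spot_A": 500.0, "spot_B": 800.0, "spot_C": 1200.0}
treated = {"spot_A": 2000.0, "spot_B": 800.0, "spot_C": 300.0}

ratios = log2_ratios(treated, reference)
for feature, value in ratios.items():
    print(f"{feature}: log2 ratio = {value:+.1f}")
```

A positive value indicates a stronger signal in the treated sample, a negative value a weaker one, and zero no change. Working on the log2 scale makes a four-fold increase and a four-fold decrease symmetric around zero.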