Genomics Area

DNA and RNA sequencing has become an invaluable tool for fundamental and applied research in areas as diverse as cancer genetics, rare disorders, host-pathogen interactions, preservation of endangered species, evolutionary studies and the improvement of species of agricultural or farming interest.

The OmicsTech Genomics Area is formed by the Centro Nacional de Análisis Genómico (CNAG) sequencing platform and the Genomics and Transcriptomics facility of the Center for Omic Sciences (COS). The CNAG is one of the top European centres in terms of sequencing capacity and diversity of capabilities. It focuses its activities on 2nd and 3rd generation sequencing technologies and the corresponding analysis tools. The CNAG expertise is based on extensive experience in sample quality control, library construction methods, sequencing, lab automation, data banking and data analysis. The service portfolio is constantly evolving and includes state-of-the-art applications such as long-read nanopore sequencing, single-cell RNA sequencing and ATAC-sequencing. Processes at the CNAG run under ISO 9001:2015 certification and ISO 17025:2017 and ISO 27001 accreditations. The services offered by the COS through its Genomics and Transcriptomics facility are focused on small genome sequencing and metagenomic studies.

The Genomics area of the OmicsTech can guide researchers at any project phase from the design of the experiments to the data analysis.

For further information and discussion of possible ways of collaboration with the CNAG or the COS facilities, please contact projectmanager@cnag.eu

Applications

Whole genome sequencing with short reads (Illumina)

Whole genome sequencing (WG-Seq) is the process of determining the complete DNA sequence of an organism's genome at a single time. It delivers a base-by-base view of all genomic alterations, including single nucleotide variants (SNVs), small insertions and deletions (indels), copy number changes (CNVs), and structural variations (SVs). It can also be used in metagenomic studies for accurate identification of species from environmental samples.
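As a minimal illustration of how the small variant classes named above differ, the sketch below labels a variant from its reference and alternate alleles. It assumes VCF-style allele strings; the function name `classify_variant` and the "substitution" label are hypothetical, and copy number changes and structural variations are outside its scope.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a small variant from its reference and alternate alleles.

    Assumes VCF-style allele strings (hypothetical helper, not a CNAG tool).
    """
    if not ref or not alt:
        raise ValueError("ref and alt alleles must be non-empty")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"            # single nucleotide variant
    if len(ref) == len(alt):
        return "substitution"   # several bases replaced, length unchanged
    if len(ref) < len(alt):
        return "insertion"      # alternate allele is longer
    return "deletion"           # alternate allele is shorter


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```

Production variant callers apply far more context (alignment, normalisation, quality filters); this only shows the length-based distinction between SNVs and indels.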

Whole genome sequencing with long reads (ONT)

Whole genome sequencing with ONT technology produces reads that are several kilobases long. They are extremely useful to improve genome assemblies, to identify large structural variations and to phase alleles to their respective parental homolog.

Whole genome Enzymatic Methyl sequencing

Whole genome Enzymatic Methyl sequencing is a new method for identifying 5-mC and 5-hmC, and an alternative to traditional bisulfite sequencing (WGBS-Seq), which has been the gold standard for DNA methylation analysis at single-base resolution. Whole genome Enzymatic Methyl sequencing produces high quality libraries that enable superior detection of 5-mC and 5-hmC from fewer sequencing reads.

Whole exome sequencing

Whole exome sequencing is used to investigate all protein-coding regions of the genome with enhanced coverage for disease-associated genes. It is suitable for identifying nucleotide variants across coding regions and is a cost-effective alternative to whole genome sequencing that produces smaller, more manageable data sets.

Targeted sequencing

Targeted sequencing is a highly directed approach that enables the analysis of genetic variation in specific genomic regions, using pre-designed gene panels, custom gene panels or amplicon sequencing. Potential applications are the discovery of rare mutations in complex samples (such as highly heterogeneous tumour samples) or sequencing the bacterial 16S rRNA gene across multiple species, a widely used method for phylogeny and taxonomy studies.

RNA sequencing with short reads (Illumina)

RNA sequencing is a sensitive and accurate method for determining the primary sequence and relative abundance of all RNA molecules. It provides strand-specific information that allows assigning transcripts to the corresponding DNA strand. Most common applications are differential gene expression, allele-specific gene expression, alternative splicing, fusion transcripts, de novo transcriptome assembly and genome annotation, and single nucleotide variant identification.

RNA sequencing with long reads (ONT)

RNA sequencing with long reads allows full-length characterization of native RNA or cDNA, as well as the identification of complex transcript isoforms, and chimeric or gene fusion transcripts.

Hi-C sequencing

The three-dimensional configuration of the genome is complex, dynamic and crucial for gene regulation. Hi-C sequencing is a chromosome conformation capture technique that reveals the interactions between different pieces of DNA.

Single cell sequencing

Single cell sequencing allows the analysis of individual cells. By examining complex systems, such as tissues or organs, at single-cell level, it reveals cellular heterogeneity and the state of each cell type. The most common applications are Single Cell RNA sequencing, Single Cell ATAC sequencing, Single Cell immune profiling (BCR/TCR), CITE sequencing and Single Cell Multiome ATAC + Gene Expression.

Spatial transcriptomics

Spatial transcriptomics makes it possible to study gene expression within an unperturbed tissue microenvironment, with the tissue architecture maintained. It allows processes that occur in whole organismal systems to be studied under conditions that are as realistic as possible.

ATAC sequencing

The assay for transposase-accessible chromatin sequencing (ATAC-Seq) is a rapid and sensitive technique to assess genome-wide chromatin accessibility. It uses the Tn5 transposome to detect nucleosome-free regions of the genome. It is widely used for nucleosome mapping and to determine transcription factor occupancy.

Whole exome sequencing plus variant confirmation by Sanger Sequencing

Next-generation sequencing (NGS) has dramatically changed the molecular diagnostic area. Whole exome sequencing by NGS and confirmation of the reported candidate variants by Sanger Sequencing is a common practice in many clinical labs.

DNA fingerprinting by whole genome sequencing and fragment analysis

DNA fingerprinting enables identification of individuals using hair, blood, semen, or other biological samples, based on unique patterns (polymorphisms) in their DNA. It can be done by whole genome sequencing in the discovery phase and/or by amplified fragment length polymorphism (AFLP, the detection of multiple DNA restriction fragments by means of PCR amplification) for larger numbers of samples.

Metagenomics sequencing by whole genome and amplicon-based (16S, 18S, or ITS) sequencing

Metagenomics is the study of genetic material recovered directly from environmental samples. Accurate identification of species is a major challenge and can be done by two complementary approaches: whole genome sequencing (WG-Seq) and 16S, 18S or ITS amplicon sequencing. The latter is a cost-effective approach for studying large numbers of samples, although it is limited to one specific gene, while the WG-Seq strategy can be applied to smaller numbers of samples but works well for all organisms found in the same sample, both prokaryotes and eukaryotes.

Targeted Sequencing of small fragments by capillary electrophoresis

The Sanger method is based on sequentially synthesizing a strand of DNA complementary to a single strand (used as a template), in the presence of DNA polymerase, the four 2'-deoxynucleotides that make up the DNA sequence (dATP, dGTP, dCTP and dTTP) and four dideoxynucleotides. Using primers specific to known genes, different polymorphisms in the sequence can be detected.

Fragment analysis by capillary electrophoresis

Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), have been popular markers due to their high polymorphism. The PCR reaction is performed with fluorescent dye-labelled primers, the PCR fragments are then separated on a capillary DNA sequencing machine, and the data is analysed using GeneMapper™ software.
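Because microsatellite alleles differ in the number of repeat units, a measured fragment size can be translated into an approximate repeat count. The sketch below shows that arithmetic only; the function name, the flanking length and the example sizes are hypothetical values chosen for illustration, and it is not a description of how GeneMapper calls alleles.

```python
def repeat_count(fragment_bp: float, flank_bp: float, motif_bp: int) -> float:
    """Estimate the number of repeat units in a microsatellite amplicon.

    fragment_bp: measured PCR fragment size in base pairs
    flank_bp:    combined size of primers and flanking sequence (assumed known)
    motif_bp:    length of the repeat motif, e.g. 4 for a tetranucleotide repeat
    """
    if motif_bp <= 0:
        raise ValueError("motif_bp must be positive")
    if fragment_bp < flank_bp:
        raise ValueError("fragment is shorter than its flanking sequence")
    return (fragment_bp - flank_bp) / motif_bp


# Two hypothetical alleles at a tetranucleotide locus with 100 bp of flank.
for size in (148, 156):
    print(size, "bp ->", repeat_count(size, flank_bp=100, motif_bp=4), "repeats")
```

A non-integer result would suggest a partial repeat or a sizing inaccuracy, which is why the function returns a float rather than rounding.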

Targeted gene expression

Quantification of allelic expression of specific genes using TaqMan™ assays or SYBR Green intercalator assays can be performed using an RT-PCR system.

TaqMan SNP Genotyping analysis

TaqMan is a mature, validated and widely used SNP genotyping method developed by Life Technologies that runs on an RT-PCR system. Each TaqMan genotyping assay contains two primers for amplifying the sequence of interest and two allele-specific, differently labelled TaqMan probes for allele detection. Each allele-specific MGB probe is labelled with a fluorescent reporter dye (either a FAM or a VIC reporter molecule) at the 5’ end and carries a fluorescence quencher at the 3’ end.
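Since each allele is reported by a different dye, the balance between the two end-point signals indicates the genotype. The sketch below is a simplified illustration under stated assumptions: VIC is taken to report allele 1 and FAM allele 2, and the angle and signal thresholds are hypothetical. Instrument software typically clusters all samples on a plate rather than applying fixed cut-offs.

```python
import math


def call_genotype(vic: float, fam: float, min_signal: float = 0.5) -> str:
    """Call a genotype from end-point VIC and FAM fluorescence.

    Assumes VIC reports allele 1 and FAM reports allele 2; the 30/60 degree
    cut-offs and min_signal are illustrative, not validated thresholds.
    """
    if math.hypot(vic, fam) < min_signal:
        return "no call"                      # too little signal from either dye
    angle = math.degrees(math.atan2(fam, vic))
    if angle < 30:
        return "homozygous allele 1"          # VIC signal dominates
    if angle > 60:
        return "homozygous allele 2"          # FAM signal dominates
    return "heterozygous"                     # both dyes contribute


for vic, fam in [(2.0, 0.1), (1.5, 1.5), (0.1, 2.0), (0.1, 0.1)]:
    print(vic, fam, call_genotype(vic, fam))
```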

Gene expression or miRNA analysis by microarrays

DNA microarrays are microscope slides that are printed with thousands of tiny spots in defined positions, with each spot containing a known DNA sequence or gene. To perform a microarray analysis, RNA molecules (mRNA for gene expression and miRNA for miRNA analysis) are typically collected from both an experimental sample and a reference sample. The samples are then converted into complementary DNA (cDNA), and each sample is labelled with a fluorescent probe. The data gathered through microarrays can be used to create gene expression profiles, which show simultaneous changes in the expression of many genes in response to a particular condition or treatment.

Small genome sequencing

Small organism whole genome sequencing can be performed using a next generation sequencing platform.

Small RNA and miRNA sequencing by Ion Torrent

The small RNA sequencing service covers sequencing of known small RNA molecules, discovery of novel small RNAs, mutation characterization and expression profiling of small RNAs, leveraging advanced NGS technologies and the data analysis pipeline.

Mitochondrial DNA sequencing

Mitochondria play a very important role in essential cellular functions. Mitochondrial DNA sequencing is a useful tool for researchers studying human diseases, and can also be used in population genetics and biodiversity assessments.

Image analysis of visible spectra and chemiluminescence

Imaging systems are used for the detection, quantitation, and analysis of proteins and nucleic acids in gels and on membranes. They can be used for detection and automated data analysis with all common modes of protein and nucleic acid staining and labelling: colorimetric, fluorescent and chemiluminescent.

Bioinformatic Applications

Germline variant identification and annotation

Next-generation sequencing is extensively used to test for inherited disorders and to identify germline variants associated with complex disorders. Our extensively benchmarked pipeline identifies germline single nucleotide variants and small insertions and deletions from whole genome sequencing, whole exome sequencing or targeted sequencing data. The standard pipeline can be customised to take into account specific challenges such as polyploidy or distant reference genomes. A specific analysis pipeline enables the analysis of organisms with or without reference genome.

Somatic variant identification and annotation

Genomic characterization of tumours is increasingly being used to guide treatment decisions. Our extensively benchmarked pipeline can identify somatic single nucleotide variants and small insertions and deletions from whole genome sequencing, whole exome sequencing or targeted sequencing data. The standard pipeline can be customised to take into account specific challenges such as the absence of paired control samples.

Transcript quantification and differential expression analysis

RNA sequencing analysis is widely used to functionally characterize organisms, tissues or cells. Our RNA sequencing analysis pipeline includes transcript quantification, differential gene expression analysis, differential alternative splicing analysis, detection of gene fusion events and single nucleotide variant identification from transcripts.

Whole genome and whole transcriptome de novo assembly

De novo sequence assembly is challenging, not only because of the sheer size of the data and computational requirements, but also due to repetitive elements, polyploidy and variation (single-nucleotide, insertions/deletions, and larger structural variants). We aim to meet these challenges by optimizing and tuning our analysis strategy as each project demands.

Methylation analysis

Epigenetic changes, such as cytosine methylation, are known to play an important role in the regulation of gene expression. Our pipeline allows large-scale, high-performance analysis of DNA methylation sequencing datasets, as well as single nucleotide variant identification.

3D genome analysis

The three-dimensional organization of the genome plays important, yet poorly understood roles in gene regulation. We have pioneered the development of hybrid methods for determining the structures of genomes and genomic domains from Hi-C sequencing data.

Single cell sequencing analysis

Single cell sequencing data can be very useful to elucidate cellular heterogeneity and related dynamics in organs and organisms, in health and disease, in humans and model systems. We have sophisticated computational pipelines that allow a variety of analyses, including distance between single cells, de-convolution, clustering, differential expression and hierarchical markers.

Spatial transcriptomics analysis

Spatial transcriptomics data is used to accelerate discoveries and advance treatments, harnessing the prognostic value of tissue resections and biopsies. Our team strives to identify diagnostic markers, targets for prevention and intervention and predictive biomarkers for immunotherapy response.

Microarray Analysis

The microarray data is analysed for gene expression (mRNA and miRNA) using the Gene Expression and miRNA workflow in GeneSpring GX 13.1. Pathway analysis can also be done using the same software.

Equipment

Illumina sequencing instruments
NovaSeq and MiSeq

Illumina short-read sequencing instruments with diverse capabilities that support a variety of standard and custom sequencing applications.

ONT sequencing instruments
GridION and PromethION

Oxford Nanopore Technologies (ONT) long read sequencing instruments with medium and high throughput capabilities. They produce ultra-long reads of several kilobases, only limited by the length of the molecules to be sequenced.

Single cell/ DNA molecule capture system
Chromium Controller, Chromium X and Chromium Connect

Advanced microfluidics platform where single cells/ DNA molecules are encapsulated in nanolitre microreactor droplets. It combines large partition numbers with a massively diverse barcode library to generate >100,000 barcode-containing partitions.

Spatial sequencing platforms
Visium, CosMx, Xenium

In situ sequencing platforms that enable molecular profiling and high-multiplex multiomics at cellular and subcellular resolution in intact tissues.

Systems for automated liquid handling
Gilson PIPETMAX 268, Sciclone NGS Workstations, Zephyr SPE, Mantis, JANUS G3 and BRAVO

Automated liquid handling systems for processing up to 96 samples or sequencing libraries simultaneously, in pre-PCR, semi-pre-PCR and post-PCR restricted areas.

Systems for DNA/RNA fragmentation
Covaris E210, Covaris LE220-plus

Systems for automated high throughput ultrasonication that support the fragmentation of up to 96 DNA samples simultaneously.

Systems for quantification and quality controlling DNA/RNA samples and libraries
Synergy™ HT Multi-Mode Microplate Reader, Qubit, Nanodrop 2000, Bioanalyzers 2100, FEMTO Pulse, SE-NUGENIUS Gel Imaging, TapeStation, Fragment Analyzer and ChemiDoc Gel Imaging

Fluorometers, microvolume spectrophotometer and parallel capillary electrophoresis systems for quantification and integrity evaluation of DNA/RNA samples and sequencing libraries. Imaging systems for the detection and quantitation of nucleic acids.

Real-time PCR instruments
Light Cycler 480 and ABI 7900HT real-time PCR

Medium- to high-throughput real-time PCR platforms that support mono- or multicolor applications, as well as multiplex protocols.

Laboratory Information Management System
LIMS

In-house developed LIMS for tracking projects, samples, libraries, sequencing runs and results.

CNAG informatics infrastructure
Computing cluster

CNAG computing cluster, with 10,000 computing cores and 21 petabytes of storage capacity (14 PB on disk and 7 PB on tape).

Ion Torrent sequencing instruments
PGM and S5 System

Next-generation sequencing (NGS) is a high-throughput methodology that allows rapid sequencing of base pairs in DNA or RNA samples. It can be used in a wide range of applications, for example, in gene expression profiling, for the detection of epigenetic changes and for molecular analysis.

3500 Genetic Analyzer

A capillary electrophoresis instrument for Sanger sequencing, suitable for processing up to two 96-well sample plates at a time.

Agilent Microarrays scanner

The core principle behind microarrays is the hybridization between two DNA strands, the property of complementary nucleic acid sequences to specifically pair with each other by forming hydrogen bonds between complementary nucleotide base pairs. Fluorescently labelled target sequences that bind to a probe sequence generate a signal. Microarrays use relative quantitation in which the intensity of a feature (signal) is compared to the intensity of the same feature under a different condition, and the identity of the feature is known by its position.
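The relative quantitation described above is commonly expressed as a log2 ratio of the two intensities for the same feature. The sketch below shows that calculation; the function name, the pseudocount and the example intensities are assumptions made for illustration, and real workflows also apply background correction and normalisation first.

```python
import math


def log2_ratio(condition: float, reference: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of one feature's intensity between two conditions.

    The pseudocount (an illustrative choice) avoids division by zero and
    dampens ratios for very weak signals.
    """
    if condition < 0 or reference < 0:
        raise ValueError("intensities must be non-negative")
    return math.log2((condition + pseudocount) / (reference + pseudocount))


# Hypothetical intensities for three features: (condition, reference).
features = {"probe_A": (799.0, 199.0), "probe_B": (99.0, 399.0), "probe_C": (250.0, 250.0)}
for name, (cond, ref) in features.items():
    print(name, round(log2_ratio(cond, ref), 2))
```

A positive value indicates a stronger signal in the tested condition, a negative value a weaker one, and zero no change.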