Grupo de Ingeniería Estadística Multivariante GIEM

Publication Search Results

Now showing 1 - 10 of 23
  • Publication
    Differential expression in RNA-seq: A matter of depth
    (Cold Spring Harbor Laboratory Press, 2011-09-08) Tarazona Campos, Sonia; García-Alcalde, Fernando; Dopazo, Joaquín; Ferrer Riquelme, Alberto José; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Industrial; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; Generalitat Valenciana; Ministerio de Ciencia e Innovación
    Next-generation sequencing (NGS) technologies are revolutionizing genome research, and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and additional research is needed to understand how these data respond to differential expression analysis. In this work, we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level, and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication. © 2011 by Cold Spring Harbor Laboratory Press.
  • Publication
    Qualimap: evaluating next generation sequencing alignment data
    (Oxford University Press (OUP): Policy B - Oxford Open Option B, 2012) García-Alcalde, Fernando; Okonechnikov, Konstantin; Carbonell Caballero, José; Cruz, Luís M.; Götz, Stefan; Tarazona Campos, Sonia; Dopazo, Joaquín; Meyer, Thomas F.; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM
    Motivation: The sequence alignment/map (SAM) and the binary alignment/map (BAM) formats have become the standard method of representation of nucleotide sequence alignments for next-generation sequencing data. SAM/BAM files usually contain information from tens to hundreds of millions of reads. Often, the sequencing technology, protocol, and/or the selected mapping algorithm introduce some unwanted biases in these data. The systematic detection of such biases is a non-trivial task that is crucial to drive appropriate downstream analyses. Results: We have developed Qualimap, a Java application that supports user-friendly quality control of mapping data, by considering sequence features and their genomic properties. Qualimap takes sequence alignment data and provides graphical and statistical analyses for the evaluation of data. Such quality-control data are vital for highlighting problems in the sequencing and/or mapping processes, which must be addressed prior to further analyses.
  • Publication
    Transcriptome modulation during host shift is driven by secondary metabolites in desert Drosophila
    (Wiley, 2016-09) De Panis, Diego N.; Padró, Julián; Furió Tarí, Pedro; Tarazona Campos, Sonia; Milla Carmona, Pablo S.; Soto, Ignacio M.; Dopazo, Hernán; Conesa, Ana; Hasson, Esteban; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; Ministerio de Economía y Competitividad; Ministerio de Ciencia e Innovación; European Regional Development Fund; Universidad de Buenos Aires; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Agencia Nacional de Promoción Científica y Tecnológica, Argentina
    [EN] High-throughput transcriptome studies are breaking new ground to investigate the responses that organisms deploy in alternative environments. Nevertheless, much remains to be understood about the genetic basis of host plant adaptation. Here, we investigate genome-wide expression in the fly Drosophila buzzatii raised in different conditions. This species uses decaying tissues of cacti of the genus Opuntia as its primary rearing substrate and, secondarily, the necrotic tissues of the columnar cactus Trichocereus terscheckii. The latter constitutes a harmful host, rich in mescaline and other related phenylethylamine alkaloids. We assessed the transcriptomic responses of larvae reared in Opuntia sulphurea and T. terscheckii, with and without the addition of alkaloids extracted from the latter. Whole-genome expression profiles were massively modulated by the rearing environment, mainly by the presence of T. terscheckii alkaloids. Differentially expressed genes were mainly related to detoxification, oxidation–reduction and stress response; however, we also found genes involved in development and neurobiological processes. In conclusion, our study contributes new data on the role of transcriptional plasticity in response to alternative rearing environments.
  • Publication
    Association Between Sex Hormone Levels and Clinical Outcomes in Patients With COVID-19 Admitted to Hospital: An Observational, Retrospective, Cohort Study
    (Frontiers Media, 2022-01-27) Beltrame, Anna; Salguero-García, Pedro; Rossi, Emanuela; Conesa, Ana; Moro, Lucia; Bettini, Laura Rachele; Rizzi, Eleonora; D'Angió, Mariella; Deiana, Michela; Piubelli, Chiara; Rebora, Paola; Duranti, Silvia; Bonfanti, Paolo; Capua, Ilaria; Tarazona Campos, Sonia; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; Ministero della Salute
    [EN] Understanding the cause of sex disparities in COVID-19 outcomes is a major challenge. We investigate sex hormone levels and their association with outcomes in COVID-19 patients, stratified by sex and age. This observational, retrospective, cohort study included 138 patients aged 18 years or older with COVID-19, hospitalized in Italy between February 1 and May 30, 2020. The association between sex hormones (testosterone, estradiol, progesterone, dehydroepiandrosterone) and outcomes (ARDS, severe COVID-19, in-hospital mortality) was explored in 120 patients aged 50 years and over. The STROBE checklist was followed. The median age was 73.5 years [IQR 61, 82]; 55.8% were male. In older males, testosterone was lower in those with ARDS and in those with severe COVID-19 than in those without (3.6 vs. 5.3 nmol/L, p = 0.0378 and 3.7 vs. 8.5 nmol/L, p = 0.0011, respectively). Deceased males had lower testosterone (2.4 vs. 4.8 nmol/L, p = 0.0536) and higher estradiol than survivors (40 vs. 24 pg/mL, p = 0.0006). Testosterone was negatively associated with ARDS (OR 0.849 [95% CI 0.734, 0.982]), severe COVID-19 (OR 0.691 [95% CI 0.546, 0.874]), and in-hospital mortality (OR 0.742 [95% CI 0.566, 0.972]), regardless of potential confounders, though confirmed only in the regression model on males. Higher estradiol was associated with a higher probability of death (OR 1.051 [95% CI 1.018, 1.084]), confirmed in both sex models. In males, higher testosterone seems to be protective against any considered outcome. Higher estradiol was associated with a higher probability of death in both sexes.
  • Publication
    RGmatch: matching genomic regions to proximal genes in omics data integration
    (BioMed Central, 2016) Furió Tarí, Pedro; Conesa, Ana; Tarazona Campos, Sonia; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; European Commission; Ministerio de Economía y Competitividad
    [EN] Background: The integrative analysis of multiple genomics data often requires that genome coordinate-based signals be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results: In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions: RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
  • Publication
    acorde unravels functionally interpretable networks of isoform co-usage from single cell data
    (Nature Publishing Group, 2022-04-05) Arzalluz-Luque, Ángeles; Salguero-García, Pedro; Tarazona Campos, Sonia; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; Agencia Estatal de Investigación; Ministerio de Ciencia e Innovación; National Institutes of Health, EEUU; Universitat Politècnica de València
    [EN] Alternative splicing (AS) is a highly-regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant in cellular function has not been explored yet. Here, we present acorde, a pipeline that successfully leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we develop and validate percentile correlations, an innovative approach that overcomes data sparsity and yields accurate coexpression estimates from single-cell data. Next, acorde uses correlations to cluster coexpressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs, and domains potentially controlled by the coordination of post-transcriptional regulation. The code for acorde is available at https://github.com/ConesaLab/acorde.
  • Publication
    MultiBaC: A strategy to remove batch effects between different omic data types
    (SAGE Publications, 2020-10) Ugidos, Manuel; Tarazona Campos, Sonia; Prats Montalbán, José Manuel; Ferrer Riquelme, Alberto José; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Industrial; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; Generalitat Valenciana
    [EN] The diversity of omic technologies has expanded in recent years together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford research projects where many different omic techniques are applied, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (i.e., gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, hence batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects.
  • Publication
    ARSyN: a method for the identification and removal of systematic noise in multifactorial time-course microarray experiments
    (Oxford University Press (OUP): Policy B - Oxford Open Option A, 2011-11-14) Nueda, María J.; Ferrer Riquelme, Alberto José; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Industrial; Grupo de Ingeniería Estadística Multivariante GIEM; Ministerio de Ciencia e Innovación
    Transcriptomic profiling experiments that aim to identify responsive genes in specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Commonly, normalization methods are applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions of transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinative nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes into account these last two aspects. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time course microarray (MTCM) experiments with two factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. However, the method can be used in other situations such as experiments with only one factor or more complex designs with more than two factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the biological information required. To evaluate the performance of the filtering strategy, we have applied different statistical approaches for MTCM analysis to several real and simulated data sets, studying also the efficiency of these techniques. By comparing the results obtained with the original and ARSyN filtered data and also with other filtering techniques, we can conclude that the proposed method increases the statistical power to detect biological signals, especially in cases where there are high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda
  • Publication
    A survey of best practices for RNA-seq data analysis
    (BioMed Central, 2016-01) Conesa, Ana; Madrigal, Pedro; Tarazona Campos, Sonia; Gómez Cabrero, David; Cervera, Alejandra; McPherson, Andrew; Wojciech Szczesniak, Michal; Gaffney, Daniel J.; Elo, Laura L.; Zhang, Xuegong; Mortazavi, Ali; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; European Commission; Ministerio de Economía y Competitividad; Juvenile Diabetes Research Foundation, EEUU; National Key Research and Development Program of China
    [EN] RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
  • Publication
    Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package
    (Oxford University Press (OUP), 2015-12-02) Tarazona Campos, Sonia; Furió Tarí, Pedro; Turrà, David; Di Pietro, Antonio; Nueda, María José; Ferrer Riquelme, Alberto José; Conesa, Ana; Dpto. de Estadística e Investigación Operativa Aplicadas y Calidad; Escuela Técnica Superior de Ingeniería Industrial; Escuela Técnica Superior de Ingeniería Informática; Grupo de Ingeniería Estadística Multivariante GIEM; European Commission; Ministerio de Ciencia e Innovación
    [EN] As RNA-seq has grown in popularity, there is increasing awareness of the importance of experimental design, bias removal, accurate quantification and control of false positives for proper data analysis. We introduce the NOISeq R-package for quality control and analysis of count data. We show how the available diagnostic tools can be used to monitor quality issues, make pre-processing decisions and improve analysis. We demonstrate that the non-parametric NOISeqBIO efficiently controls false discoveries in experiments with biological replication and outperforms state-of-the-art methods. NOISeq is a comprehensive resource that meets current needs for robust data-aware analysis of RNA-seq differential expression.