Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

On the noisy high-dimensional gene expression data analysis
Angela Serra, Pietro Coretto, Roberto Tagliaferri

Last modified: 2017-05-11

Abstract


The main goal of microarray experiments is to identify, among thousands of genes, groups that show similar co-expression patterns. In most cases the analysis starts from the estimation of a sample correlation matrix, which is then used to construct the input dissimilarity. However, the sample correlation matrix is highly distorted both by the presence of outlying experimental units and by the typically large ratio between the number of genes and the number of patients. We review the joint action of these issues, and we discuss some possible remedies. We consider real data from some well-known microarray experiments, and we perform cluster analysis based on both the usual sample correlation and some "cleaned" alternatives. Finally, we investigate the differences between the obtained groups and draw some conclusions.
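The correlation-based pipeline described above can be illustrated with a minimal NumPy sketch. It assumes genes are stored in rows and experimental units (patients) in columns, and it uses the common dissimilarity d_ij = 1 - r_ij. The data are simulated (200 independent genes, only 10 units) and are not taken from the experiments discussed. The rank-based (Spearman) correlation is used here only as an illustrative "cleaned" alternative. It is not necessarily one of the remedies the authors consider, and all function names are hypothetical.

```python
import numpy as np

def pearson_corr(X):
    """Gene-by-gene sample (Pearson) correlation; rows of X are genes."""
    return np.corrcoef(X)  # rowvar=True by default: rows are variables

def spearman_corr(X):
    """Illustrative 'cleaned' alternative (an assumption, not the paper's method):
    Pearson correlation of within-gene ranks, i.e. Spearman correlation.
    Assumes no ties, which holds almost surely for continuous data."""
    ranks = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    return np.corrcoef(ranks)

def dissimilarity(R):
    """Correlation-based dissimilarity d_ij = 1 - r_ij used as clustering input."""
    return 1.0 - R

def mean_abs_offdiag(R):
    """Average absolute correlation between distinct genes."""
    mask = ~np.eye(R.shape[0], dtype=bool)
    return np.abs(R[mask]).mean()

rng = np.random.default_rng(0)
n_genes, n_units = 200, 10  # many more genes than experimental units
X = rng.standard_normal((n_genes, n_units))  # independent genes: every true correlation is 0

# Contaminate a single experimental unit: every gene takes a large value there.
X_out = X.copy()
X_out[:, 0] += 20.0

D = dissimilarity(pearson_corr(X))  # input dissimilarity for a clustering routine

print("Pearson, clean data:     ", mean_abs_offdiag(pearson_corr(X)))
print("Pearson, one outlying unit:", mean_abs_offdiag(pearson_corr(X_out)))
print("Spearman, one outlying unit:", mean_abs_offdiag(spearman_corr(X_out)))
```

Since the simulated genes are independent, any correlation reported on the clean data is spurious and reflects only the small number of units. Shifting a single unit should push the average Pearson correlation close to 1, while the rank-based version is affected much less. A matrix such as `D` could then be passed to a hierarchical clustering routine.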