Sparse rule generating fold-change classification for molecular high-throughput profiles
Last modified: 2023-07-11
Abstract
Classifying gene expression profiles can be challenging because of their typically low sample size and high dimensionality. In addition, a sparse and interpretable decision function is often desired. Existing methods for distinguishing biological samples of different classes are still often hard to interpret biologically, make hypothesis formulation difficult, and require extensive data preprocessing.
Ensemble methods such as the Set Covering Machine (SCM) construct classifiers that depend on only a small number of base classifiers. We propose two novel base classifiers, denoted fold-change classifiers, that consider relations between features to construct interpretable decision functions. They provide intrinsic feature selection and a straightforward semantic and syntactic interpretation. Because the resulting decision function depends on only a subset of features, it can also simplify the formulation of a biological hypothesis. Since the proposed classifier considers relative measurements, namely pairwise relations within a sample, it no longer depends on equally scaled data.
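The abstract does not give the exact form of the two proposed base classifiers, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical fold-change rule of the form x[i] / x[j] > t and combines such rules into a conjunction using the usual greedy SCM heuristic, which repeatedly picks the rule maximizing |Q| - p·|R| (Q: negatives the rule rejects, R: positives it wrongly rejects). The helper names (`fold_change_rule`, `candidate_rules`, `scm_conjunction`, `predict`) and the toy data are illustrative, not the authors' code. It also shows why a within-sample ratio is unaffected by multiplying a whole sample by a constant.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical rule encoding (i, j, t): "x[i] / x[j] > t".
# Assumes strictly positive expression values (e.g. after a pseudocount).
Rule = Tuple[int, int, float]


def fold_change_rule(x: Sequence[float], rule: Rule) -> bool:
    """True if the fold change of feature i over feature j exceeds t."""
    i, j, t = rule
    return x[i] / x[j] > t


def candidate_rules(X: List[Sequence[float]]) -> List[Rule]:
    """Enumerate ordered feature pairs, using observed fold changes as thresholds."""
    n_features = len(X[0])
    return [(i, j, x[i] / x[j]) for i, j in permutations(range(n_features), 2) for x in X]


def scm_conjunction(X: List[Sequence[float]], y: List[int],
                    p: float = 1.0, max_rules: int = 3) -> List[Rule]:
    """Greedy SCM-style conjunction: predict 1 only if every chosen rule holds."""
    pos = {k for k, label in enumerate(y) if label == 1}
    neg = {k for k, label in enumerate(y) if label == 0}
    rules = candidate_rules(X)
    chosen: List[Rule] = []
    while neg and len(chosen) < max_rules:
        best, best_util, best_q, best_r = None, float("-inf"), set(), set()
        for r in rules:
            q = {k for k in neg if not fold_change_rule(X[k], r)}  # negatives rejected
            e = {k for k in pos if not fold_change_rule(X[k], r)}  # positives lost
            util = len(q) - p * len(e)
            if util > best_util:
                best, best_util, best_q, best_r = r, util, q, e
        if not best_q:  # no remaining rule rejects any further negative
            break
        chosen.append(best)
        neg -= best_q
        pos -= best_r
    return chosen


def predict(x: Sequence[float], rules: List[Rule]) -> int:
    """Conjunction of fold-change rules."""
    return int(all(fold_change_rule(x, r) for r in rules))


# Toy data (made up for illustration): 4 samples x 3 genes.
X = [[8.0, 2.0, 5.0], [6.0, 2.0, 1.0], [1.0, 4.0, 3.0], [2.0, 4.0, 6.0]]
y = [1, 1, 0, 0]
rules = scm_conjunction(X, y)
print(rules)  # a single rule on gene 0 over gene 1

# Rescaling a whole sample (e.g. a different library size) leaves every ratio unchanged.
scaled = [[v * 1000.0 for v in x] for x in X]
print([predict(x, rules) for x in scaled])
```

Scanning only thresholds taken from the training data keeps the candidate set finite. The penalty `p` trades off rejecting negatives against misclassifying positives, and `max_rules` bounds the sparsity of the resulting decision function.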
We demonstrate the applicability of the proposed method in a case study on pancreatic neuroendocrine tumours (PanNETs). PanNETs are a rare but highly heterogeneous tumour entity that lacks specific biomarkers for disease progression or for the objective assessment of therapeutic response. We identified decision functions, each a conjunction of fold changes, that suggest new potential prognostic markers. The gene relations involved could be validated through a literature search, and they also point to new genes and possible mechanistic interactions that warrant further investigation.