A Transcriptome-Based Deep Neural Network Classifier for Identifying the Site of Origin in Mucinous Cancer

Abstract:

Background: Tools for identifying the site of origin in mucinous cancer are lacking. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer.
Methods: Transcriptomic data of 1878 non-mucinous and 82 mucinous cancer specimens obtained from The Cancer Genome Atlas were used as the training and validation sets, respectively. The specimens covered 7 sites of origin: the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV). Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. To identify the site of origin, a set of 100 differentially expressed genes was selected for each site. After genes selected for more than one site were counted only once, 427 genes remained, and their RNA expression profiles at each site of origin were used to train the deep neural network classifier. The performance of the classifier was estimated using the training, validation, and test sets.
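The gene-panel step described above (take the top genes per site, then keep each gene only once) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how genes were ranked for differential expression, so the function name `union_of_top_genes`, the input format, and the gene symbols are all hypothetical, and the rankings are assumed to be precomputed.

```python
def union_of_top_genes(ranked_by_site, k=100):
    """Merge the top-k genes of every site into one panel without repeats.

    ranked_by_site maps a site label (e.g. "COAD") to a list of gene
    symbols, assumed to be already sorted from most to least
    differentially expressed. The ranking method itself is not covered here.
    """
    seen = set()
    panel = []
    for ranked in ranked_by_site.values():
        for gene in ranked[:k]:
            if gene not in seen:  # keep only the first occurrence of a gene
                seen.add(gene)
                panel.append(gene)
    return panel


# Toy input with hypothetical gene symbols, not the study's gene lists.
toy_rankings = {
    "COAD": ["GENE_A", "GENE_B", "GENE_C"],
    "STAD": ["GENE_B", "GENE_D", "GENE_A"],
    "OV": ["GENE_E", "GENE_A", "GENE_F"],
}
panel = union_of_top_genes(toy_rankings, k=2)
print(panel)
```

With 7 sites and 100 genes per site there are 700 selections in total, so a final panel of 427 genes implies that 273 of those selections were repeats of a gene already chosen for another site.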
Results: The accuracy of the model was 0.998 in the training set and 0.939 (77/82) in the validation set. In the test set, which was newly sequenced from a tissue archive, the model showed an accuracy of 0.857 (12/14). t-SNE analysis showed that the test-set samples fell within the clusters formed by the training set.
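The reported accuracies follow directly from the counts of correctly classified specimens. The short check below reproduces that arithmetic; the helper name `accuracy` is an illustrative choice, and only the counts stated in the abstract are used.

```python
def accuracy(correct, total):
    """Fraction of specimens assigned to the correct site of origin."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


validation_acc = accuracy(77, 82)  # validation set: 82 mucinous TCGA specimens
test_acc = accuracy(12, 14)        # test set: 14 archived mucinous specimens

print(f"validation accuracy: {validation_acc:.3f}")  # 0.939
print(f"test accuracy: {test_acc:.3f}")              # 0.857
```

Put differently, 5 of the 82 validation specimens and 2 of the 14 test specimens were misclassified.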
Conclusion: Although limited by the small sample size, this study showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.
