Colorectal cancer is a common and deadly disease in the United States, accounting for over 50,000 deaths in 2020. This progressive disease is highly preventable with early detection and treatment, but many people do not comply with the recommended screening guidelines. The gut microbiome has emerged as a promising target for noninvasive detection of colorectal cancer. Most microbiome-based classification efforts use taxonomic abundance data from operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) with the goal of increasing taxonomic resolution. However, it is unknown which taxonomic resolution is optimal for microbiome-based classification of colorectal cancer. To address this question, we used a reproducible machine learning framework to quantify the classification performance of models based on data annotated to the phylum, class, order, family, genus, OTU, and ASV levels. We found that model performance increased with increasing taxonomic resolution up to the family level, where performance did not differ significantly (P > 0.05) among the family (mean area under the receiver operating characteristic curve [AUROC], 0.689), genus (mean AUROC, 0.690), and OTU (mean AUROC, 0.693) levels, before decreasing at the ASV level (P < 0.05; mean AUROC, 0.676). These results demonstrate a trade-off between taxonomic resolution and prediction performance, where coarse taxonomic resolution (e.g., phylum) is not distinct enough, but fine resolution (e.g., ASV) is too individualized to accurately classify samples. As in the story of Goldilocks and the three bears (L. B. Cauley, Goldilocks and the Three Bears, 1981), mid-range resolution (i.e., family, genus, and OTU) is "just right" for optimal prediction of colorectal cancer from microbiome data.
IMPORTANCE Despite being highly preventable, colorectal cancer remains a leading cause of cancer-related death in the United States. Low-cost, noninvasive detection methods could greatly improve our ability to identify and treat early stages of disease. The microbiome has shown promise as a resource for detection of colorectal cancer. Research on the gut microbiome tends to focus on improving our ability to profile species- and strain-level taxonomic resolution. However, we found that finer resolution impedes the ability to predict colorectal cancer based on the gut microbiome. These results highlight the need to consider the appropriate taxonomic resolution for microbiome analyses and show that finer resolution is not always more informative.
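The two operations behind the comparison described in the abstract, collapsing fine-grained abundance counts to a coarser taxonomic level and scoring a classifier by AUROC, can be illustrated with a minimal Python sketch. It is not the framework used in the study: the function names, the feature identifiers (`otu1`, `family_A`, and so on), and the toy counts are hypothetical, and the AUROC is computed directly from its rank-based definition rather than with any particular library.

```python
from collections import defaultdict


def collapse_to_level(counts, taxonomy, level):
    """Sum one sample's feature counts up to a coarser taxonomic level.

    counts:   {feature_id: count} for a single sample (hypothetical layout)
    taxonomy: {feature_id: {level_name: taxon_name}} (hypothetical layout)
    """
    collapsed = defaultdict(float)
    for feature, count in counts.items():
        collapsed[taxonomy[feature][level]] += count
    return dict(collapsed)


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up inputs for illustration only.
taxonomy = {
    "otu1": {"family": "family_A", "genus": "genus_A1"},
    "otu2": {"family": "family_A", "genus": "genus_A2"},
    "otu3": {"family": "family_B", "genus": "genus_B1"},
}
sample_counts = {"otu1": 3, "otu2": 2, "otu3": 5}

family_counts = collapse_to_level(sample_counts, taxonomy, "family")
genus_counts = collapse_to_level(sample_counts, taxonomy, "genus")

# Made-up model scores and true labels (1 = case, 0 = control).
example_auroc = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

In a comparison of this kind, the same samples would be collapsed to each level in turn, a model would be trained and evaluated on each representation, and the resulting AUROC values would be compared across levels.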