CatBoost

  • Article type: Journal Article
    Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current GBDT implementations in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and to learn best practices both from studies that cast CatBoost in a positive light and from studies where CatBoost does not outshine other techniques, since both kinds of result offer lessons. Furthermore, as a decision-tree-based algorithm, CatBoost is well suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost's effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to covering studies related to CatBoost in a single work. This gives researchers an in-depth understanding that helps clarify the proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.