Keywords: fraud detection; boosting; cross-validation; hyperparameter; penalised regression

Source: DOI:10.1080/02664763.2022.2070137

Abstract:
Statistical fraud detection consists of building a system that automatically selects, from all cases (insurance claims, financial transactions, etc.), the subset most worth investigating further. Such a system is needed because the total number of cases is typically far higher than could realistically be investigated manually, and because fraud tends to be quite rare. Further, the investigator can typically inspect only a restricted number k of cases, owing to limited resources. The most efficient way of allocating these resources is then to try to select the k cases with the highest probability of being fraudulent. The prediction model used for this purpose must normally be regularised to avoid overfitting and, consequently, poor prediction performance. A loss function, termed the fraud loss, is proposed for selecting the model complexity via a tuning parameter. A simulation study is performed to find the optimal settings for validation. Further, the performance of the proposed procedure is compared with that of the most relevant competing procedure, which is based on the area under the receiver operating characteristic curve (AUC), in a set of simulations as well as on a credit card default dataset. Choosing the model complexity by the fraud loss gave results that were either comparable or better, in terms of the fraud loss, than choosing it according to the AUC.
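
To make the selection step concrete, the following is a minimal sketch of picking the k cases with the highest predicted fraud probability and scoring that selection. The abstract does not define the fraud loss, so `top_k_miss_rate` is a hypothetical stand-in (the share of selected cases that turn out not to be fraudulent), and the function names and toy data are assumptions made purely for illustration.

```python
def top_k_indices(scores, k):
    """Return the indices of the k highest scores, highest first."""
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def top_k_miss_rate(y_true, scores, k):
    """Share of the k selected cases that are not fraudulent.

    Hypothetical stand-in for a top-k loss; not the paper's definition.
    """
    selected = top_k_indices(scores, k)
    hits = sum(y_true[i] for i in selected)
    return 1.0 - hits / k


# Toy data: 1 marks a fraudulent case, scores are predicted probabilities.
y = [0, 1, 0, 0, 1, 0]
p = [0.10, 0.80, 0.30, 0.05, 0.20, 0.60]

print(top_k_indices(p, 2))
print(top_k_miss_rate(y, p, 2))
```

In a tuning loop, one could compute such a score on held-out folds for each candidate value of the tuning parameter and keep the value with the lowest average, in the same way the AUC would be used with the direction reversed.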