Keywords: imputation, missing data, random forest, shield tunnel, soil pressure

Source: DOI:10.3390/s24051560   PDF (PubMed)

Abstract:
With the advancement of engineering techniques, underground shield tunneling projects have begun to adopt emerging technologies for monitoring forces and displacements during the construction and operation phases of shield tunnels. Monitoring devices installed on the tunnel segments generate large amounts of data. However, due to various factors, some of these data may be missing, and the incomplete records must be filled in to ensure the safety of the engineering project. This research introduces a missing data imputation technique based on Random Forest (RF). The optimal combination of the number of decision trees, the maximum depth, and the number of features in the RF is determined by minimizing the Mean Squared Error (MSE). Complete soil pressure data are then artificially masked to create incomplete datasets with missing rates of 20%, 40%, and 60%. A comparison of the imputation results from three methods (median, mean, and RF) shows that the proposed method has the smallest imputation error. As the missing rate increases, the MSE of the RF method and of the other two methods also increases, with a maximum difference of about 70%. This indicates that the RF method is suitable for imputing monitoring data.
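The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for soil pressure readings, the variable names and the hyperparameter grid are assumptions chosen for illustration, and scikit-learn's `RandomForestRegressor` and `GridSearchCV` are used as one plausible way to tune the number of trees, maximum depth, and number of features by minimizing MSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for monitoring data (assumption, not the paper's data):
# three auxiliary sensor channels and one target channel to be imputed.
n = 400
t = np.linspace(0, 4 * np.pi, n)
base = np.sin(t) + 0.1 * t
X_full = np.column_stack([
    base + 0.05 * rng.standard_normal(n),
    0.8 * base + 0.05 * rng.standard_normal(n),
    base ** 2 + 0.05 * rng.standard_normal(n),
])
y_full = 1.5 * base + 0.3 * base ** 2 + 0.05 * rng.standard_normal(n)

# Hypothetical search grid over the three hyperparameters named in the abstract.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
    "max_features": [1, 3],
}

def mse(truth, estimate):
    """Mean squared error between true and imputed values."""
    return float(np.mean((truth - estimate) ** 2))

def impute_and_score(missing_rate):
    """Mask a fraction of the target, impute it three ways, return the MSEs."""
    missing_idx = rng.choice(n, size=int(missing_rate * n), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[missing_idx] = True          # True where the target is treated as missing
    observed = y_full[~mask]
    truth = y_full[mask]

    # Baselines: fill every gap with the mean or median of the observed values.
    mean_fill = np.full(truth.shape, observed.mean())
    median_fill = np.full(truth.shape, np.median(observed))

    # RF: tune on observed rows by cross-validated MSE, then predict the gaps.
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid,
        scoring="neg_mean_squared_error",
        cv=3,
    )
    search.fit(X_full[~mask], observed)
    rf_fill = search.predict(X_full[mask])

    return {
        "mean": mse(truth, mean_fill),
        "median": mse(truth, median_fill),
        "rf": mse(truth, rf_fill),
        "best_params": search.best_params_,
    }

# Missing rates taken from the abstract: 20%, 40%, and 60%.
results = {rate: impute_and_score(rate) for rate in (0.2, 0.4, 0.6)}
for rate, scores in results.items():
    print(rate, scores)
```

Because the synthetic target is a smooth function of the auxiliary channels, the RF imputation should score well below the two constant-fill baselines here. Real monitoring data may behave differently, so the numbers from this sketch say nothing about the paper's reported results.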