Background: Reference intervals (RIs) play an important role in clinical decision-making. However, because establishing RIs by direct methods is costly in time, labor, and money, indirect methods based on big data previously obtained from clinical laboratories are receiving increasing attention. Different indirect techniques, combined with different data transformation and outlier removal methods, may yield different RIs. However, few systematic evaluations of these differences exist.
Objective: This study used data derived from direct methods as the reference standard and evaluated the accuracy of different combinations of data transformation, outlier removal, and indirect techniques in establishing complete blood count (CBC) RIs from large-scale data.
Methods: CBC data from individuals aged ≥18 years who underwent physical examination between January 2010 and December 2011 were retrieved from the First Affiliated Hospital of China Medical University in northern China. After excluding repeated individuals, we applied the parametric, nonparametric, Hoffmann, Bhattacharya, and truncation points and Kolmogorov-Smirnov distance (kosmic) indirect methods, combined with log or BoxCox transformation and with the Reed-Dixon, Tukey, or iterative mean (3SD) outlier removal method, to derive the RIs of 8 CBC parameters, and compared the results with directly established and previously established RIs. Furthermore, bias ratios (BRs) were calculated to assess which combination of indirect technique, data transformation, and outlier removal method is preferable.
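As a rough illustration of how a transformation, an outlier removal step, and an RI calculation can be chained, the following minimal sketch applies a log transformation, then either Tukey fences or iterative mean (3SD) trimming, then a parametric or nonparametric RI. It is not the study's implementation: the function names, the Tukey multiplier of 1.5, the central 95% interval, and the synthetic "WBC-like" data are all assumptions made for the example, and the Hoffmann, Bhattacharya, and kosmic techniques are not shown.

```python
import numpy as np


def tukey_filter(x, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]


def iterative_3sd_filter(x, max_iter=100):
    """Repeatedly drop values farther than 3 SD from the mean until nothing more is removed."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        mean, sd = x.mean(), x.std(ddof=1)
        kept = x[np.abs(x - mean) <= 3 * sd]
        if kept.size == x.size:
            break
        x = kept
    return x


def parametric_ri(y):
    """Central 95% interval under a Gaussian assumption: mean +/- 1.96 SD."""
    mean, sd = y.mean(), y.std(ddof=1)
    return mean - 1.96 * sd, mean + 1.96 * sd


def nonparametric_ri(y):
    """Central 95% interval from the 2.5th and 97.5th percentiles."""
    lower, upper = np.percentile(y, [2.5, 97.5])
    return lower, upper


def log_pipeline_ri(values, outlier_filter, ri_method):
    """Log-transform, remove outliers on the log scale, compute the RI, back-transform."""
    y = np.log(np.asarray(values, dtype=float))
    lower, upper = ri_method(outlier_filter(y))
    return float(np.exp(lower)), float(np.exp(upper))


# Synthetic, hypothetical data: a log-normal "healthy" population plus a few high values.
rng = np.random.default_rng(0)
healthy = rng.lognormal(mean=np.log(6.0), sigma=0.25, size=5000)
mixed = np.concatenate([healthy, rng.uniform(15.0, 40.0, size=100)])

for name, flt in [("Tukey", tukey_filter), ("iterative 3SD", iterative_3sd_filter)]:
    for ri_name, ri in [("parametric", parametric_ri), ("nonparametric", nonparametric_ri)]:
        print(name, ri_name, log_pipeline_ri(mixed, flt, ri))
```

Passing the filter and the RI method as arguments makes it straightforward to enumerate combinations, which loosely mirrors the combinatorial comparison described in the Methods.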
Results: In the raw data, the white blood cell (WBC) count, platelet (PLT) count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular volume (MCV) were much more skewed than the other CBC parameters. After log or BoxCox transformation combined with Tukey or iterative mean (3SD) processing, the distributions of these data were close to Gaussian. Tukey-based outlier removal removed the largest number of outliers. The lower-limit bias of WBC (male), PLT (male), hemoglobin (HGB; male), MCH (male/female), and MCV (female) was greater than that of the corresponding upper limit for more than half of the 30 indirect methods. The preferred indirect computational choices for CBC parameters were inconsistent between males and females. The RIs of MCHC established by the direct method for females were narrow. For this parameter, the kosmic method was markedly superior, in contrast to the RI calculations for CBC parameters with high |BR| qualification rates in males. Among the top 10 methodologies for the parameters with high BR qualification rates among males (WBC count, PLT count, HGB, MCV, and MCHC), the Bhattacharya, Hoffmann, and parametric methods were superior to the other 2 indirect methods.
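The abstract does not state how the bias ratio is computed. The sketch below uses one definition that is common in the RI literature, offered here purely as an assumption: the difference between an indirect limit and the corresponding direct limit, divided by the between-individual SD implied by the direct RI, (UL - LL) / 3.92. The acceptance threshold of 0.375 and all numeric limits are likewise hypothetical illustration values, not figures from the study.

```python
def bias_ratio(indirect_limit, direct_limit, direct_lower, direct_upper):
    """Assumed definition: (indirect - direct) / ((UL_direct - LL_direct) / 3.92)."""
    sd_from_ri = (direct_upper - direct_lower) / 3.92
    return (indirect_limit - direct_limit) / sd_from_ri


def is_acceptable(br, threshold=0.375):
    """Assumed qualification rule: |BR| must not exceed the threshold."""
    return abs(br) <= threshold


# Hypothetical limits for illustration only (not data from the study).
direct_lower, direct_upper = 3.5, 9.5
indirect_lower, indirect_upper = 3.7, 9.4

br_lower = bias_ratio(indirect_lower, direct_lower, direct_lower, direct_upper)
br_upper = bias_ratio(indirect_upper, direct_upper, direct_lower, direct_upper)
print(br_lower, is_acceptable(br_lower))
print(br_upper, is_acceptable(br_upper))
```

Under this assumed definition, a qualification rate for a parameter would be the share of limits whose |BR| falls within the threshold across the method combinations being compared.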
Conclusions: Compared with results derived by the direct method, outlier removal methods and indirect techniques markedly influence the final RIs, whereas data transformation has negligible effects except for obviously skewed data. Specifically, the outlier removal efficiency of the Tukey and iterative mean (3SD) methods is almost equivalent. Furthermore, the choice of indirect technique depends more on the characteristics of the studied analyte itself. This study provides scientific evidence for clinical laboratories to use their previous data sets to establish RIs.