Leave-one-out

  • Article type: Journal Article
    Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers' use of the summary statistics.

  • Article type: Journal Article
    BACKGROUND: Cardiomyocytes differentiated from human induced pluripotent stem cells (iPSC-CMs) can be used to study genetic cardiac diseases. In patients, these diseases manifest, for example, as impaired contractility and fatal cardiac arrhythmias, both of which can be due to abnormal calcium transients in cardiomyocytes. Here we classify different genetic cardiac diseases using Ca2+ transient data and different machine learning algorithms.
    METHODS: By studying calcium cycling of disease-specific iPSC-CMs and by using calcium transients measured from these cells it is possible to classify diseases from each other and also from healthy controls by applying machine learning computation on the basis of peak attributes detected from calcium transient signals.
    RESULTS: In the current research we extend our previous study, which used Ca2+ transient data from four different genetic diseases, by adding data from two additional diseases (dilated cardiomyopathy and long QT syndrome 2). We also study, in the light of the current data, possible differences and relations when machine learning models and classification accuracies were computed using either a leave-one-out test or 10-fold cross-validation.
    CONCLUSIONS: Despite the more complex classification task compared to our earlier research, with more genetic cardiac diseases in the analysis, it is still possible to attain good disease classification results. As expected, the leave-one-out test and 10-fold cross-validation achieved virtually equal results.
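
    The comparison of a leave-one-out test with 10-fold cross-validation can be sketched as follows. This is only an illustration under stated assumptions: the two-class synthetic data stand in for peak attributes of calcium transients, and the nearest-centroid classifier and the function names are hypothetical choices, not the methods used in the study.

    ```python
    import random

    def nearest_centroid_predict(train, x):
        """Return the label whose class mean is closest to x.

        train is a list of (features, label) pairs.
        """
        sums, counts = {}, {}
        for feats, label in train:
            acc = sums.setdefault(label, [0.0] * len(feats))
            for i, v in enumerate(feats):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1

        def dist2(label):
            return sum((s / counts[label] - v) ** 2
                       for s, v in zip(sums[label], x))

        return min(sums, key=dist2)

    def cv_accuracy(data, k, seed=0):
        """k-fold cross-validated accuracy; k == len(data) is leave-one-out."""
        idx = list(range(len(data)))
        random.Random(seed).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in idx if i not in held_out]
            for i in fold:
                correct += nearest_centroid_predict(train, data[i][0]) == data[i][1]
        return correct / len(data)

    # Synthetic stand-in data (assumption): two Gaussian classes, 50 samples each.
    rng = random.Random(42)
    data = [([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)], label)
            for label, mu in (("control", 0.0), ("disease", 2.5))
            for _ in range(50)]

    loo_acc = cv_accuracy(data, k=len(data))
    tenfold_acc = cv_accuracy(data, k=10)
    print(f"leave-one-out: {loo_acc:.3f}, 10-fold: {tenfold_acc:.3f}")
    ```

    Leave-one-out is simply the limiting case of k-fold cross-validation with k equal to the number of samples, which is why one function covers both procedures here.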

  • Article type: Journal Article
    The ultimate force of the proximal human femur can be predicted using Finite Element Analysis (FEA), but the models rely on 3D computed tomography images. Landmark-based statistical appearance models (SAM) and B-Spline transformation-based statistical deformation models (SDM) have been used to estimate 3D images from 2D projections, which facilitates model generation and reduces the radiation dose. However, there is no literature on the accuracy of SDM-based FEA models of bones with respect to experimental results. In this study, a methodology for an enhanced SDM with textural information is presented. The statistical deformation and texture models (SDTMs) are based on a set of 37 quantitative CT (QCT) images. They were used to estimate 3D images from two or one projections of the set in a leave-one-out setup. These estimations were then used to create FEA models. The ultimate forces predicted by FEA models estimated from two or one projections using the SDTMs were compared to the experimental ultimate forces from a previous study on the same femora and to the results of standard QCT-based FEA models. High correlations between predictions and experimental measurements were found for FEA models reconstructed from 2D projections, with R² = 0.835 when based on two projections and R² = 0.724 when using one projection. The correlations were comparable to those reached between standard QCT-based FEA models and the experimental results (R² = 0.795). This study shows the high potential of SDTM-based 3D image reconstruction and FEA modelling from 2D projections to predict femoral ultimate force.

  • Article type: Journal Article
    Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ $(0 \le j < n)$.
    This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{i,j} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n - 2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products.
    In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
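
    To make the leave-one-out product problem concrete, here is a minimal sketch that computes every product omitting one element in linear time without using inverses. It is not the segment-tree-based Jackknife Product algorithm of the article: it swaps in the simpler prefix and suffix product technique, and it assumes an identity element is available, which the abstract does not state. The function name and parameters are hypothetical.

    ```python
    from typing import Callable, List, TypeVar

    T = TypeVar("T")

    def leave_one_out_products(items: List[T],
                               op: Callable[[T, T], T],
                               identity: T) -> List[T]:
        """Return [product of all items except items[j] for each j].

        Uses prefix and suffix products, so op only needs to be associative
        and no inverse (division) is required. Assumes identity is a
        two-sided identity for op.
        """
        n = len(items)
        # prefix[i] holds items[0] * ... * items[i-1]; prefix[0] is the identity.
        prefix = [identity] * (n + 1)
        for i in range(n):
            prefix[i + 1] = op(prefix[i], items[i])

        out = [identity] * n
        suffix = identity  # product of items[j+1:], built from the right
        for j in range(n - 1, -1, -1):
            out[j] = op(prefix[j], suffix)
            suffix = op(items[j], suffix)
        return out

    # Ordinary multiplication: each entry is the product of the other three.
    int_products = leave_one_out_products([2, 3, 5, 7], lambda a, b: a * b, 1)

    # max has no inverse, so a "divide out one element" shortcut is impossible.
    loo_max = leave_one_out_products([4, 1, 9, 3], max, float("-inf"))
    ```

    With integer multiplication the input [2, 3, 5, 7] yields [105, 70, 42, 30], and with max the input [4, 1, 9, 3] yields [9, 9, 4, 9]. The operation is applied about 3n times in total, compared with n(n-2) for the naïve approach.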

  • Article type: Journal Article
    A key goal of model-based cognitive neuroscience is to estimate the trial-by-trial fluctuations of cognitive model parameters in order to link these fluctuations to brain signals. However, previously developed methods are limited by being difficult to implement, time-consuming, or model-specific. Here, we propose an easy, efficient and general approach to estimating trial-wise changes in parameters: Leave-One-Trial-Out (LOTO). The rationale behind LOTO is that the difference between parameter estimates for the complete dataset and for the dataset with one omitted trial reflects the parameter value in the omitted trial. We show that LOTO is superior to estimating parameter values from single trials and compare it to previously proposed approaches. Furthermore, the method makes it possible to distinguish true variability in a parameter from noise and from other sources of variability. In our view, the practicability and generality of LOTO will advance research on tracking fluctuations in latent cognitive variables and linking them to neural data.
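
    The rationale behind LOTO can be illustrated with a deliberately simple toy case. In this sketch the "model" is just the sample mean of simulated trial values, which is an assumption made for illustration; real applications would refit a cognitive model for each omitted trial. The function names are hypothetical.

    ```python
    import random

    def estimate(trials):
        """Toy 'model fit' (assumption): the parameter estimate is the mean."""
        return sum(trials) / len(trials)

    def loto_differences(trials):
        """Full-data estimate minus the estimate with trial t left out."""
        full = estimate(trials)
        return [full - estimate(trials[:t] + trials[t + 1:])
                for t in range(len(trials))]

    # Simulated trial-wise values (assumption): 40 draws around 0.5.
    rng = random.Random(1)
    trials = [rng.gauss(0.5, 0.2) for _ in range(40)]
    diffs = loto_differences(trials)

    # For the mean, the difference is an exact linear function of the omitted
    # trial: diff_t = (x_t - full_mean) / (n - 1).
    full_mean = estimate(trials)
    expected = [(x - full_mean) / (len(trials) - 1) for x in trials]
    ```

    In this toy case the LOTO differences preserve the rank order of the trial values exactly; for nonlinear models the relationship would only be approximate.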

  • Article type: Journal Article
    This study attempted to better understand the properties associated with the metabolic brain network in mild cognitive impairment (MCI) and Alzheimer's disease (AD). Graph theory was employed to investigate the topological organization of the metabolic brain network among 86 patients with MCI, 89 patients with AD, and 97 normal controls (NCs) using 18F fluoro-deoxy-glucose positron emission tomography (FDG-PET) data. The whole brain was divided into 82 areas using the Brodmann atlas to construct networks. We found that MCI and AD showed a loss of small-world properties and topological aberrations, and MCI showed intermediate measurements between NC and AD. The networks of MCI and AD were vulnerable to attacks resulting from the altered topological pattern. Furthermore, individual contributions were correlated with Mini-Mental State Examination and Clinical Dementia Rating scores. The present study indicated that the topological patterns of the metabolic networks were aberrant in patients with MCI and AD, which may be particularly helpful in uncovering the pathophysiology underlying the cognitive dysfunction in MCI and AD.

  • Article type: Journal Article
    We propose a method for visualizing genetic assignment data by characterizing the distribution of genetic profiles for each candidate source population. This method enhances the assignment method of Rannala and Mountain (1997) by calculating appropriate graph positions for individuals for which some genetic data are missing. An individual with missing data is positioned in the distributions of genetic profiles for a population according to its estimated quantile based on its available data. The quantiles of the genetic profile distribution for each population are calculated by approximating the cumulative distribution function (CDF) using the saddlepoint method, and then inverting the CDF to get the quantile function. The saddlepoint method also provides a way to visualize assignment results calculated using the leave-one-out procedure. This new method offers an advance upon assignment software such as geneclass2, which provides no visualization method, and is biologically more interpretable than the bar charts provided by the software structure. We show results from simulated data and apply the methods to microsatellite genotype data from ship rats (Rattus rattus) captured on the Great Barrier Island archipelago, New Zealand. The visualization method makes it straightforward to detect features of population structure and to judge the discriminative power of the genetic data for assigning individuals to source populations.

  • Article type: Journal Article
    Forty anthraquinone derivatives were downloaded from the PubChem database and investigated in a quantitative structure-activity relationship (QSAR) study. The models describing log P and LD50 of this set were built on the hypermolecule scheme that mimics the investigated receptor space; the models were validated by the leave-one-out procedure, in the external test set, and in a new version of prediction using similarity clusters. Molecular docking using the Lamarckian Genetic Algorithm was performed on this class of anthraquinones with respect to the 3Q3B receptor. The best-scored molecules in the docking assay were used as leaders in the similarity clustering procedure. It is demonstrated that the LD50 data of this set of anthraquinones are related to the binding energies of anthraquinone ligands to the 3Q3B receptor.

  • Article type: Comparative Study
    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), in the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behavior when compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in the training set and 18 in the test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437, against R² of 0.845 and RMSE of 0.472 for MLR, on the test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from the MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models.

  • Article type: Journal Article
    Three-dimensional quantitative structure-activity relationship (3D-QSAR) models were established from 21 anthocyanins based on their oxygen radical absorbing capacity (ORAC) and were applied to predict the ORAC values of anthocyanins in eggplant and radish. The best QSAR (CoMFA/CoMSIA) models had cross-validated q² = 0.857/0.729, non-cross-validated r² = 0.958/0.856, standard error of estimate = 0.153/0.134, and F = 73.267/19.247, and the predictive correlation coefficient r²pred = 0.998/0.997 (>0.6) indicated a high predictive ability for each. Additionally, the contour maps suggested the structural characteristics of anthocyanins that are favourable for high ORAC. Four anthocyanins from eggplant and radish were screened based on the QSAR models. Pelargonidin-3-[(6''-p-coumaroyl)-glucosyl(2 → 1)glucoside]-5-(6''-malonyl)-glucoside, delphinidin-3-rutinoside-5-glucoside, and delphinidin-3-[(4''-p-coumaroyl)-rhamnosyl(1 → 6)glucoside]-5-glucoside, predicted to have high ORAC based on the QSAR models, were isolated and confirmed to have relatively high antioxidant ability, which might be attributed to the bulky and/or electron-donating substituent at the 3-position in the C ring and/or a hydrogen bond donor group or electron-donating group at the R1 position in the B ring.
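
    The difference between a non-cross-validated r² and a leave-one-out cross-validated q² can be shown with a small sketch. It assumes a one-descriptor linear model and made-up data points, not the CoMFA/CoMSIA models or the anthocyanin data of the study, and it uses the common convention q² = 1 - PRESS/SS_total with the total sum of squares taken about the full-data mean. The function names are hypothetical.

    ```python
    def fit_line(xs, ys):
        """Ordinary least squares for y = slope * x + intercept."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        return slope, my - slope * mx

    def r2_and_q2(xs, ys):
        """Return (fitted r2, leave-one-out cross-validated q2)."""
        n = len(ys)
        my = sum(ys) / n
        ss_tot = sum((y - my) ** 2 for y in ys)

        # r2: residuals of the model fitted to all points.
        slope, icpt = fit_line(xs, ys)
        rss = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))

        # q2: each point is predicted by a model fitted without it (PRESS).
        press = 0.0
        for i in range(n):
            b, a = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            press += (ys[i] - (b * xs[i] + a)) ** 2

        return 1 - rss / ss_tot, 1 - press / ss_tot

    # Illustrative, made-up descriptor/activity pairs (assumption).
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
    r2, q2 = r2_and_q2(xs, ys)
    print(f"r2 = {r2:.3f}, q2 = {q2:.3f}")
    ```

    For least-squares fits the leave-one-out q² never exceeds r², because each held-out residual is at least as large in magnitude as the corresponding fitted residual; this matches the pattern of the reported values, where q² is lower than r² for both models.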