Generalizability

  • Article type: Journal Article
    A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results, which introduces an additional layer of uncertainty and limits the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that, after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by the nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.
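The abstract does not say how the heterogeneity estimates were computed. The sketch below illustrates what a between-study heterogeneity estimate is, using the textbook DerSimonian-Laird random-effects estimator of the between-study variance tau^2. This standard method is swapped in for illustration and is not necessarily the authors' procedure. The inputs in the usage lines are made up.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling of study-level effect estimates.

    estimates: effect size from each study (e.g. standardized mean differences)
    variances: sampling variance of each estimate
    Returns (pooled_effect, tau2, pooled_se), where tau2 is the estimated
    between-study variance, i.e. heterogeneity beyond sampling error.
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                    # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # method of moments, floored at 0
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return pooled, tau2, pooled_se


# Hypothetical example: three studies of one hypothesis using different designs.
pooled, tau2, se = dersimonian_laird([0.10, 0.45, 0.30], [0.01, 0.02, 0.015])
print(f"pooled={pooled:.3f}, tau2={tau2:.4f}, se={se:.3f}")
```

tau^2 is floored at zero when the studies agree more closely than their sampling variances imply. A larger tau^2 means true effects spread more widely across populations, designs, or analysis paths, which is the variation the framework in the abstract decomposes.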

  • Article type: Journal Article
    Revealing the role that anoikis resistance plays in colorectal cancer (CRC) is important for CRC diagnosis and treatment. This study integrated CRC anoikis-related key genes (CRC-AKGs) and established a novel model to improve the efficiency and accuracy of prognostic evaluation in CRC.
    CRC anoikis-related genes (CRC-ARGs) were identified by differential expression and univariate Cox analysis. CRC-AKGs were then selected with the LASSO machine learning algorithm, and the resulting LASSO Risk-Score was combined with clinical predictors to build a nomogram clinical prediction model. In parallel, this work developed a web-based dynamic nomogram to facilitate the generalization and practical application of our model.
    We identified 10 CRC-AKGs and calculated a risk-related prognostic Risk-Score. Multivariate Cox regression analysis indicated that the Risk-Score, TNM stage, and age were independent risk factors significantly associated with CRC prognosis (p < 0.05). The prognostic model predicted outcomes for individuals with CRC with satisfactory accuracy (3-year AUC = 0.815). The interactive web nomogram (https://yuexiaozhang.shinyapps.io/anoikisCRC/) showed the strong generalizability of our model. In parallel, a substantial correlation between the tumor microenvironment and the Risk-Score was found in the present work.
    This study reveals the potential role of anoikis in CRC and offers new insights into clinical decision-making in colorectal cancer based on both clinical and sequencing data. In addition, the interactive tool gives researchers a user-friendly interface to enter relevant clinical variables and obtain personalized risk predictions or prognostic assessments based on our established model.
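The abstract does not state the Risk-Score formula. In many LASSO-Cox prognostic studies the score is the model's linear predictor: the sum of each retained gene's coefficient times its expression. Patients are then split into high- and low-risk groups at the median score. The sketch below assumes that convention. The gene names and coefficients are hypothetical placeholders, not the 10 CRC-AKGs or their fitted values.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear-predictor risk score per patient: sum(coefficient * expression).

    expression:   {patient_id: {gene: expression_value}}
    coefficients: {gene: LASSO-Cox coefficient}; genes shrunk to zero by
                  LASSO are simply left out of this mapping.
    """
    return {
        pid: sum(coef * genes[gene] for gene, coef in coefficients.items())
        for pid, genes in expression.items()
    }


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk, using the median score as cutoff."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
expr = {
    "p1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "p2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "p3": {"GENE_A": 1.0, "GENE_B": 0.0},
}
scores = risk_scores(expr, coefs)
print(scores)                    # {'p1': 0.0, 'p2': 2.0, 'p3': 0.5}
print(split_by_median(scores))   # {'p1': 'low', 'p2': 'high', 'p3': 'low'}
```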

  • Article type: Journal Article
    Measuring socioeconomic status (SES) as an independent variable is challenging, especially in epidemiological and social studies, and the problem is more critical in large-scale national studies. The present study aimed to extensively evaluate the validity and reliability of the Iranian SES questionnaire.
    This psychometric, cross-sectional study was conducted on 3000 households selected via random cluster sampling from various areas of East Azerbaijan province and Tehran, Iran. In addition, 250 students from Tabriz University of Medical Sciences were selected as interviewers to collect data from 40 districts in Iran. The construct validity and internal consistency of the SES questionnaire were assessed using exploratory and confirmatory factor analyses and Cronbach's alpha. Data analysis was performed in SPSS and AMOS.
    The complete Iranian version of the SES questionnaire consists of 5 factors. Cronbach's alpha was 0.79, 0.94, 0.66, 0.69, and 0.48 for occupation, self-evaluation of economic capacity, house and furniture, wealth, and health expenditure, respectively. In addition, the confirmatory factor analysis results indicated that the data were compatible with the 5-factor model (comparative fit index = 0.96; goodness of fit index = 0.95; incremental fit index = 0.96; root mean square error of approximation = 0.05).
    According to the results, the confirmed validity and reliability of the tool indicate that the Iranian version of the SES questionnaire can be used with the same structure on a large scale and is applicable for measuring SES in a broader range of populations.
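Cronbach's alpha, reported above for each of the five factors, compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of total). The sketch below computes it from raw item scores in plain Python. It is an independent illustration of the standard formula (the study itself used SPSS), and the item data in the usage lines are made up.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a single scale (factor).

    items: list of k item columns, each a list of the n respondents' scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the ratio is unchanged if
    sample variances are used consistently instead.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(row) for row in zip(*items)]   # each respondent's total score
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Hypothetical 3-item scale answered by 5 respondents (1-5 Likert scores).
scale = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
print(round(cronbach_alpha(scale), 3))
```

Low values such as the 0.48 reported for health expenditure mean that the items in that factor vary largely independently of one another relative to their combined total.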

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    To summarize recent literature on selection bias in disparities research that addresses either descriptive or causal questions, with examples from dementia research.
    Defining a clear estimand, including the target population, is essential to assess whether generalizability bias or collider-stratification bias threatens inferences. Selection bias in disparities research can result from sampling strategies, differential inclusion pipelines, loss to follow-up, and competing events. If competing events occur, several potentially relevant estimands can be estimated under different assumptions, each with a different interpretation. The apparent magnitude of a disparity can differ substantially depending on the chosen estimand. Both randomized and observational studies may misrepresent health disparities or heterogeneity in treatment effects if they are not based on a known sampling scheme.
    Researchers have recently made substantial progress in the conceptualization of, and methods for, selection bias. This progress will improve the relevance of both descriptive and causal health disparities research.

  • Article type: Journal Article
    To assess the external validity of randomized controlled trials (RCTs) of bariatric surgery for diabetes control.
    Multisite RCTs provide the strongest evidence supporting clinical treatments and have the greatest internal validity. However, trial participants may not be representative of patients receiving treatment in the real world, so it is necessary to assess how RCT results generalize to all contemporary patient populations undergoing these treatments.
    The baseline characteristics, weight change, and diabetes control of all patients who underwent sleeve gastrectomy at the University of California Los Angeles (UCLA) between January 8, 2018 and May 19, 2023 were compared with those of participants in the Surgical Treatment and Medications Potentially Eradicate Diabetes Efficiently (STAMPEDE) and Diabetes Surgery Study (DSS) RCTs of bariatric surgery's effect on diabetes control. Weight loss and diabetes control were also compared between UCLA patients who did and did not meet the entry criteria for these RCTs.
    Only 65 (17%) of 387 patients with diabetes met the eligibility criteria for STAMPEDE, and 29 (7.5%) met the criteria for DSS; ineligibility was due to older age, higher body mass index, and lower HbA1c. UCLA patients had slightly less weight loss than patients in the RCTs but similar diabetes control. The 313 (81%) patients not eligible for either RCT had long-term diabetes control similar to that of patients who were eligible.
    Even though only a very small proportion of patients undergoing bariatric surgery met the eligibility criteria for the two major RCTs, most patients in this contemporary cohort had similar outcomes. Diabetes outcomes from STAMPEDE and DSS generalize to most patients undergoing bariatric surgery for diabetes control.

  • Article type: Journal Article
    Bulk-wave acoustic time-of-flight (ToF) measurements in pipes and closed containers can be hindered by guided waves that propagate in the container wall with similar arrival times, especially when a low excitation frequency is used to mitigate sound attenuation in the material. Convolutional neural networks (CNNs) have emerged as a new paradigm for obtaining accurate ToF in non-destructive evaluation (NDE) and have been demonstrated under such complicated conditions. However, the generalizability of ToF-CNNs has not been investigated. In this work, we analyze the generalizability of the ToF-CNN for broader applications, given limited training data. We first investigate CNN performance with respect to training dataset size and to differences between training and test data parameters (container dimensions and material properties). Furthermore, we perform a series of tests to determine which distribution of data parameters must be incorporated in training to enhance model generalizability, by training the model on sets of small- and large-container data regardless of the test data. We observe that the data partitioned for training must be representative of the entire set and sufficient to span the input space. The results also show that a model trained on small-container data delivers more stable results across different feature interactions than a model trained on large-container data. To check the robustness of the model, we tested the trained model on media with different sound speeds and obtained excellent ToF prediction accuracy. Furthermore, to mimic real experimental scenarios, the data are augmented by adding noise. We envision that the proposed approach will extend the application of CNNs to ToF prediction in a broader range of settings.
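The abstract says the data were augmented by adding noise to mimic real experiments, but it does not give the noise model. A common choice, assumed here, is additive white Gaussian noise scaled to a target signal-to-noise ratio (SNR) in decibels. The sketch below applies it to a 1-D received waveform stored as a list. The function name `add_awgn`, the SNR parameterization, and the synthetic tone burst are illustrative, not taken from the paper.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a noisy copy of `signal` with white Gaussian noise at `snr_db`.

    SNR is defined on mean power: snr_db = 10 * log10(P_signal / P_noise).
    """
    if rng is None:
        rng = random.Random()
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)                   # noise standard deviation
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical received waveform: a Hann-windowed 5-cycle tone burst (arbitrary units).
n = 200
burst = [math.sin(2 * math.pi * 5 * i / n) * math.sin(math.pi * i / n) ** 2
         for i in range(n)]
augmented = [add_awgn(burst, snr, random.Random(seed))
             for seed, snr in enumerate([20, 10, 5])]
```

Generating several noisy copies of each clean trace at different SNRs is one way to widen the training distribution without collecting new measurements.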

  • Article type: Journal Article
    Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance among subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. Because annotated LUS data are relatively scarce compared with other medical imaging data, we adopted a novel technique to make the best use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed, and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset and achieved 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified the LUS characteristics that most challenged the model's performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may improve efficiency for DL researchers working with smaller quantities of external validation data.
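The performance figures above (sensitivity, specificity, AUC) have standard definitions. The sketch below computes them from binary labels and model scores. It assumes that label 1 marks the positive class and that a fixed decision threshold turns scores into predictions. The abstract does not describe which class the authors treated as positive or how thresholds are handled inside TAAFT, and the example data are made up.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity after thresholding scores (>= threshold is positive)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive scores above a random
    negative (ties count one half); this equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical clip-level labels and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(sensitivity_specificity(labels, scores))  # (0.5, 0.5)
print(roc_auc(labels, scores))                  # 0.75
```

The pairwise AUC is quadratic in the number of clips. That is fine at the scale of a few hundred clips, but a rank-based formula would be preferable for large datasets.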

  • Article type: Journal Article
    In a previous paper (Lundh, 2023), it was argued that psychological science can be seen as having three main branches, corresponding to three levels of research: research at the person level, at the population level, and at the mechanism level. The purpose of the present paper is to discuss the critique raised against this model by Lamiell (2024) and Nilsson (2024) and to elaborate and specify the three-branch model in more detail. This is done by incorporating Nilsson's concept of person-sensitivity into the model and by differentiating more clearly between the two contrasts involved: (1) a methodological focus either on individual persons or on populations of individuals; and (2) a theoretical focus either on whole-person functioning or on sub-personal mechanisms.

  • Article type: Journal Article
    BACKGROUND: Randomized clinical trials (RCTs) are the gold standard for assessing treatment effectiveness; however, they have been criticized for generalizability issues, such as how well trial participants represent those who receive the treatments in clinical practice. We assessed the representativeness of participants from eight U.S. RCTs for chronic spine pain, which were used for an individual participant data meta-analysis on the cost-effectiveness of spinal manipulation for spine pain. In these clinical trials, spinal manipulation was performed by chiropractors.
    METHODS: We conducted a retrospective secondary analysis of RCT data to compare trial participants' socio-demographic characteristics, clinical features, and health outcomes with those of representative samples of (a) U.S. adults with chronic spine pain and (b) U.S. adults with chronic spine pain receiving chiropractic care, using secondary data from the National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS). We assessed differences between the trial and U.S. spine populations using independent t-tests for means and z-tests for proportions, accounting for the complex multi-stage survey design of the NHIS and MEPS.
    RESULTS: We found that the clinical trials under-represented individuals from health disparity populations. Relative to the U.S. population with spine pain, the trials had lower percentages of racial and ethnic minority groups (Black/African American 7% lower, Hispanic 8% lower), less-educated adults (no high school degree 19% lower, high school degree 11% lower), and unemployed adults (25% lower), as well as adults with worse health outcomes (physical health scores 2.5 lower and mental health scores 5.3 lower using the SF-12/36). Although the odds of chiropractic use in the U.S. are lower for individuals from health disparity populations, the trials also under-represented these populations relative to U.S. adults with chronic spine pain who visit a chiropractor.
    CONCLUSIONS: Health disparity populations are not well represented in spine pain clinical trials. Key community-based approaches, which have shown promise for increasing the participation of underserved communities, need to be embraced.
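The comparison of proportions described in METHODS can be illustrated with a two-sided z-test for the difference between a trial proportion and a survey-weighted population proportion. The sketch below assumes that the two samples are independent. It also assumes that the survey standard error has already been computed with design-based methods (weights, strata, clusters) in survey software. It does not itself model the NHIS or MEPS design, and the numbers in the usage line are invented.

```python
import math


def z_test_proportions(p_trial, n_trial, p_survey, se_survey):
    """Two-sided z-test: trial proportion vs. survey-weighted population proportion.

    p_trial, n_trial: proportion and size of the (unweighted) trial sample.
    p_survey, se_survey: weighted estimate and its design-based standard error.
    Returns (z, two_sided_p_value).
    """
    se_trial = math.sqrt(p_trial * (1 - p_trial) / n_trial)   # binomial SE
    se_diff = math.sqrt(se_trial ** 2 + se_survey ** 2)       # independent samples
    z = (p_trial - p_survey) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))                # = 2 * (1 - Phi(|z|))
    return z, p_value


# Invented numbers: 9% of trial participants vs. 16% (SE 0.8 points) in the survey.
z, p = z_test_proportions(0.09, 1500, 0.16, 0.008)
print(f"z = {z:.2f}, p = {p:.2g}")
```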