Keywords: D-Score; DeepGestalt; Face2Gene; GestaltMatcher; diagnostic accuracy; facial phenotyping; facial recognition; genetic syndrome; genetics; machine learning; medical genetics

MeSH: Humans; Female; Male; Retrospective Studies; Algorithms; Area Under Curve; Benchmarking; Computers

Source: DOI:10.2196/42904

Abstract:
BACKGROUND: While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism.
OBJECTIVE: We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt.
METHODS: Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity.
RESULTS: DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score's levels increased with the age of the depicted individuals. Male individuals yielded higher D-Scores than female individuals. Ethnicity did not appear to influence D-Scores.
CONCLUSIONS: If used with caution, algorithms such as D-Score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face.
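
ILLUSTRATION: The two headline metrics in the results, AUROC for separating patient images from control images and top-30 sensitivity for syndrome suggestions, can be sketched in a few lines of Python. This is a minimal sketch and not the study's evaluation code. The function names `auroc` and `top_k_sensitivity` are hypothetical, the toy numbers in the usage lines are invented, and the abstract does not say how sensitivity was aggregated (for example per image or per syndrome), so the per-image average here is an assumption.

```python
from typing import Sequence


def auroc(patient_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random patient image scores higher than a random
    control image, counting ties as one half (pairwise definition of AUROC)."""
    if not patient_scores or not control_scores:
        raise ValueError("both cohorts must be non-empty")
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def top_k_sensitivity(
    ranked_suggestions: Sequence[Sequence[str]],
    true_diagnoses: Sequence[str],
    k: int = 30,
) -> float:
    """Fraction of images whose confirmed diagnosis appears among the first k
    ranked suggestions (assumption: averaged per image)."""
    if len(ranked_suggestions) != len(true_diagnoses) or not true_diagnoses:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for suggestions, truth in zip(ranked_suggestions, true_diagnoses)
        if truth in suggestions[:k]
    )
    return hits / len(true_diagnoses)


# Toy usage with invented values, not data from the study.
toy_auroc = auroc([0.9, 0.8, 0.4], [0.5, 0.3])
toy_top1 = top_k_sensitivity([["A", "B"], ["C", "D"]], ["B", "C"], k=1)
```

An AUROC of 0.5 corresponds to chance-level separation, which is why the reported 0.55 for GestaltMatcher reads as "could not separate the 2 classes", while 0.86 for D-Score indicates substantially better discrimination.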