MeSH: Minerals / analysis, chemistry; Algorithms; Portugal; X-Ray Diffraction; Spectrometry, X-Ray Emission / methods; Humans; Machine Learning; Supervised Machine Learning

Source: DOI:10.1371/journal.pone.0302563

Abstract:
Research on personal adornments depends on the reliable characterisation of materials to trace provenance and model complex social networks. However, many analytical techniques require the transfer of materials from the museum to the laboratory, involving high insurance costs and limiting the number of items that can be analysed, making the process of empirical data collection a complicated, expensive and time-consuming routine. In this study, we compiled the largest geochemical dataset of Iberian personal adornments (n = 1243 samples) by coupling X-ray fluorescence compositional data with their respective X-ray diffraction mineral labels. This allowed us to develop a machine learning-based framework for the prediction of bead-forming minerals by training and benchmarking 13 of the most widely used supervised algorithms. As a proof of concept, we developed a multiclass model and evaluated its performance on two assemblages from different Portuguese sites with current mineralogical characterisation: Cova das Lapas (n = 15 samples) and Gruta da Marmota (n = 10 samples). Our results showed that decision tree-based classifiers outperformed other classification logics given the discriminative importance of some chemical elements in determining the mineral phase, which fits particularly well with the decision-making process of this type of model. The comparison of results between the different validation sets and the proof-of-concept has highlighted the risk of using synthetic data to handle imbalance and the main limitation of the framework: its restrictive class system. We conclude that the presented approach can successfully assist in the mineral classification workflow when specific analyses are not available, saving time and allowing a transparent and straightforward assessment of model predictions.
Furthermore, we propose a workflow for the interpretation of predictions using the model outputs as compound responses enabling an uncertainty reduction approach currently used by our team. The Python-based framework is packaged in a public repository and includes all the necessary resources for its reusability without the need for any installation.
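
The abstract's remark that a few chemical elements carry most of the discriminative weight can be illustrated with a minimal sketch of the core step of a decision tree: choosing the single element and threshold that best separates the classes by Gini impurity. This is not the authors' framework. The function names (`gini`, `best_split`), the mineral labels and the compositions below are hypothetical and made up for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels):
    """Return (element, threshold, weighted_gini) for the best single split.

    samples: list of dicts mapping element name -> concentration.
    labels:  list of class labels, one per sample.
    """
    best = None
    n = len(samples)
    for element in samples[0]:
        values = sorted({s[element] for s in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for s, y in zip(samples, labels) if s[element] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[element] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (element, threshold, score)
    return best


# Hypothetical toy compositions (wt%), NOT data from the study.
samples = [
    {"Al": 18.0, "P": 14.0, "Ca": 0.5},
    {"Al": 17.0, "P": 15.0, "Ca": 0.8},
    {"Al": 1.0, "P": 0.2, "Ca": 38.0},
    {"Al": 0.5, "P": 0.1, "Ca": 39.0},
]
labels = ["variscite", "variscite", "calcite", "calcite"]

element, threshold, impurity = best_split(samples, labels)
print(element, threshold, impurity)
```

In this toy case a single threshold on one element separates the two classes with zero impurity, which is the kind of structure that threshold-based splitting exploits. A full tree applies the same search recursively to each resulting subset.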