adaptive sampling

  • Article type: Journal Article
    There has long been strong interest in predicting the solder joint fatigue life in advanced packaging with high accuracy and efficiency. Artificial Intelligence Plus (AI+) is becoming increasingly popular as computational facilities continue to develop. This study introduces machine learning (a core component of AI). With machine learning, metamodels that approximate the attributes of systems or functions are created to predict the fatigue life of advanced packaging. However, the prediction ability is highly dependent on the size and distribution of the training data. Increasing the amount of training data is the most intuitive way to improve prediction performance, but it implies a higher computational cost. In this research, adaptive sampling methods are applied to build the machine learning model with a small dataset sampled from an existing database. The performance of the model is visualized using predefined criteria. Moreover, ensemble learning can be used to improve the performance of AI models after they have been fully trained.
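The idea of growing a small training set from an existing database can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which sampling criterion or model the authors use, so this example swaps in a simple committee-disagreement rule between two toy surrogates (nearest-neighbour and piecewise-linear), and the names `adaptive_sample`, `nearest_predict`, and `linear_predict` are hypothetical.

```python
import math

def nearest_predict(train, x):
    """Crude surrogate: return the y value of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def linear_predict(train, x):
    """Second surrogate: piecewise-linear interpolation, clamped at both ends."""
    pts = sorted(train)
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

def adaptive_sample(database, initial, budget):
    """Grow the training set up to `budget` points.

    `database` is a list of (x, y) pairs standing in for the existing
    database; only the y values of already selected rows are read.
    Assumed criterion: pick the candidate where the two surrogates disagree most.
    """
    selected = list(initial)
    while len(selected) < budget:
        train = [database[i] for i in selected]
        candidates = [i for i in range(len(database)) if i not in selected]
        if not candidates:
            break

        def disagreement(i):
            x = database[i][0]
            return abs(nearest_predict(train, x) - linear_predict(train, x))

        selected.append(max(candidates, key=disagreement))
    return selected

# Toy database: 21 evenly spaced points of a 1-D response (hypothetical data).
database = [(i / 20, math.sin(6 * i / 20)) for i in range(21)]
picked = adaptive_sample(database, initial=[0, 20], budget=8)
print(picked)
```

The point of the design is that each new sample is chosen where the current model is least trustworthy, so a small budget is spent where it reduces uncertainty most rather than uniformly.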

  • Article type: Case Reports
    Myxoid liposarcoma (MLPS) is a rare sarcoma, typically arising in deep soft tissues during the fourth to fifth decades of life. Histologically, MLPS is composed of uniform oval cells within a background of myxoid stroma and chicken-wire capillaries. Genetically, MLPS is characterized by the FUS/EWSR1::DDIT3 fusion gene, which generally results from balanced interchromosomal translocation and is detectable via DDIT3 break-apart fluorescence in situ hybridization (FISH). Here, we report an unusual intra-articular MLPS case, negative for DDIT3 break-apart FISH but positive for EWSR1::DDIT3. An 18-year-old female was referred to our hospital complaining of an intra-articular mass in the right knee joint. Histologically, the tumor was mainly composed of mature adipocytes, brown fat-like cells, and lipoblasts. Nanopore sequencing detected DNA rearrangements between EWSR1 and DDIT3 and clustered complex rearrangements involving multiple chromosomes, suggesting chromoplexy. Methylation classification using random forest, t-distributed stochastic neighbor embedding, and unsupervised hierarchical clustering correctly classified the tumor as MLPS. The copy number was almost flat. The TERT promoter C-124T was also detected. This report highlights, for the first time, the potential value of a fast and low-cost nanopore sequencer for diagnosing sarcomas.

  • Article type: Journal Article
    Compressive sensing (CS) is recognized for its adeptness at compressing signals, making it a pivotal technology in sensor data acquisition. With the proliferation of image data in Internet of Things (IoT) systems, CS is expected to reduce the transmission cost of signals captured by various sensor devices. However, the quality of CS-reconstructed signals inevitably degrades as the sampling rate decreases, which challenges the inference accuracy of downstream computer vision (CV) tasks. This limitation is an obstacle to the real-world application of existing CS techniques, especially for reducing transmission costs in sensor-rich environments. In response to this challenge, this paper contributes a CV-oriented adaptive CS framework based on saliency detection that enables sensor systems to intelligently prioritize and transmit the most relevant data. Unlike existing CS techniques, the proposal prioritizes the accuracy of reconstructed images for CV purposes, not only visual quality. The primary objective of this proposal is to enhance the preservation of information critical for CV tasks while optimizing the utilization of sensor data. This work conducts experiments on various realistic scenario datasets collected by real sensor devices. Experimental results demonstrate superior performance compared to existing CS sampling techniques across the STL10, Intel, and Imagenette datasets for classification and KITTI for object detection. Compared with the baseline uniform sampling technique, the average classification accuracy shows a maximum improvement of 26.23%, 11.69%, and 18.25%, respectively, at specific sampling rates. In addition, even at very low sampling rates, the proposal is demonstrated to be robust in terms of classification and detection as compared to state-of-the-art CS techniques. This ensures essential information for CV tasks is retained, improving the efficacy of sensor-based data acquisition systems.
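One way to picture saliency-driven adaptive sampling is as a budget allocation problem: image blocks that a saliency detector scores highly receive more measurements than uniform sampling would give them. The sketch below is only an assumed illustration of that principle, not the framework from the paper; the function name `allocate_measurements`, the per-block `floor`, and the proportional rule are all hypothetical choices.

```python
import math

def allocate_measurements(saliency, total, floor=1):
    """Split `total` measurements across blocks in proportion to saliency.

    Every block first gets `floor` measurements so that no region is dropped;
    the remainder is shared proportionally, using the largest-remainder rule
    so the allocations sum to exactly `total`.
    """
    n = len(saliency)
    if total < floor * n:
        raise ValueError("budget too small for the requested per-block floor")
    remaining = total - floor * n
    weight_sum = sum(saliency)
    if weight_sum <= 0:
        # No saliency information: fall back to a uniform split.
        ideal = [remaining / n] * n
    else:
        ideal = [s * remaining / weight_sum for s in saliency]
    alloc = [floor + math.floor(v) for v in ideal]
    leftover = total - sum(alloc)
    # Hand out the few remaining measurements to the largest fractional parts.
    order = sorted(range(n), key=lambda i: ideal[i] - math.floor(ideal[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# Three blocks with hypothetical saliency scores and a budget of 100 measurements.
plan = allocate_measurements([1, 6, 3], total=100, floor=10)
uniform_like = allocate_measurements([1, 1, 1], total=10, floor=1)
print(plan, uniform_like)
```

With equal saliency the rule degenerates to near-uniform sampling, which corresponds to the baseline the abstract compares against.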

  • Article type: Journal Article
    Estimating dispersion in populations that are extremely rare, hidden, geographically clustered, and hard to access is a well-known challenge. Conventional sampling approaches tend to overestimate the variance, even though it should be genuinely reduced. In this setting, adaptive cluster sampling is considered the most efficient sampling technique, as it generally provides a lower variance than other conventional probability sampling designs when assessing parameters of rare and geographically clustered populations, such as the mean, total, and variance. Auxiliary data are commonly used to obtain more precise estimates by taking advantage of the correlation between the survey variable and the auxiliary data. In this article, we introduce a generalized estimator for estimating the variance of populations that are rare, hidden, geographically clustered, and hard to reach. The proposed estimator leverages both actual and transformed auxiliary data through adaptive cluster sampling. Expressions for the approximate bias and mean square error of the proposed estimator are derived up to the first-order approximation using Taylor expansion. Some special cases are also obtained using known parameters associated with the auxiliary variable. The proposed class of estimators is compared with available estimators using simulation and real data applications.
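The sampling design itself (not the proposed variance estimator, which the abstract does not specify) can be sketched as follows. This is a minimal illustration under assumed conventions: units are cells of a grid, the neighbourhood is the four adjacent cells, and a cell triggers expansion when its count reaches a hypothetical `threshold`.

```python
from collections import deque

def adaptive_cluster_sample(grid, initial_cells, threshold=1):
    """Return the set of grid cells observed under adaptive cluster sampling.

    Starting from `initial_cells`, any sampled cell whose count is at least
    `threshold` causes its four neighbours to be sampled as well, and so on.
    """
    rows, cols = len(grid), len(grid[0])
    sampled = set()
    queue = deque(initial_cells)
    while queue:
        r, c = queue.popleft()
        if (r, c) in sampled:
            continue
        sampled.add((r, c))
        if grid[r][c] >= threshold:
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                    queue.append((nr, nc))
    return sampled

# Hypothetical counts of a rare, clustered population on a 4 x 4 grid.
grid = [
    [0, 0, 0, 0],
    [0, 3, 5, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
]
sampled = adaptive_cluster_sample(grid, initial_cells=[(1, 1), (3, 3)])
print(sorted(sampled))
```

Here the initial cell (1, 1) hits the cluster, so the whole cluster and its empty edge cells are observed, while the empty initial cell (3, 3) adds nothing further. This is why the design concentrates effort where the rare population actually occurs.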

  • Article type: Journal Article
    Mathematical optimization lies at the core of many science and industry applications. One important issue with many current optimization strategies is the well-known trade-off between the number of function evaluations and the probability of finding the global optimum, or at least sufficiently high-quality local optima. In machine learning (ML), and by extension in active learning (for instance, for autonomous experimentation), mathematical optimization is often used to find the underlying uncertain surrogate model from which subsequent decisions are made; ML therefore relies on high-quality optima to obtain the most accurate models. Active learning often has the added complexity of missing offline training data; therefore, the training has to be conducted during data collection, which can stall the acquisition if standard methods are used. In this work, we highlight recent efforts to create a high-performance hybrid optimization algorithm (HGDL), combining derivative-free global optimization strategies with local, derivative-based optimization, ultimately yielding an ordered list of unique local optima. Redundancies are avoided by deflating the objective function around earlier encountered optima. HGDL is designed to take full advantage of parallelism by having the most computationally expensive process, the local first- and second-order-derivative-based optimizations, run in parallel on separate compute nodes in separate processes. In addition, the algorithm runs asynchronously; as soon as the first solution is found, it can be used while the algorithm continues to find more solutions. We apply the proposed optimization and training strategy to Gaussian-Process-driven stochastic function approximation and active learning.
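Deflation can be illustrated in one dimension. The sketch below is not HGDL's implementation: the abstract does not give its deflation operator, so this example assumes the classic choice of dividing the gradient by the product of (x - r) over already found stationary points r, and applies Newton's method to the deflated gradient. The objective f(x) = x^4 - 2x^2 and the function names are hypothetical, and the parallel and asynchronous aspects are omitted.

```python
def grad(x):
    """Gradient of the toy objective f(x) = x**4 - 2*x**2."""
    return 4 * x**3 - 4 * x

def hess(x):
    """Second derivative of the toy objective."""
    return 12 * x**2 - 4

def deflated_newton(grad, hess, x0, found, tol=1e-12, max_iter=200):
    """Newton iteration on grad(x) / prod(x - r for r in found).

    Dividing out known stationary points removes them as roots, so a restart
    from the same x0 cannot converge to a point that was already found.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        # Newton step for the deflated gradient, written without forming the product.
        denom = hess(x) - g * sum(1.0 / (x - r) for r in found)
        if denom == 0:
            return None
        step = g / denom
        x -= step
        if abs(step) < tol:
            return x
    return None

found = []
for _ in range(3):  # the toy gradient is a cubic, so it has three roots
    point = deflated_newton(grad, hess, x0=3.0, found=found)
    if point is None:
        break
    found.append(point)

# Keep only stationary points with positive curvature, i.e. local minima.
minima = [p for p in found if hess(p) > 0]
print(found, minima)
```

Every search starts from the same point, yet each one returns a different stationary point, which is the redundancy-avoiding behaviour the abstract attributes to deflation.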

  • Article type: Journal Article
    Bloodstream infection is a major cause of morbidity and death worldwide. Timely and appropriate treatment can reduce mortality among critically ill patients. Current diagnostic methods are too slow to inform precise antibiotic choice, leading to the prescription of empirical antibiotics, which may fail to cover the resistance profile of the pathogen, risking poor patient outcomes. Additionally, overuse of broad-spectrum antibiotics may lead to more resistant organisms, putting further pressure on the dwindling pipeline of antibiotics, and risk transmission of these resistant organisms in the health care environment. Therefore, rapid diagnostics are urgently required to better inform antibiotic choice early in the course of treatment. Sequencing offers great promise in reducing time to microbiological diagnosis; however, the amount of host DNA compared with the pathogen in patient samples presents a significant obstacle. Various host-depletion and bacterial-enrichment strategies have been used in samples, such as saliva, urine, or tissue. However, these methods have yet to be collectively integrated and/or extensively explored for rapid bloodstream infection diagnosis. Although most of these workflows possess individual strengths, their lack of analytical/clinical sensitivity and/or comprehensiveness demands additional improvements or synergistic application. This review provides a distinctive classification system for various methods based on their working principles to guide future research, and discusses their strengths and limitations and explores potential avenues for improvement to assist the reader in workflow selection.

  • Article type: Journal Article
    Shotgun metagenomics has previously proven effective in the investigation of foodborne outbreaks by providing rapid and comprehensive insights into the microbial contaminant. However, culture enrichment of the sample has remained a prerequisite, despite the potential impact of growth competition on pathogen detection. To circumvent the need for culture enrichment, we explored adaptive sampling with various databases for targeted nanopore sequencing, compared to shotgun metagenomics alone.
    The adaptive sampling method was first tested on DNA of mashed potatoes mixed with DNA of a Staphylococcus aureus strain previously associated with a foodborne outbreak. Selective sequencing was used either to deplete the potato sequencing reads or to enrich for the pathogen sequencing reads, and was compared to shotgun sequencing. Then, living S. aureus were spiked at 10^5 CFU into 25 g of mashed potatoes. Three DNA extraction kits were tested, in combination with enrichment using adaptive sampling, following whole genome amplification. After data analysis, the possibility of characterizing the contaminant with the different sequencing and extraction methods, without culture enrichment, was assessed.
    Overall, adaptive sampling outperformed shotgun sequencing. While the use of a host removal DNA extraction kit and targeted sequencing using a database of foodborne pathogens allowed rapid detection of the pathogen, the most complete characterization was achieved when using solely a database of S. aureus combined with a conventional DNA extraction kit, enabling accurate placement of the strain on a phylogenetic tree alongside outbreak cases.
    This method shows great potential for strain-level analysis of foodborne outbreaks without the need for culture enrichment, thereby enabling faster investigations and facilitating precise pathogen characterization. The integration of adaptive sampling with metagenomics presents a valuable strategy for more efficient and targeted analysis of microbial communities in foodborne outbreaks, contributing to improved food safety and public health.

  • Article type: Journal Article
    Bacterial plasmids play a major role in the spread of antibiotic resistance genes. However, their characterization via DNA sequencing suffers from the low abundance of plasmid DNA in those samples. Although sample preparation methods can enrich the proportion of plasmid DNA before sequencing, these methods are expensive and laborious, and they might introduce a bias by enriching only for specific plasmid DNA sequences. Nanopore adaptive sampling could overcome these issues by rejecting uninteresting DNA molecules during the sequencing process. In this study, we assess the application of adaptive sampling for the enrichment of low-abundant plasmids in known bacterial isolates using two different adaptive sampling tools. We show that a significant enrichment can be achieved even on expired flow cells. By applying adaptive sampling, we also improve the quality of de novo plasmid assemblies and reduce the sequencing time. However, our experiments also highlight issues with adaptive sampling if target and non-target sequences span similar regions.
    IMPORTANCE: Antimicrobial resistance causes millions of deaths every year. Mobile genetic elements like bacterial plasmids are key drivers for the dissemination of antimicrobial resistance genes. This makes the characterization of plasmids via DNA sequencing an important tool for clinical microbiologists. Since plasmids are often underrepresented in bacterial samples, plasmid sequencing can be challenging and laborious. To accelerate the sequencing process, we evaluate nanopore adaptive sampling as an in silico method for the enrichment of low-abundant plasmids. Our results show the potential of this cost-efficient method for future plasmid research but also indicate issues that arise from using reference sequences.
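The accept-or-reject decision at the heart of adaptive sampling can be shown with a toy model. This is an assumed simplification, not how the evaluated tools work: real adaptive sampling software basecalls the first part of each read and aligns it to reference sequences, whereas this sketch replaces alignment with a k-mer overlap check, and the names `kmers` and `decide`, the sequences, and the thresholds are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def decide(read_prefix, target_index, k=5, min_fraction=0.5):
    """Decide what to do with a molecule after seeing only its first bases.

    Returns "keep" to continue sequencing, "eject" to reject the molecule
    from the pore, or "wait" when the prefix is still too short to judge.
    """
    read_kmers = kmers(read_prefix, k)
    if not read_kmers:
        return "wait"
    hits = len(read_kmers & target_index)
    return "keep" if hits / len(read_kmers) >= min_fraction else "eject"

# Hypothetical plasmid reference used as the enrichment target.
plasmid = "ACGTTGCAGGCTAACGTTAGC"
target_index = kmers(plasmid, 5)

on_target = decide("TTGCAGGCTAAC", target_index)   # prefix taken from the plasmid
off_target = decide("GGGGGGGGGGGG", target_index)  # prefix unrelated to the plasmid
too_short = decide("ACG", target_index)
print(on_target, off_target, too_short)
```

The sketch also hints at the limitation the authors report: if target and non-target sequences share similar regions, a prefix from a non-target molecule can match the target index and be kept by mistake.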

  • Article type: Letter
    No abstract available.

  • Article type: Journal Article
    Nanopore sequencers can enrich or deplete the targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, this requires substantial computational resources to achieve rapid operations in parallel at read-time sequencing. We present a deep learning framework, NanoDeep, to overcome these limitations by combining a convolutional neural network with squeeze-and-excitation. We first showed that the raw squiggle derived from native DNA sequences determines the origin of microbial and human genomes. Then, we demonstrated that NanoDeep successfully classified bacterial reads from a pooled library containing human sequences and showed enrichment for bacterial sequences compared with the routine nanopore sequencing setting. Further, we showed that NanoDeep improves the sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences of gut samples, showing its potential applications in the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.