Keywords: AI; data hazards; data science; ethics; synthetic biology

Source: DOI:10.1093/synbio/ysae010

Abstract:
Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.