Reinforcement Learning

  • Article Type: Journal Article
    In the contemporary digitalization landscape and technological advancement, the auction industry undergoes a metamorphosis, assuming a pivotal role as a transactional paradigm. Functioning as a mechanism for pricing commodities or services, the procedural intricacies and efficiency of auctions directly influence market dynamics and participant engagement. Harnessing the advancing capabilities of artificial intelligence (AI) technology, the auction sector proactively integrates AI methodologies to augment efficacy and enrich user interactions. This study delves into the intricacies of the price prediction challenge within the auction domain, introducing a sophisticated RL-GRU framework for price interval analysis. The framework commences by adeptly conducting quantitative feature extraction of commodities through GRU, subsequently orchestrating dynamic interactions within the model's environment via reinforcement learning techniques. Ultimately, it accomplishes the task of interval division and recognition of auction commodity prices through a discerning classification module. Demonstrating precision exceeding 90% across publicly available and internally curated datasets within five intervals and exhibiting superior performance within eight intervals, this framework contributes valuable technical insights for future endeavours in auction price interval prediction challenges.
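
    A hedged illustration of the pipeline's supervised portion may help: the sketch below wires a GRU feature extractor to a price-interval classification head in PyTorch. All names, dimensions, and the five-interval setup are assumptions for illustration; the paper's reinforcement learning interaction stage is not described in enough detail to reproduce and is omitted.

```python
# Minimal sketch: GRU feature extraction feeding an interval classifier,
# loosely following the RL-GRU pipeline described above. Dimensions and
# names are illustrative assumptions; the RL interaction stage is omitted.
import torch
import torch.nn as nn

class GRUIntervalClassifier(nn.Module):
    def __init__(self, n_features=16, hidden=64, n_intervals=5):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intervals)  # price-interval logits

    def forward(self, x):
        # x: (batch, seq_len, n_features) commodity feature sequence
        _, h = self.gru(x)               # h: (1, batch, hidden) final state
        return self.head(h.squeeze(0))   # (batch, n_intervals)

model = GRUIntervalClassifier()
logits = model(torch.randn(8, 20, 16))      # 8 items, 20 time steps each
predicted_interval = logits.argmax(dim=-1)  # hard interval assignment
```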

  • Article Type: Journal Article
    This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
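
    The computational model described here is standard enough to sketch. Below is a minimal dual-learning-rate Q-learning model with softmax action selection, of the kind typically fitted to multi-armed bandit behavior such as the 5-ABT; all parameter values are illustrative assumptions, not the fitted estimates from this study.

```python
# Dual-learning-rate Q-learning with a softmax (inverse temperature beta)
# on a 5-armed bandit, mirroring the model class described above.
import numpy as np

def softmax_policy(q, beta):
    """Action probabilities; beta is the inverse temperature."""
    p = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return p / p.sum()

def q_update(q, action, reward, alpha_pos, alpha_neg):
    """Separate learning rates for positive and negative prediction errors."""
    delta = reward - q[action]                       # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg   # rate depends on sign
    q[action] += alpha * delta
    return q

rng = np.random.default_rng(0)
q = np.zeros(5)                                     # five bandit arms
reward_probs = np.array([0.2, 0.4, 0.6, 0.8, 0.3])  # assumed arm payoffs
for _ in range(200):
    a = rng.choice(5, p=softmax_policy(q, beta=3.0))
    r = float(rng.random() < reward_probs[a])
    q = q_update(q, a, r, alpha_pos=0.3, alpha_neg=0.1)
```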

  • Article Type: Journal Article
    BACKGROUND: The current guidelines for managing screen-detected pulmonary nodules offer rule-based recommendations for immediate diagnostic work-up or follow-up at intervals of 3, 6, or 12 months. Customized visit plans are lacking.
    OBJECTIVE: To develop individualized screening schedules using reinforcement learning (RL) and evaluate the effectiveness of RL-based policy models.
    METHODS: Using a nested case-control design, we retrospectively identified 308 patients with cancer who had positive screening results in at least two screening rounds in the National Lung Screening Trial. We established a control group that included cancer-free patients with nodules, matched (1:1) according to the year of cancer diagnosis. By generating 10,164 sequential decision episodes, we trained RL-based policy models, incorporating nodule diameter alone, combined with nodule appearance (attenuation and margin) and/or patient information (age, sex, smoking status, pack-years, and family history). We calculated rates of misdiagnosis, missed diagnosis, and delayed diagnosis, and compared the performance of RL-based policy models with rule-based follow-up protocols (National Comprehensive Cancer Network guideline; China Guideline for the Screening and Early Detection of Lung Cancer).
    RESULTS: We identified significant interactions between certain variables (e.g., nodule shape and patient smoking pack-years, beyond those considered in guideline protocols) and the selection of follow-up testing intervals, thereby impacting the quality of the decision sequence. In validation, one RL-based policy model achieved rates of 12.3% for misdiagnosis, 9.7% for missed diagnosis, and 11.7% for delayed diagnosis. Compared with the two rule-based protocols, the three best-performing RL-based policy models consistently demonstrated optimal performance for specific patient subgroups based on disease characteristics (benign or malignant), nodule phenotypes (size, shape, and attenuation), and individual attributes.
    CONCLUSIONS: This study highlights the potential of using an RL-based approach that is both clinically interpretable and performance-robust to develop personalized lung cancer screening schedules. Our findings present opportunities for enhancing the current cancer screening system.
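
    The sequential-decision framing can be made concrete with a generic skeleton. The sketch below shows tabular Q-learning over discretized patient states with follow-up intervals as actions; the state encoding, reward design, and all values are placeholder assumptions rather than the study's actual specification.

```python
# Generic tabular Q-learning over screening decisions: actions are
# follow-up intervals, episodes are reconstructed patient trajectories.
# The state/reward design here is an assumption, not the paper's.
import numpy as np

ACTIONS = [0, 1, 2, 3]  # e.g., work-up now, or follow up in 3/6/12 months
N_STATES = 10           # discretized nodule/patient state (assumed)
q_table = np.zeros((N_STATES, len(ACTIONS)))

def run_episode(episode, alpha=0.1, gamma=0.95):
    """episode: (state, action, reward, next_state) transitions taken
    from one patient's screening history; next_state None ends it."""
    for s, a, r, s_next in episode:
        target = r if s_next is None else r + gamma * q_table[s_next].max()
        q_table[s, a] += alpha * (target - q_table[s, a])

# One hypothetical trajectory: two follow-ups, then a correct diagnosis.
run_episode([(2, 1, 0.0, 5), (5, 2, 0.0, 7), (7, 0, 1.0, None)])
policy = q_table.argmax(axis=1)  # recommended action per state
```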

  • Article Type: Journal Article
    Objective. Vagus nerve stimulation (VNS) is being investigated as a potential therapy for cardiovascular diseases including heart failure, cardiac arrhythmia, and hypertension. The lack of a systematic approach for controlling and tuning the VNS parameters poses a significant challenge. Closed-loop VNS strategies combined with artificial intelligence (AI) approaches offer a framework for systematically learning and adapting the optimal stimulation parameters. In this study, we presented an interactive AI framework using reinforcement learning (RL) for automated data-driven design of closed-loop VNS control systems in a computational study.
    Approach. Multiple simulation environments with a standard application programming interface were developed to facilitate the design and evaluation of the automated data-driven closed-loop VNS control systems. These environments simulate the hemodynamic response to multi-location VNS using biophysics-based computational models of healthy and hypertensive rat cardiovascular systems in resting and exercise states. We designed and implemented the RL-based closed-loop VNS control frameworks in the context of controlling the heart rate and the mean arterial pressure for a set-point tracking task. Our experimental design included two approaches: a general policy using deep RL algorithms and a sample-efficient adaptive policy using probabilistic inference for learning and control.
    Main results. Our simulation results demonstrated the capabilities of the closed-loop RL-based approaches to learn optimal VNS control policies and to adapt to variations in the target set points and the underlying dynamics of the cardiovascular system. Our findings highlighted the trade-off between sample efficiency and generalizability, providing insights for proper algorithm selection. Finally, we demonstrated that transfer learning improves the sample efficiency of deep RL algorithms, allowing the development of more efficient and personalized closed-loop VNS systems.
    Significance. We demonstrated the capability of RL-based closed-loop VNS systems. Our approach provided a systematic, adaptable framework for learning control strategies without requiring prior knowledge about the underlying dynamics.
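
    The set-point tracking objective lends itself to a simple reward sketch. The snippet below penalizes deviation of heart rate (HR) and mean arterial pressure (MAP) from their targets inside a toy closed loop; the weights, the naive policy, and the stand-in plant dynamics are assumptions, not the paper's biophysics-based models.

```python
# Set-point tracking reward for a closed-loop VNS controller, with a toy
# plant standing in for the biophysical cardiovascular models above.
import numpy as np

def tracking_reward(hr, map_, hr_target, map_target, w_hr=1.0, w_map=1.0):
    """Negative weighted absolute error from the HR and MAP set points."""
    return -(w_hr * abs(hr - hr_target) + w_map * abs(map_ - map_target))

hr, map_ = 380.0, 120.0  # rough resting-rat values (assumed)
for step in range(100):
    # Naive proportional policy standing in for a learned RL policy.
    amplitude = float(np.clip((hr - 350.0) * 0.01, 0.0, 1.0))
    hr -= 2.0 * amplitude    # assumed bradycardic effect of stimulation
    map_ -= 0.5 * amplitude  # assumed depressor effect
    r = tracking_reward(hr, map_, hr_target=350.0, map_target=110.0)
```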

  • Article Type: Journal Article
    BACKGROUND: Selective serotonin reuptake inhibitors (SSRIs) are first-line pharmacological treatments for depression and anxiety. However, little is known about how pharmacological action is related to cognitive and affective processes. Here, we examine whether specific reinforcement learning processes mediate the treatment effects of SSRIs.
    METHODS: The PANDA trial was a multicentre, double-blind, randomized clinical trial in UK primary care comparing the SSRI sertraline with placebo for depression and anxiety. Participants (N = 655) performed an affective Go/NoGo task three times during the trial and computational models were used to infer reinforcement learning processes.
    RESULTS: There was poor task performance: only 54% of the task runs were informative, with more informative task runs in the placebo than in the active group. There was no evidence for the preregistered hypothesis that Pavlovian inhibition was affected by sertraline. Exploratory analyses revealed that in the sertraline group, early increases in Pavlovian inhibition were associated with improvements in depression after 12 weeks. Furthermore, sertraline increased how fast participants learned from losses and faster learning from losses was associated with more severe generalized anxiety symptoms.
    CONCLUSIONS: The study findings indicate a relationship between aversive reinforcement learning mechanisms and aspects of depression, anxiety, and SSRI treatment, but these relationships did not align with the initial hypotheses. Poor task performance limits the interpretability and likely generalizability of the findings, and highlights the critical importance of developing acceptable and reliable tasks for use in clinical studies.
    FUNDING: This article presents research supported by NIHR Program Grants for Applied Research (RP-PG-0610-10048), the NIHR BRC, and UCL, with additional support from IMPRS COMP2PSYCH (JM, QH) and a Wellcome Trust grant (QH).
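
    The class of model alluded to in the methods can be sketched briefly. Models fitted to affective Go/NoGo tasks typically include a Pavlovian bias parameter that couples state value to the tendency to act; the snippet below shows that coupling, with all names and values as illustrative assumptions, since the trial's exact model specification is not given here.

```python
# Go/NoGo action weights with a Pavlovian bias term pi: negative state
# values suppress "go", capturing Pavlovian inhibition as described above.
import numpy as np

def p_go(q_go, q_nogo, v_state, go_bias=0.2, pi=0.5):
    """Softmax probability of responding 'go' given action values,
    a baseline go bias, and Pavlovian coupling pi to state value."""
    w = np.array([q_go + go_bias + pi * v_state, q_nogo])  # [go, nogo]
    e = np.exp(w - w.max())
    return e[0] / e.sum()

# In an aversive (negative-value) state, Pavlovian inhibition lowers P(go).
print(p_go(q_go=0.1, q_nogo=0.0, v_state=-1.0))  # suppressed go tendency
print(p_go(q_go=0.1, q_nogo=0.0, v_state=0.0))   # neutral baseline
```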

  • Article Type: Journal Article
    The underrepresentation of gender, racial, and ethnic minorities in clinical trials is a problem undermining the efficacy of treatments on minorities and preventing precise estimates of the effects within these subgroups. We propose FRAMM, a deep reinforcement learning framework for fair trial site selection to help address this problem. We focus on two real-world challenges: the data modalities used to guide selection are often incomplete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for bypassing missing data. To make efficient trade-offs, FRAMM uses deep reinforcement learning with a reward function designed to simultaneously optimize for both enrollment and fairness. We evaluate FRAMM using real-world historical clinical trials and show that it outperforms the leading baseline in enrollment-only settings while also greatly improving diversity.
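
    The enrollment-fairness trade-off can be illustrated with a toy reward. The sketch below blends a site's expected enrollment with an entropy-based diversity bonus; the weighting scheme and the fairness term are assumptions of this sketch, not FRAMM's actual reward function.

```python
# Toy reward trading off enrollment against demographic diversity, in the
# spirit of the FRAMM objective above. The entropy bonus and weighting
# are assumptions, not the paper's exact design.
import numpy as np

def diversity_score(group_counts):
    """Normalized entropy of the enrolled demographic mix (1 = uniform)."""
    p = np.asarray(group_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(group_counts)))

def site_reward(enrollment, group_counts, lam=0.5):
    """Blend raw enrollment with a diversity-weighted term via lam."""
    return (1 - lam) * enrollment + lam * enrollment * diversity_score(group_counts)

# Two sites enrolling 100 patients across four demographic groups:
print(site_reward(100, [70, 20, 5, 5]))    # skewed mix scores lower
print(site_reward(100, [25, 25, 25, 25]))  # uniform mix scores higher
```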

  • Article Type: Journal Article
    To address traditional impedance control methods' difficulty with obtaining stable forces during robot-skin contact, a force control based on the Gaussian mixture model/Gaussian mixture regression (GMM/GMR) algorithm fusing different compensation strategies is proposed. The contact relationship between a robot end effector and human skin is established through an impedance control model. To allow the robot to adapt to flexible skin environments, reinforcement learning algorithms and a strategy based on the skin mechanics model compensate for the impedance control strategy. Two different environment dynamics models for reinforcement learning that can be trained offline are proposed to quickly obtain reinforcement learning strategies. Three different compensation strategies are fused based on the GMM/GMR algorithm, exploiting the online calculation of physical models and offline strategies of reinforcement learning, which can improve the robustness and versatility of the algorithm when adapting to different skin environments. The experimental results show that the contact force obtained by the robot force control based on the GMM/GMR algorithm fusing different compensation strategies is relatively stable. It has better versatility than impedance control, and the force error is within ±0.2 N.
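
    For orientation, the sketch below pairs a standard impedance control law with a GMM-based blending of several compensation strategies. True GMM/GMR would regress the compensation from a conditioned joint mixture; gating strategy outputs by mixture responsibilities, as done here, is a simplification, and all gains and models are illustrative assumptions.

```python
# Impedance control law plus GMM-gated fusion of compensation strategies.
# This gating is a simplification of full GMM/GMR; gains are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def impedance_force(x, x_d, dx, dx_d, k=200.0, b=10.0):
    """Desired contact force from stiffness (k) and damping (b) terms."""
    return k * (x_d - x) + b * (dx_d - dx)

# Three stand-in compensation strategies (e.g., two offline RL policies
# and one skin-mechanics model), each mapping state -> force offset.
strategies = [lambda s: 0.1 * s[0], lambda s: -0.05 * s[1], lambda s: 0.02]

gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(np.random.default_rng(0).normal(size=(300, 2)))  # stand-in states

def fused_compensation(state):
    w = gmm.predict_proba(state.reshape(1, -1))[0]  # soft responsibilities
    return sum(wi * f(state) for wi, f in zip(w, strategies))

state = np.array([0.01, -0.02])  # e.g., [position error, velocity error]
force = impedance_force(0.0, 0.005, 0.0, 0.0) + fused_compensation(state)
```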

  • Article Type: Journal Article
    BACKGROUND: Angiotensin receptor blockade has been linked to aspects of aversive learning and memory formation and to the prevention of posttraumatic stress disorder symptom development.
    METHODS: We investigated the influence of the angiotensin receptor blocker losartan on aversive Pavlovian conditioning using a probabilistic learning paradigm. In a double-blind, randomized, placebo-controlled design, we tested 45 (18 female) healthy volunteers during a baseline session, after application of losartan or placebo (drug session), and during a follow-up session. During each session, participants engaged in a task in which they had to predict the probability of an electrical stimulation on every trial while the true shock contingencies switched repeatedly between phases of high and low shock threat. Computational reinforcement learning models were used to investigate learning dynamics.
    RESULTS: Acute administration of losartan significantly reduced participants' adjustment during both low-to-high and high-to-low threat changes. This was driven by reduced aversive learning rates in the losartan group during the drug session compared with baseline. The 50-mg drug dose did not induce reduction of blood pressure or change in reaction times, ruling out a general reduction in attention and engagement. Decreased adjustment of aversive expectations was maintained at a follow-up session 24 hours later.
    CONCLUSIONS: This study shows that losartan acutely reduces Pavlovian learning in aversive environments, thereby highlighting a potential role of the renin-angiotensin system in anxiety development.
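
    The learning dynamics at issue reduce to a textbook update. The sketch below uses a Rescorla-Wagner rule to track shock expectation across threat reversals; a lower learning rate, as reported under losartan, produces slower adjustment. The reversal schedule and rates are illustrative assumptions.

```python
# Rescorla-Wagner tracking of shock probability across threat reversals;
# a smaller alpha (as under losartan) adjusts more slowly to changes.
import numpy as np

def simulate(alpha, n_trials=120, seed=0):
    rng = np.random.default_rng(seed)
    v = 0.5                       # initial shock expectation
    history = []
    for t in range(n_trials):
        p_shock = 0.8 if (t // 30) % 2 == 0 else 0.2  # threat reversals
        shock = float(rng.random() < p_shock)
        v += alpha * (shock - v)  # prediction-error update
        history.append(v)
    return np.array(history)

fast = simulate(alpha=0.30)  # baseline-like aversive learning rate
slow = simulate(alpha=0.10)  # reduced rate: slower threat adjustment
```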

  • Article Type: Randomized Controlled Trial
    Prosocial behavior is a common and important aspect of everyday social life. To behave prosocially, we need to learn the consequences of our actions for other people, known as prosocial learning. Previous studies have identified the right temporoparietal junction (rTPJ) as the critical neurological substrate for prosocial behavior. However, little is known about the causal role of the rTPJ in prosocial learning. To clarify the role of the rTPJ in prosocial learning, we used a reinforcement learning paradigm and transcranial direct current stimulation (tDCS). A total of 75 participants were recruited and randomly assigned to the anodal or sham tDCS group. While receiving tDCS stimulation over the rTPJ, participants were instructed to choose between different stimuli that were probabilistically associated with rewards for themselves in the self-learning condition or for another person in the prosocial-learning condition. Participants were able to learn to obtain rewards for themselves or others, and learning performance in the self-learning condition was better than that in the prosocial-learning condition. However, anodal tDCS over the rTPJ significantly improved learning performance in the prosocial-learning condition. These results indicate that the rTPJ plays a causal role in prosocial learning.

  • Article Type: Journal Article
    BACKGROUND: The Flexible Adaptive Algorithmic Surveillance Testing (FAAST) program represents an innovative approach for improving the detection of new cases of infectious disease; it is deployed here to screen and diagnose SARS-CoV-2. With the advent of treatment for COVID-19, finding individuals infected with SARS-CoV-2 is an urgent clinical and public health priority. While these kinds of Bayesian search algorithms are used widely in other settings (e.g., to find downed aircraft, in submarine recovery, and to aid in oil exploration), this is the first time that Bayesian adaptive approaches have been used for active disease surveillance in the field.
    OBJECTIVE: This study's objective was to evaluate a Bayesian search algorithm to target hotspots of SARS-CoV-2 transmission in the community, with the goal of detecting the most cases over time across multiple locations in Columbus, Ohio, from August to October 2021.
    METHODS: The algorithm used to direct pop-up SARS-CoV-2 testing for this project is based on Thompson sampling, in which the aim is to maximize the average number of new cases of SARS-CoV-2 diagnosed among a set of testing locations based on sampling from prior probability distributions for each testing site. An academic-governmental partnership between Yale University, The Ohio State University, Wake Forest University, the Ohio Department of Health, the Ohio National Guard, and the Columbus Metropolitan Libraries conducted a study of bandit algorithms to maximize the detection of new cases of SARS-CoV-2 in this Ohio city in 2021. The initiative established pop-up COVID-19 testing sites at 13 Columbus locations, including library branches, recreational and community centers, movie theaters, homeless shelters, family services centers, and community event sites. Our team conducted between 0 and 56 tests at the 16 testing events, with an overall average of 25.3 tests conducted per event and a moving average that increased over time. Small incentives, including gift cards and take-home rapid antigen tests, were offered to those who approached the pop-up sites to encourage their participation.
    RESULTS: Over time, as expected, the Bayesian search algorithm directed testing efforts to locations with higher yields of new diagnoses. Surprisingly, the use of the algorithm also maximized the identification of cases among minority residents of underserved communities, particularly African Americans, with the pool of participants overrepresenting these people relative to the demographic profile of the local zip code in which testing sites were located.
    CONCLUSIONS: This study demonstrated that a pop-up testing strategy using a bandit algorithm can be feasibly deployed in an urban setting during a pandemic. It is the first real-world use of these kinds of algorithms for disease surveillance and represents a key step in evaluating the effectiveness of their use in maximizing the detection of undiagnosed cases of SARS-CoV-2 and other infections, such as HIV.
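
    Thompson sampling itself is compact enough to sketch. Below, each site's per-test positivity rate gets a Beta posterior, one posterior draw per site picks where to run the next event, and observed results update that site's posterior; the Beta-Bernoulli choice and all numbers are assumptions, as FAAST's actual priors and likelihood may differ.

```python
# Beta-Bernoulli Thompson sampling over pop-up testing sites, in the
# spirit of the FAAST design above. Priors and rates are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_sites = 13
alpha = np.ones(n_sites)  # Beta pseudo-counts: positive tests
beta = np.ones(n_sites)   # Beta pseudo-counts: negative tests
true_rates = rng.uniform(0.01, 0.15, n_sites)  # unknown in practice

for event in range(16):               # one pop-up testing event at a time
    draws = rng.beta(alpha, beta)     # sample a plausible rate per site
    site = int(draws.argmax())        # test where the sampled rate is best
    tests = 25                        # tests conducted at this event
    positives = rng.binomial(tests, true_rates[site])
    alpha[site] += positives          # posterior update for chosen site
    beta[site] += tests - positives
```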