multilingual

  • Article type: Journal Article
    The Calgary-Cambridge Guide is a widely recognised framework for teaching communication skills to healthcare professionals and has become a cornerstone of communication training programs in medicine and other healthcare fields. In veterinary medicine, its integration into communication training programs has become an asset, improving communication, education, interaction, and quality of service, and enhancing the veterinary-client-patient relationship (VCPR). In veterinary medicine, however, the consultation dynamic is more challenging because it involves the veterinarian, the owner, and the animal. In Hong Kong, where the native language (Cantonese) coexists with English, it is common to add a veterinary assistant who acts as an interpreter or translator when consultations are led by non-native speakers. This addition converts the usual dyadic model into a triadic communication model. The presence of an assistant interpreter influences the way consultations are conducted, how information is conveyed, and how interpersonal cues and empathy are delivered. In this report we describe the challenges of applying the Calgary-Cambridge Guide in multicultural and multilingual veterinary medical centres in Hong Kong and highlight the role of veterinary support staff in these scenarios, specifically veterinary assistant interpreters.

  • Article type: Case Reports
    This case study measured how well the Lee Silverman Voice Treatment (LSVT) improved vocal features, intelligibility, and communicative effectiveness for a multilingual participant with hypokinetic-hyperkinetic dysarthria secondary to suspected progressive supranuclear palsy. LSVT was chosen for the participant because of the strengths and deficits he presented with before treatment, and because of the treatment challenges anticipated from multilingualism and impaired cognitive functioning.
    A multilingual patient in their 60s (English, Spanish, and French) with hypokinetic-hyperkinetic dysarthria secondary to suspected progressive supranuclear palsy completed the standard treatment sessions for LSVT. Assessment measures were taken at baseline, immediately post-treatment, and three months post-treatment.
    Improvements were measured in vocal quality, vocal loudness, intelligibility, and communicative effectiveness immediately post-treatment. Three months post-treatment, improvements in vocal quality and intelligibility were maintained.
    This case study illustrates that LSVT may be a beneficial treatment for complex clients who are multilingual and present with complex comorbidities and cognitive deficits. LSVT resulted in some meaningful changes in vocal quality, intelligibility, and communicative effectiveness for this individual. Clinicians who work with complex patients may wish to consider the theoretical underpinnings of LSVT, the client profile, areas of client need, and the client's ability and desire to complete an intensive treatment program when determining whether trialing LSVT is appropriate. The use of LSVT with complex clients may yield positive outcomes.

  • Article type: Journal Article
    The incidence of mental health conditions, such as suicidal ideation and depression, is increasing, which highlights the urgent need for early detection methods. There is growing interest in using natural language processing (NLP) models to analyze textual data from patients, but accessing patients' data for research purposes can be challenging due to privacy concerns. Federated learning (FL) is a promising approach that can balance the need for centralized learning with data ownership sensitivity. In this study, we examine the effectiveness of FL models in detecting depression using a simulated multilingual dataset. We analyzed social media posts in five different languages with varying sample sizes. Our findings indicate that FL achieves strong performance in most cases while maintaining clients' privacy for both independent and non-independent client partitioning.
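
    The abstract does not say which aggregation rule the authors used. As a minimal sketch of how federated learning can keep raw posts on each client, the block below assumes the widely used federated averaging (FedAvg) rule, in which a server combines locally trained parameters weighted by each client's sample count. The function name, the flat parameter lists, and all numbers are illustrative assumptions, not values from the study.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    Only parameters and sample counts reach the server; the clients'
    raw text never leaves their own partitions.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    # Each coordinate is averaged with weights proportional to client size.
    return [
        sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


# Hypothetical example: two language partitions of unequal size.
small_client = [1.0, 2.0]   # parameters trained on 1 sample
large_client = [3.0, 4.0]   # parameters trained on 3 samples
global_params = federated_average([small_client, large_client], [1, 3])
```

    Weighting by sample count means that languages with more posts pull the global model further, which is one reason uneven sample sizes across languages matter in a setup like the one described.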

  • Article type: Journal Article
    Sentiment analysis in the public security domain involves analysing public sentiment, emotions, opinions, and attitudes toward events, phenomena, and crises. However, the complexity of sarcasm, which tends to alter the intended meaning, combined with the use of bilingual code-mixed content, hampers sentiment analysis systems. Currently, few datasets focus on these issues. This paper introduces a comprehensive dataset constructed through a systematic data acquisition and annotation process. The acquisition process collects data from social media platforms through keyword searching, querying, and scraping, resulting in an acquired dataset. The subsequent annotation process refines and labels the data through merging, selection, and annotation, resulting in an annotated dataset. Three expert annotators from different fields were appointed for the labelling tasks, which produced determinations of sentiment and sarcasm in the content. Additionally, an annotator specialized in literature was appointed to identify the language of each item. This dataset represents a valuable contribution to the fields of natural language processing and machine learning, especially within the public security domain and for multilingual countries in Southeast Asia.

  • Article type: Journal Article
    There have been many published picture corpora. However, more than half of the world's population speaks more than one language and, as language and culture are intertwined, some of the items from a picture corpus designed for a given language in a particular culture may not fit another culture (with the same or a different language). There is also an awareness that language research can gain from the study of bi-/multilingual individuals who are immersed in multilingual contexts that foster inter-language interactions. Consequently, we developed a relatively large corpus of pictures (663 nouns, 96 verbs) and collected normative data from multilingual speakers of Kannada (a southern Indian language) on two picture-related measures (name agreement, image agreement) and three word-related measures (familiarity, subjective frequency, age of acquisition), and we report objective visual complexity and syllable count of the words. Naming labels were classified into words from the target language (i.e., Kannada), cognates (borrowed from/shared with another language), translation equivalents, and elaborations. The picture corpus had > 85% mean concept agreement with multiple acceptable names (1-7 naming labels) for each concept. The mean percentage name agreement for the modal name was > 70%, with H-statistics of 0.89 for nouns and 0.52 for verbs. We also analyse the variability of responses, highlighting the influence of bi-/multilingualism on (picture) naming. The picture corpus is freely accessible to researchers and clinicians. It may be used for future standardization with other languages of similar cultural contexts, and relevant items can be used in languages from different cultures, following suitable standardization.
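
    The abstract reports percentage name agreement and the H-statistic without defining them. The sketch below assumes the definition commonly used in picture-naming norms: for one picture, H is the sum over distinct names of p_i * log2(1 / p_i), where p_i is the proportion of participants who gave name i. H is 0 when everyone gives the same name and grows as responses spread over more names. The function name and the responses are hypothetical placeholders, not data from the corpus.

```python
from collections import Counter
from math import log2
from typing import Iterable, Tuple


def name_agreement(responses: Iterable[str]) -> Tuple[str, float, float]:
    """Return (modal name, percentage name agreement, H-statistic) for one picture."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one response is required")
    # The modal name is the label produced by the largest number of participants.
    modal_name, modal_count = counts.most_common(1)[0]
    percent_agreement = 100.0 * modal_count / total
    # H = sum_i p_i * log2(1 / p_i), written here as (c / total) * log2(total / c).
    h_statistic = sum((c / total) * log2(total / c) for c in counts.values())
    return modal_name, percent_agreement, h_statistic


# Hypothetical responses from four participants naming one picture.
modal, percent, h = name_agreement(["dog", "dog", "dog", "puppy"])
```

    Percentage agreement only looks at the modal name, whereas H also reflects how the remaining responses are distributed, so two pictures with the same percentage agreement can have different H values.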

  • Article type: Journal Article
    OBJECTIVE: Efforts to understand the global variability in cognitive profiles in patients with epilepsy have been stymied by the lack of a standardized diagnostic system. This study examined the cross-cultural applicability of the International Classification of Cognitive Disorders in Epilepsy (IC-CoDE) in a cohort of patients with temporal lobe epilepsy (TLE) in India that was diverse in language, education, and cultural background.
    METHODS: A cohort of 548 adults with TLE from Mumbai completed a presurgical comprehensive neuropsychological evaluation. The IC-CoDE taxonomy was applied to derive cognitive phenotypes in the sample. Analyses of variance were conducted to examine differences in demographic and clinical characteristics across the phenotypes, and chi-squared tests were used to determine whether the phenotype distribution differed between the Mumbai sample and published data from a multicenter US sample.
    RESULTS: Using the IC-CoDE criteria, 47% of our cohort showed an intact cognitive profile, 31% a single-domain impairment, 16% a bidomain impairment, and 6% a generalized impairment profile. The distribution of cognitive phenotypes was similar between the Indian and US cohorts for the intact and bidomain phenotypes, but differed for the single and generalized domains. There was a larger proportion of patients with single-domain impairment in the Indian cohort and a larger proportion with generalized impairment in the US cohort. Among patients with single-domain impairment, a greater proportion exhibited memory impairment in the Indian cohort, whereas a greater proportion showed language impairment in the US sample, likely reflecting differences in language administration procedures and sample characteristics including a higher rate of mesial temporal sclerosis in the Indian sample.
    CONCLUSIONS: Our results demonstrate the applicability of IC-CoDE in a group of culturally and linguistically diverse patients from India. This approach enhances our understanding of cognitive variability across cultures and enables harmonized and inclusive research into the neuropsychological aspects of epilepsy.

  • Article type: Journal Article
    BACKGROUND: Physical inactivity is associated with adverse health outcomes among Asian Americans, who exhibit the least adherence to physical activity guidelines compared with other racial and ethnic groups. Mobile app-based interventions are a promising approach to promote healthy behaviors. However, there is a lack of app-based interventions focused on improving physical activity among Asian Americans whose primary language is not English.
    OBJECTIVE: This pilot study aimed to assess the feasibility and acceptability of a 5-week intervention using a culturally and linguistically adapted, evidence-based mobile phone app with an accelerometer program, to promote physical activity among Chinese-, Tagalog-, or Vietnamese-speaking Americans.
    METHODS: Participants were recruited through collaborations with community-based organizations. The intervention was adapted from a 12-month physical activity randomized controlled trial involving the app and accelerometer for English-speaking adults. Sociodemographic characteristics, lifestyle factors, and physical measurements were collected at the baseline visit. A 7-day run-in period was conducted to screen for the participants who could wear a Fitbit One (Fitbit LLC) accelerometer and complete the app's daily step diary. During the 4-week intervention period, participants wore the accelerometer and reported their daily steps in the app. Participants also received daily messages to reinforce key contents taught during an in-person educational session, remind them to input steps, and provide tailored feedback. Feasibility measures were the percentage of eligible participants completing the run-in period and the percentage of participants who used the app diary for at least 5 out of 7 days during the intervention period. We conducted poststudy participant interviews to explore overall intervention acceptability.
    RESULTS: A total of 19 participants were enrolled at the beginning of the study with a mean age of 47 (SD 13.3; range 29-70) years, and 58% (n=11) of them were female. Of the participants, 26% (n=5) were Chinese, 32% (n=6) were Vietnamese, and 42% (n=8) were Filipino. All participants met the run-in criteria to proceed with the intervention. Adherence to the app diary ranged from 74% (n=14) in week 2 to 95% (n=18) in week 4. The daily average steps per week from accelerometers increased each week from 8451 (SD 3378) steps during the run-in period to 10,930 (SD 4213) steps in week 4. Participants reported positive experiences including an increased motivation to walk and the enjoyment of being able to monitor their physical activity.
    CONCLUSIONS: This is the first pilot study of a multicomponent intervention and evidence-based mobile phone app to promote physical activity among Asian Americans who use apps in traditional Chinese, Tagalog, or Vietnamese, which demonstrated high feasibility and acceptability. Future work focused on multilingual mobile apps to address disparities in physical inactivity among Asian Americans should be considered.

  • Article type: Journal Article
    Objective: To demonstrate that a culturally and linguistically appropriate telehealth protocol can be implemented to improve the glycemic control of patients as an extension of regular clinical services and provide continuity of care. Methods: A telehealth platform was established during the COVID-19 pandemic, and from numerous telehealth encounters we sampled 498 patients who received telehealth intervention over a 12-month period for specific services: Rx refill, consultation for laboratory results, wellness evaluation and education, and acute or sick visits with appropriate referrals. This telehealth platform was integrated with a remote patient monitoring system that used a Bluetooth-enabled glucometer for patients with diabetes, with readings compared to their abnormal baseline hemoglobin A1C (HgA1C). Blood sugar values were recorded at predefined intervals to monitor diabetes control. The ethnic diversity and level of education of patients required addressing the digital divide, language interpretation, and navigation at each monitoring step. Results: This method demonstrated that a culturally and linguistically appropriate telehealth protocol can be implemented to improve the glycemic control of patients in an intervention group compared with a control group. Validation of the glycemic control was based on 70 patients identified as eligible for participation based on the inclusion criteria: a HgA1C level of 7% or higher obtained within the last 10 months. Informed consent was obtained for 42 participants based on patient participation constraints during the COVID-19 pandemic. Conclusions: We conclude that telemedicine procedures utilized for patients with little or no prior knowledge of remote self-monitoring methods can support their treatment of chronic diseases, such as diabetes. The outcomes from the implementation of telemedicine services were observed in a well-defined group of underserved racial and ethnic minority patients at our clinic. We now have a protocol that can be expanded to other chronic diseases and used as a regular clinical procedure.

  • Article type: Journal Article
    In the realm of digitizing written content, the challenges posed by low-resource languages are noteworthy. These languages, which often lack comprehensive linguistic resources, require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated using the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the proposed approach, surpassing state-of-the-art baselines in Urdu handwriting digitization.
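
    The abstract names the character error rate but does not spell out how it is computed. A minimal sketch, assuming the standard definition: the Levenshtein (edit) distance between the recognized text and the reference transcription, divided by the number of characters in the reference. The function names and strings are illustrative assumptions; the authors' evaluation script may differ, for example in text normalization or in aggregating edits over a whole test set.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical OCR output with one wrong character out of four.
cer = character_error_rate("abcd", "abxd")
```

    Because Python strings are sequences of Unicode code points, the same function applies to Urdu text, although combining marks are then counted as separate characters.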

  • Article type: Journal Article
    Social media platforms have transcended cultural and linguistic boundaries, enabling online communication worldwide. However, the expanded use of various languages has intensified the challenge of detecting hate speech content online. Despite the release of multiple Natural Language Processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, which calls for semisupervised approaches along with Generative Artificial Intelligence (Generative AI) techniques. This paper introduces an innovative approach, a multilingual semisupervised model combining Generative Adversarial Networks (GANs) and Pretrained Language Models (PLMs), more precisely mBERT and XLM-RoBERTa. Our approach proves effective in detecting hate speech and offensive language in Indo-European languages (English, German, and Hindi) when employing only 20% of the annotated data from the HASOC2019 dataset, achieving high performance in each of the multilingual, zero-shot cross-lingual, and monolingual training scenarios. Our study provides a robust mBERT-based semisupervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and reached an average F1 score boost of 9.23% and an accuracy increase of 5.75% over the baseline semisupervised mBERT model.