{Reference Type}: Journal Article {Title}: MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training. {Author}: Qian Y;Li X;Wu J;Zhang Q; {Journal}: Comput Biol Chem {Volume}: 112 {Issue}: 0 {Year}: 2024 Jul 25 {Factor}: 3.737 {DOI}: 10.1016/j.compbiolchem.2024.108137 {Abstract}: BACKGROUND: Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments. However, the advent of deep learning has significantly accelerated this process. Most existing deep learning methods use deep neural networks to extract compound features from sequences and graphs, either separately or in combination. Our team's previous research has demonstrated that compound images contain valuable information that can be leveraged for the CPI task. However, few multimodal methods effectively combine sequence and image representations of compounds for CPI prediction. Currently, the use of text-image pairs for contrastive language-image pre-training is a popular approach in the multimodal field. Further research is needed to explore how the integration of sequence and image representations can enhance the accuracy of the CPI task.
RESULTS: This paper presents a novel method, MMCL-CPI, with two key highlights: 1) We extract compound features from two modalities, one-dimensional SMILES strings and two-dimensional images, which captures both sequence and spatial features and enhances CPI prediction accuracy. Based on this, we design a novel multimodal model. 2) We introduce a multimodal pre-training strategy that uses contrastive learning on a large-scale unlabeled dataset to establish the correspondence between a compound's SMILES string and its image. This pre-training approach significantly improves compound feature representations for the downstream CPI task. Our method has shown competitive results on multiple datasets.
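NOTE: The abstract does not state the exact pre-training objective. As a minimal sketch only, assuming a CLIP-style symmetric contrastive (InfoNCE) loss over matched SMILES and image embeddings, the idea of aligning the two modalities could look like the following. The function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all illustrative assumptions and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(smiles_emb, image_emb, temperature=0.07):
    """Hypothetical CLIP-style loss: row i of each input describes compound i.

    The temperature is an assumed value, not one reported in the abstract.
    """
    s = [l2_normalize(v) for v in smiles_emb]
    g = [l2_normalize(v) for v in image_emb]
    # Cosine similarity between every SMILES/image pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(si, gj)) / temperature for gj in g]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the SMILES-to-image and image-to-SMILES directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy embeddings standing in for encoder outputs (made up for illustration).
smiles_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_images = [[0.9, 0.1], [0.1, 0.9]]
shuffled_images = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = symmetric_contrastive_loss(smiles_embeddings, matched_images)
loss_shuffled = symmetric_contrastive_loss(smiles_embeddings, shuffled_images)
print(loss_matched < loss_shuffled)
```

In this sketch, correctly paired SMILES and image embeddings give a lower loss than shuffled pairs, which is the signal a contrastive pre-training stage would minimize on unlabeled compounds before fine-tuning on the CPI task.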