
Leveraging explainable multi-scale features for fine-grained circRNA-miRNA interaction prediction

Abstract

Background

Interactions between circular RNAs (circRNAs) and microRNAs (miRNAs) have important implications for many biological processes and diseases. Computational approaches have emerged as powerful tools for studying and predicting these intricate molecular interactions and have attracted considerable attention. However, current methods face two significant limitations: a lack of precise, interpretable models and insufficient representation of homogeneous and heterogeneous molecules.

Results

We propose MFERL, a novel method that addresses both limitations by combining multi-scale representation learning with an explainable fine-grained model for predicting circRNA-miRNA interactions (CMI). MFERL learns multi-scale representations by aggregating the features of homogeneous nodes and by modeling interactions between heterogeneous nodes, and it further enhances these features with a novel dual-convolution attention mechanism and contrastive learning.

Conclusions

We use a manifold-based method to examine model behavior in detail, revealing that MFERL exhibits strong generalization, robustness, and interpretability. Extensive experiments show that MFERL outperforms state-of-the-art models and offers a promising direction for understanding the intrinsic mechanisms of CMI.

Background

An increasing number of studies have shown that circRNAs and miRNAs interact through the competing endogenous RNA (ceRNA) mechanism, in which circRNAs act as miRNA sponges by competing for miRNA binding [1,2,3,4,5,6]. As research on circRNAs and miRNAs has advanced, a growing number of biological experiments have validated such interactions. For instance, Song et al. found that FOXO3a directly induced the expression of miR-29b-2 and miR-338 in breast cancer, suggesting their potential as therapeutic targets for this disease [7]. Zhou et al. found that the presence of miR-130a-5p in CircVAPA could enhance the migration and invasion capabilities of breast cancer cells [8].

Identifying circRNA-miRNA interactions (CMI) through traditional biological experiments is costly and time-consuming [9,10,11]. With improved algorithms and expanding datasets, computational methods have emerged as a faster and more efficient approach, and many CMI prediction models have recently been proposed [12,13,14,15]. Current computational approaches for CMI prediction fall into three main groups.

The first category consists of traditional machine learning methods based on matrix completion. Qian et al. proposed a computational framework, CMIVGSD [16], which predicts CMI by extracting linear features from matrices with a singular value decomposition algorithm and classifying them with a LightGBM classifier. Lan et al. introduced NECMA [17], which predicts CMI through inner products and neighborhood regularized logistic matrix factorization. Yao et al. proposed IIMCCMA [18], which uses NetMF to extract latent feature vectors from similarity information and then applies an inductive matrix completion algorithm to identify potential CMI. These methods rely mainly on known association data for feature extraction and prediction and overlook RNA sequence information, which can compromise both the accuracy and the comprehensiveness of their predictions.

The second category is based on ensemble learning. Qian et al. proposed CMASG, which uses graph neural networks and singular value decomposition for CMI prediction [19]. In parallel, Ma et al. introduced a deep learning algorithm for CMI prediction that integrates Node2vec, graph attention networks, a conditional random field layer, and inductive matrix completion [20]. Guo et al. presented BGF-CMAP, which combines gradient boosting decision trees with natural language processing and graph embedding techniques to infer potential CMI [21]. Some of these models use natural language processing to extract RNA sequence features, while others derive network features for prediction; however, they aggregate neighbor information only within the network constructed from known interactions and ignore the similarity between nodes of the same type.

The third category consists of deep learning methods that use neural networks to learn node features. Guo et al. proposed WSCD, which extracts attribute features from circRNA (miRNA) sequences and behavior features from circRNA-miRNA association (CMA) networks [22]. In the same year, Yu et al. introduced SGCNCMI, a computational model that identifies circRNA-miRNA interactions by integrating multimodal information with graph convolutional networks [23]. The JSNDCMI model developed by Wang et al. integrates functional similarity and local topological features of nodes and strengthens the feature representation with a denoising autoencoder [24]. The CA-CMA model combines natural language features with interaction features, fine-tunes network parameters using labeled samples, and predicts CMI with a deep neural network classifier [25]. These methods still have limitations: feature fusion can introduce redundancy or information loss when neighborhood information is aggregated, and most of them aggregate either heterogeneous or homogeneous neighbor information without modeling the interaction between heterogeneous nodes.

In summary, existing CMI prediction methods have several limitations: (i) insufficient use of feature information and neglect of the relative importance of different features; (ii) sole reliance on association network features without aggregating homogeneous neighbor information; (iii) no mechanism for heterogeneous information interaction; and (iv) potential redundancy or loss during feature aggregation. To address these issues, we propose MFERL, a method that uses explainable multi-scale features for precise circRNA-miRNA interaction prediction. Specifically, MFERL offers the following advantages:

  • To capture the diverse information in RNA sequences, we extracted multiple feature representations of RNA at different fine-grained sequence levels. To adjust and balance these features, we applied enhancement learning across them. During feature aggregation, we separately considered homogeneous information aggregation and heterogeneous information interaction learning.

  • To reduce feature redundancy or loss, we employed contrastive learning to optimize the feature vector representations and obtain high-quality feature embeddings. To enrich the information contained in node features, we concatenated the representations learned and aggregated from the different perspectives into the final node embeddings used for prediction.

  • To assess the interpretability of the model, we used t-SNE [26] and MDA [27] for visualization analysis. To investigate its robustness, we conducted experiments with different positive-to-negative sample ratios.

Fig. 1

The architecture of MFERL. Part I: multi-scale feature extraction of miRNAs/circRNAs; Part II: feature learning of miRNAs/circRNAs; Part III: model optimization and prediction

Results

Model design and training

The overall architecture of the proposed method is illustrated in Fig. 1. MFERL consists of three parts. Part I: multi-scale feature extraction of miRNAs/circRNAs, which yields five types of features (sequence similarity, CTD, K-mer, Doc2Vec, and Role2Vec features). Part II: feature learning of miRNAs/circRNAs, in which feature aggregation and enhancement learning are performed on these five types of features from three perspectives. (1) Homogeneous information aggregation: the similarity matrix is thresholded to construct a homogeneous graph, and a GCN aggregates homogeneous neighbor information for the other four feature types on this graph. (2) Heterogeneous information interaction learning: the five features are concatenated and flattened into RNA feature representations, which are then fed into a dual-attention module for heterogeneous information interaction learning. (3) Enhancement learning between features: the five features are fed into a dual-convolutional attention module to enhance learning across the different feature types. Part III: model optimization and prediction. The embeddings obtained from the three perspectives are contrasted with the original features using a contrastive loss function, which optimizes the vector representations. Finally, the three sets of feature vectors are concatenated to form the final feature embedding used for prediction.

The main aim of our study was to develop a predictor of circRNA-miRNA interaction scores. We evaluated the model on three datasets, in which known CMI were treated as positive samples (label 1) and unverified circRNA-miRNA pairs as negative samples (label 0). To balance the classes, we randomly selected from the pairs labeled 0 as many negative samples as there were positive samples. In 5-fold cross-validation (CV), all positive samples and the selected negative samples were randomly divided into five equal parts; four parts were used for training and the remaining part for testing. To assess the model's performance, we used six common metrics: AUC (area under the receiver operating characteristic curve), AUPR (area under the precision-recall curve), accuracy, recall, precision, and F1_score. The ROC and PR curves of MFERL on Dataset1 under 5-fold CV are shown in Fig. 2A and B, respectively.
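The sampling and splitting protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: RNAs are assumed to be indexed by integers, unlabeled pairs are drawn by rejection sampling, and the function name `balanced_folds` and its `neg_ratio` parameter are hypothetical.

```python
import random


def balanced_folds(positives, n_circ, n_mirna, k=5, neg_ratio=1, seed=0):
    """Pair known CMI (label 1) with randomly drawn unlabeled pairs (label 0)
    and split all samples into k folds.

    positives: iterable of (circ_idx, mirna_idx) index pairs for known CMI.
    neg_ratio: negatives per positive; 1 gives the balanced 1:1 setting.
    """
    rng = random.Random(seed)
    pos_set = set(positives)
    n_neg = neg_ratio * len(pos_set)
    if n_neg > n_circ * n_mirna - len(pos_set):
        raise ValueError("not enough unlabeled pairs to sample from")
    negatives = set()
    # Rejection sampling: draw random pairs and keep those that are not known CMI.
    while len(negatives) < n_neg:
        pair = (rng.randrange(n_circ), rng.randrange(n_mirna))
        if pair not in pos_set:
            negatives.add(pair)
    samples = [(p, 1) for p in sorted(pos_set)] + [(q, 0) for q in sorted(negatives)]
    rng.shuffle(samples)
    # Round-robin assignment after shuffling: fold sizes differ by at most one.
    return [samples[i::k] for i in range(k)]
```

In round i, fold i serves as the test set and the remaining folds form the training set. Rejection sampling avoids materializing all unlabeled pairs, which matters when the number of possible pairs (for example 2115 × 821 for Dataset1) is far larger than the number of known interactions.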

In addition, to verify the generality of the model, we conducted independent tests on the three datasets, as shown in Section 1 and Table S1 in Additional file 1. We also conducted five-fold cross-validation experiments on circRNAs and miRNAs on the three datasets, as shown in Section 2, Figs. S1 and S2, and Table S2 in Additional file 1 [28, 29].

MFERL has three important hyperparameters: the embedding dimension \(d_i\), the learning rate lr, and the temperature \(\tau\) in contrastive learning. To evaluate the sensitivity of the model to these parameters, we conducted a series of experiments on Dataset1. Holding the other parameters constant, we varied the embedding dimension \(d_i \in \{32, 64, 128, 256\}\) and, using 5-fold CV, measured the AUC, AUPR, F1_score, accuracy, recall, and precision, which are presented as a heatmap in Fig. 4A. MFERL's performance improved as the embedding dimension increased, with the best AUC and AUPR at a dimension of 128; performance declined at 256, so we selected 128 as the optimal value. Next, we varied the learning rate \(lr \in \{0.0001, 0.001, 0.01, 0.1\}\) under 5-fold CV; as shown in Fig. 4B, the best performance was achieved with a learning rate of 0.01. Finally, because contrastive learning is used to obtain high-quality feature embeddings, we treated its temperature \(\tau\) as a hyperparameter and varied it from 0.01 to 0.15. As shown in Fig. 4C, the best performance was achieved at \(\tau = 0.1\).
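To make the role of the temperature \(\tau\) concrete, the sketch below implements a standard InfoNCE-style contrastive loss between two views of the same nodes. The exact contrastive loss used by MFERL is not given in this section, so this formulation (cosine similarity, row i of each view as the positive pair, all other rows as negatives) is an assumption, and `info_nce` is a hypothetical name.

```python
import numpy as np


def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss between two views of the same N nodes.

    Row i of z1 and row i of z2 form a positive pair; the other N - 1 rows of
    z2 serve as negatives for row i of z1. tau is the temperature.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                            # cosine similarity scaled by 1/tau
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))         # mean over positive pairs
```

Dividing by a smaller \(\tau\) sharpens the softmax, so positives that are already more similar than the negatives give a lower loss and hard negatives receive larger gradients; a larger \(\tau\) smooths these differences. This is why \(\tau\) is worth tuning as in Fig. 4C.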

Fig. 2

Performance and statistical analysis of the MFERL model. A and B ROC and PR curves of MFERL on Dataset1 under 5-fold CV. C Effect-size analysis (Cohen's d) between MFERL and 10 methods across three datasets. D Paired t-test showing that MFERL outperforms the other methods (\(P < 0.0001\))

Fig. 3

ROC and PR curves for eleven methods under 5-fold CV. a and b are on Dataset1; c and d are on Dataset2; e and f are on Dataset3

Fig. 4

Evaluation indicators for parameter sensitivity analysis and ablation experiments. A, B, and C The performance of 5-fold CV is compared on Dataset1 with different parameter values. D, E, and F Ablation experiments of MFERL on Dataset1, Dataset2, and Dataset3

The proposed MFERL outperforms the state-of-the-art methods

To validate the performance of our model, we compared MFERL with the following methods across three datasets: BGF-CMAP [21], CA-CMA [25], GCNCMI [12], SPBCMI [30], NECMA [17], SPGNN [31], GCNA-MDA [32], NGMDA [33], MGCAT [34], and AMHMDA [35]. All comparisons were conducted under identical experimental settings, with the parameters for the comparative methods set to the optimal values recommended in their respective original studies.

Figure 3 shows the ROC and PR curves of the ten comparative methods and MFERL under 5-fold CV on the three datasets. We also used Cohen's d to quantify the effect size of the differences in AUC and AUPR between MFERL and each of the other methods across the three datasets; a Cohen's d greater than 0.8 is conventionally regarded as a large effect. As shown in Fig. 2C, the Cohen's d values between MFERL and every other method exceed 0.8, indicating a substantial difference between MFERL and the other ten methods on all datasets.
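For reference, the effect size can be computed as in the sketch below. The paper does not state which variant of Cohen's d it uses; this sketch assumes the common two-sample form with a pooled standard deviation, applied to per-fold AUC (or AUPR) values of two methods.

```python
import statistics


def cohens_d(a, b):
    """Two-sample Cohen's d using the pooled (n - 1) standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd
```

Because d divides the mean difference by the spread across folds, even a small gap in AUC can yield a large d when both methods are stable across folds.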

Table 1 Performance of eleven methods in terms of AUC, AUPR, F1_score, accuracy, recall, and precision under 5-fold CV on Dataset1, Dataset2, and Dataset3
Table 2 Comparison of different positive and negative sample ratios during MFERL training

Table 1 summarizes the experimental results and highlights the strong performance of MFERL across the six evaluation metrics. On Dataset1, MFERL achieved an AUC of 0.9669, an AUPR of 0.9629, an F1_score of 0.9177, an accuracy of 0.9170, a recall of 0.9262, and a precision of 0.9094, exceeding the corresponding metrics of the second-best method by 1.82%, 1.69%, 4.01%, 3.97%, 2.22%, and 0.56%, respectively. On Dataset2 and Dataset3, MFERL consistently ranked first or second. In addition, we evaluated MFERL and the ten comparative methods on Dataset1 with ten rounds of 5-fold CV, collecting 50 AUC and AUPR values for each method, and used a paired t-test to compare the values of MFERL with those of each comparative method. As shown in Fig. 2D, the results indicate that MFERL significantly outperforms the other approaches.
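A paired t-test compares two methods on the same CV splits by testing whether the mean of their per-split differences is zero. The sketch below computes only the t statistic and degrees of freedom with the standard library; the function name `paired_t` is hypothetical. If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a p-value.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a[i], b[i],
    e.g. the AUC of two methods on the same CV split."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)                  # raises/divides by zero if all diffs are equal
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing matters here: each of the 50 values for MFERL is matched with the value of the competing method on the same split, so split-to-split variation cancels out of the differences.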

Ablation study of feature learning

To verify the importance of learning features from different perspectives in MFERL, we compared the full model with five variant models on the three datasets. The five variants are as follows:

  • “w/o cl” removed the contrastive learning component from the original model.

  • “w/o homo” indicated that the model did not consider the aggregation of homogeneous neighbor information.

  • “w/o mcam” removed the multi-feature enhancement learning module from the original model.

  • “w/o heter” indicated that the model lacked the heterogeneous information interaction learning module.

  • “change_homo” represented the model where homogeneous information aggregation was performed by first fusing and then aggregating homogeneous neighbor information.

Figure 4D, E, and F compare the evaluation metrics of the original model and the five variant models on the three datasets. Notably, the multi-feature enhancement learning module had a significant impact on performance, highlighting the need to model the interactions between different features when multi-scale features are used to enrich node information.

Impact of the positive-to-negative sample ratio in training data on model performance

In the comparative experiments, MFERL performed well on all three datasets, demonstrating its robustness and generalization capability. In real-world scenarios, the high cost of biological experiments limits the number of verified CMI, leading to an imbalance between positive and negative samples. To further assess the model's robustness under different positive-to-negative sample ratios, we conducted 5-fold CV experiments on Dataset1 with ratios of 1:1, 1:2, 1:5, and 1:10. As shown in Table 2, the AUC of MFERL changed only slightly, and the accuracy remained above 0.91. Notably, the AUPR, which is the metric most sensitive to the positive-to-negative sample ratio, remained above 0.87, demonstrating the model's robustness to class imbalance.
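The sensitivity of AUPR to the class ratio can be seen from its chance level: a classifier that ranks pairs at random has an expected precision equal to the fraction of positives at every recall level, so its AUPR is approximately that fraction, whereas its AUC stays at 0.5 for any ratio. The short sketch below (a hypothetical helper, not part of MFERL) prints this baseline for the ratios in Table 2.

```python
def random_aupr_baseline(neg_per_pos):
    """Approximate AUPR of a random classifier: the fraction of positive samples."""
    return 1 / (1 + neg_per_pos)


for r in (1, 2, 5, 10):
    print(f"1:{r} -> baseline AUPR = {random_aupr_baseline(r):.3f}")
# 1:1 -> baseline AUPR = 0.500
# 1:2 -> baseline AUPR = 0.333
# 1:5 -> baseline AUPR = 0.167
# 1:10 -> baseline AUPR = 0.091
```

Against a chance level of about 0.09 at 1:10, an AUPR above 0.87 is far from random performance.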

Fig. 5

Visualization analysis and explainability. A t-SNE visualization of the embeddings of circRNA-miRNA pairs learned at different epochs of MFERL training. B MDA visualization of feature enhancement learning in the dual-convolutional attention module

Visualization analysis and explainability

To visualize the features learned by the model, we used t-SNE to project the embeddings of circRNA-miRNA pairs in Dataset1 into a two-dimensional space. As shown in Fig. 5A, positive samples (red) and negative samples (blue) became increasingly separated as the number of epochs increased. By epoch 800, the embeddings exhibited high intra-class similarity and a clear boundary between the classes. This result indicates that the features learned by the model are discriminative and can be inspected visually, demonstrating the effectiveness of MFERL.

The ablation experiments in the previous section showed that the dual-convolution attention module used for multi-feature enhancement learning had a significant impact on model performance. To further explore the feature enhancement capability of this module, we conducted a visualization experiment using MDA. MDA displays the arrangement of node features in a low-dimensional space, where a more continuous color distribution indicates better preservation of the geometric relationships within the feature space. MDA can also analyze the influence of specific layers on specific behaviors, so we used it to analyze the impact of the dual-convolution attention module on feature enhancement learning. As shown in Fig. 5B, as the training epochs increased, the manifold structure in the visualization became more orderly, and the color patterns showed clustering and gradual transitions while maintaining a continuous and uniform shape. This indicates that the dual-convolution attention module is effective for feature enhancement learning and that the MFERL model possesses a degree of interpretability.

Table 3 The top 20 prediction scores among unknown interactions

Case validation based on experimental results in the literature

To evaluate MFERL's ability to predict new circRNA-miRNA interaction pairs, we conducted a case study on Dataset1. We trained the model using the known interactions and an equal number of negative samples, and then applied the trained model to predict unknown CMI. Of the top 20 predicted pairs, 14 were supported by experimental evidence, as detailed in Table 3. The top 50 predictions with their scores, and the distribution of some prediction scores, are provided in Section 3, Table S2, and Fig. S3 in Additional file 1.

These results demonstrate that MFERL has strong identification performance. Note that predicted interactions that have not yet been verified are not necessarily errors. Given MFERL's strong performance in the comparative experiments across three datasets, the unverified CMI in this case study are promising candidates that warrant experimental validation.

Discussion

The growing importance of circRNA-miRNA interactions in disease mechanisms highlights the need for accurate and interpretable prediction models. In this study, we proposed MFERL, a novel multi-scale feature learning framework that integrates homogeneous and heterogeneous node learning to improve CMI prediction performance. While MFERL achieved competitive results, some challenges and open questions remain.

First, the inherent class imbalance in CMI datasets poses a significant challenge. Although MFERL maintains robust performance under different negative sampling ratios, the selection of negative samples remains a critical factor influencing model accuracy. Further research is needed to develop more effective negative sampling strategies, potentially incorporating biological priors or knowledge-based constraints. Second, the model currently emphasizes node-level features without fully considering biological context, such as tissue specificity, disease state, or dynamic regulatory environments. These factors can significantly impact the behavior and interaction patterns of circRNAs and miRNAs. Future work should explore integrating this contextual information, possibly through the construction of knowledge graphs or hypergraphs that capture richer biological relationships. Moreover, although visualization techniques like t-SNE and MDA confirm the model’s discriminative power, more advanced interpretability techniques could be applied to provide deeper insights into the learned features and the biological relevance of predictions. Specifically, while MDA helps visualize the organization of learned features and reveals their structural evolution during training, this interpretability remains at the computational level. It does not directly explain the biological mechanisms behind circRNA-miRNA interactions. This limitation stems from our model’s reliance on raw sequence data, without integrating explicit biological annotations or motif-level supervision. Nonetheless, the clustering patterns and gradual transitions observed in MDA suggest that the model may implicitly capture biologically relevant signals, such as nucleotide preferences or interaction-related patterns. 
In future work, we aim to incorporate attention weight analysis, motif discovery techniques, and comparisons with experimentally validated binding sites to better align computational representations with biological meaning. This would further support the model’s application in guiding experimental validation.

In conclusion, MFERL offers a solid foundation for CMI prediction, and with further enhancements—especially in data quality, context modeling, and sample selection—it holds strong potential for facilitating biological discovery in the era of RNA research.

Conclusions

In recent years, numerous clinical studies have shown that circRNA-miRNA interactions play a crucial role in disease development and treatment, drawing significant attention from researchers. In this paper, we propose MFERL, a CMI prediction model based on multi-scale feature learning. The model comprehensively considers features at different scales and learns from them from various perspectives: it performs homogeneous node aggregation learning, heterogeneous node interaction learning, and enhancement learning of multi-scale features. The results show that MFERL significantly outperforms other classic methods. Visualization analysis using t-SNE and MDA confirms that the features learned by MFERL are distinguishable between classes and interpretable. Moreover, experiments with different ratios of positive to negative samples during training demonstrate the model's robustness. Case studies further indicate that MFERL is a reliable tool for predicting potential CMI, providing valuable insights for biological experiments.

Although MFERL achieved superior performance in comparative experiments, it still has some limitations. For example, the imbalance between positive and negative samples in the dataset makes it difficult to select reliable negative samples for training. Although this imbalance affects performance to some extent, our model still demonstrates strong predictive capability, maintaining high precision (PPV) even when the negative sample ratio is increased. In addition, the characteristics of entities in biological networks can vary with cell type, disease state, and other contextual factors, which affects node feature information. Our model currently considers only the features of the target nodes and does not incorporate these contextual influences. To address these limitations, we plan to integrate richer biological information (e.g., diseases [36], drugs [37, 38], and cell types) in future work and to explore the construction of a comprehensive knowledge graph or hypergraph to facilitate feature aggregation and improve prediction accuracy. We will also investigate negative sample selection strategies to develop more robust training methods for imbalanced datasets.

Methods

Datasets

MFERL was evaluated on three datasets. Table 4 summarizes their details.

  • Dataset1: Circbank [39] is a public database containing five features of circRNAs. Circbank includes approximately 140,000 human circRNAs and 1917 human miRNAs. After removing redundant data, we obtained 9589 circRNA-miRNA interaction pairs from the Circbank database, involving 2115 circRNAs and 821 miRNAs.

  • Dataset2: CMI-9905 was compiled by Wang et al. [24], consisting of 9905 interaction pairs between 2346 circRNAs and 926 miRNAs.

  • Dataset3: This dataset was obtained from BGF-CMAP [21] and contains 20,208 experimentally validated CMI involving 3569 circRNAs and 1152 miRNAs.

Overview of MFERL

The overall architecture of the proposed method is illustrated in Fig. 1. MFERL consists of three parts. Part I: multi-scale feature extraction of miRNAs/circRNAs. We calculate sequence similarity, statistical features (CTD and K-mer), pre-trained distributed features (Doc2Vec), and graph structural features (Role2Vec) for circRNAs and miRNAs. Part II: feature learning of miRNAs/circRNAs. Feature aggregation and enhancement learning are performed on these five types of features from three perspectives: aggregation of homogeneous neighborhood features of the multi-scale features, a dual-convolutional attention module for enhancement learning between multi-scale features, and interaction learning of heterogeneous information between miRNAs and circRNAs. Part III: model optimization and prediction. Contrastive learning optimizes the vector representations, feature concatenation enriches the embedded information, and an inner product yields the prediction results.
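The final prediction step in Part III can be sketched as below. The text states only that the learned embeddings are concatenated and scored with an inner product; mapping the score to (0, 1) with a sigmoid is an assumption of this sketch, as are the function name `predict_scores` and the requirement that the circRNA and miRNA views be listed in the same order with matching widths.

```python
import numpy as np


def predict_scores(circ_views, mirna_views):
    """Concatenate per-view embeddings and score every circRNA-miRNA pair.

    circ_views / mirna_views: lists of arrays, one per view (e.g. homogeneous,
    heterogeneous, enhanced), shaped (n_circ, d_k) and (n_mirna, d_k).
    Matching views must share d_k so the concatenated widths agree.
    """
    h_c = np.concatenate(circ_views, axis=1)    # (n_circ, sum of d_k)
    h_m = np.concatenate(mirna_views, axis=1)   # (n_mirna, sum of d_k)
    logits = h_c @ h_m.T                         # inner product for every pair
    return 1.0 / (1.0 + np.exp(-logits))         # assumed sigmoid -> scores in (0, 1)
```

Entry (i, j) of the returned matrix is the predicted interaction score of circRNA i and miRNA j, so ranking a row or column yields candidate partners for a given RNA.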

Table 4 Specific data of datasets

Multi-scale features extraction

To comprehensively characterize circRNAs and miRNAs, we extracted fine-grained features from multiple perspectives. First, following previous work on circRNAs [40], we applied the Levenshtein distance to RNA sequences to calculate the sequence similarity among RNAs of the same type. Then, drawing on existing research on interactions between ncRNAs [41], we extracted four further types of features: statistical features (CTD and K-mer), pre-trained distributed features (Doc2Vec), and graph structural features (Role2Vec). The extraction procedures are described below.

Sequence similarity:

The similarity between two circRNA sequences was evaluated using the Levenshtein distance [42], which is the minimum number of edit operations (character substitutions, insertions, and deletions) required to transform one sequence into the other. The resulting sequence similarity for circRNAs is denoted \({x}_{c}^{s}\), and that for miRNAs \({x}_{m}^{s}\).
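A minimal sketch of this step is shown below. The Levenshtein recurrence is standard; how the paper converts the distance into a similarity is not stated here, so the normalization \(1 - d / \max(|a|, |b|)\) used in `similarity_matrix` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """Symmetric pairwise similarity 1 - d / max(len) (assumed normalization)."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(seqs[i], seqs[j])
            sim[i][j] = sim[j][i] = 1.0 - d / max(len(seqs[i]), len(seqs[j]), 1)
    return sim
```

This pure-Python version is quadratic in sequence length per pair, which is slow for long circRNAs; in practice a compiled implementation (for example the `rapidfuzz` package) would typically be used for the distance computation.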

Composition/transition/distribution (CTD) features:

In this study, CTD features were used to represent the sequence structural information of RNA. CTD features comprise nucleotide composition, nucleotide transition, and nucleotide distribution [43]. CTD features have rarely been used to predict interactions between circRNAs and miRNAs; here, we used them to supplement the structural information of RNA, denoted \({x}_{c}^{c}\) for circRNAs and \({x}_{m}^{c}\) for miRNAs.
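The sketch below shows one common formulation of nucleotide CTD, assumed here because the exact definition in [43] is not reproduced in this section: 4 composition values, 6 transition values (adjacent pairs switching between two different nucleotides, in either order), and 20 distribution values (relative positions of the first, 25%, 50%, 75%, and last occurrence of each nucleotide), for 30 dimensions in total.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGU"


def ctd_features(seq):
    """30-dim CTD vector: 4 composition + 6 transition + 20 distribution values."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    # Composition: fraction of each nucleotide.
    comp = [seq.count(x) / n for x in NUCLEOTIDES]
    # Transition: fraction of adjacent pairs switching between x and y (either order).
    pairs = list(zip(seq, seq[1:]))
    trans = [sum(1 for p in pairs if set(p) == {x, y}) / max(len(pairs), 1)
             for x, y in combinations(NUCLEOTIDES, 2)]
    # Distribution: relative position of the first, 25%, 50%, 75% and last occurrence.
    dist = []
    for x in NUCLEOTIDES:
        pos = [i + 1 for i, ch in enumerate(seq) if ch == x]
        for q in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not pos:
                dist.append(0.0)
            else:
                k = max(math.ceil(q * len(pos)), 1)  # 1-based occurrence index
                dist.append(pos[k - 1] / n)
    return comp + trans + dist
```

Other CTD variants group nucleotides by physicochemical properties (for example purine versus pyrimidine) before computing the three descriptors; the grouping changes the dimension but not the overall procedure.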

K-mer features of RNA sequences:

K-mer frequencies are a widely used RNA sequence descriptor and have been successfully applied to enhancer recognition [44] and lncRNA prediction [45]. In this study, we used four K-mer features: 1-mer, 2-mer, 3-mer, and 4-mer. For each circRNA (miRNA) sequence, the four K-mer features were concatenated into one feature vector, denoted \({x}_{c}^{k}\) (\({x}_{m}^{k}\)).
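With k = 1 to 4 over the alphabet {A, C, G, U}, the concatenated vector has 4 + 16 + 64 + 256 = 340 entries. The sketch below assumes that counts are normalized by the number of windows of each length, which the text does not specify; `kmer_features` is a hypothetical name.

```python
from itertools import product


def kmer_features(seq, ks=(1, 2, 3, 4), alphabet="ACGU"):
    """Concatenated normalized k-mer frequencies: 4 + 16 + 64 + 256 = 340 values."""
    seq = seq.upper().replace("T", "U")
    feats = []
    for k in ks:
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]  # fixed lexicographic order
        index = {km: i for i, km in enumerate(kmers)}
        counts = [0] * len(kmers)
        total = max(len(seq) - k + 1, 0)       # number of length-k windows
        for i in range(total):
            j = index.get(seq[i:i + k])        # skips windows containing N or other symbols
            if j is not None:
                counts[j] += 1
        feats.extend(c / total if total else 0.0 for c in counts)
    return feats
```

Because miRNAs are short (about 22 nt), many of their 4-mer entries are zero, while long circRNAs populate the 4-mer block much more densely.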

Pre-trained distributed features by Doc2Vec:

In this study, Doc2Vec was used to obtain distributed embeddings [46]. Each RNA sequence was treated as a sentence, and Doc2Vec learned sentence representations by combining local context and global information. The distributed features of circRNA and miRNA sequences are denoted \({x}_{c}^{d}\) and \({x}_{m}^{d}\), respectively.
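Treating a sequence as a sentence requires splitting it into words. The text does not say how this is done; a common choice, assumed in this sketch, is overlapping k-mers with k = 3. The hypothetical `kmer_sentence` helper produces such a word list.

```python
def kmer_sentence(seq, k=3):
    """Split an RNA sequence into overlapping k-mer 'words' for Doc2Vec."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

With the gensim library, such word lists would typically be wrapped as `TaggedDocument(kmer_sentence(s), [i])` and passed to `Doc2Vec(documents, vector_size=..., min_count=1, epochs=...)`, after which `model.dv[i]` gives the embedding of sequence i; the vector size and other training settings used by MFERL are not stated in this section.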

Graph structural features by Role2Vec:

Role2Vec learns graph structural information by using attributed random walks to capture role-based embeddings. Following the approach in [34], we used Role2Vec to encode the nodes of the interaction graph. The resulting Role2Vec features for circRNAs and miRNAs are denoted \({x}_{c}^{r}\) and \({x}_{m}^{r}\), respectively.
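Before Role2Vec can embed the interaction graph, circRNAs and miRNAs must share one node index space. The sketch below builds that mapping and the edge list from known interaction pairs; the function name and the circRNAs-first ordering are assumptions. Some Role2Vec implementations, such as the one in the karateclub library, expect a NetworkX graph whose nodes are numbered 0 to n - 1, which this layout satisfies.

```python
def build_interaction_graph(pairs):
    """Map circRNA and miRNA identifiers to one contiguous node index space.

    pairs: iterable of (circRNA_id, miRNA_id) known interactions.
    Returns (edges, circ_index, mirna_index). circRNAs get 0..n_c-1 and
    miRNAs get n_c..n_c+n_m-1, so the two node types never collide.
    """
    pairs = list(pairs)
    circs = sorted({c for c, _ in pairs})
    mirnas = sorted({m for _, m in pairs})
    circ_index = {c: i for i, c in enumerate(circs)}
    mirna_index = {m: len(circs) + j for j, m in enumerate(mirnas)}
    edges = sorted({(circ_index[c], mirna_index[m]) for c, m in pairs})  # deduplicated
    return edges, circ_index, mirna_index
```

Under cross-validation, this graph would normally be built from the training-fold interactions only, so that test interactions do not leak into the structural features.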

Feature learning of miRNAs/circRNAs

In MFERL, feature learning involves aggregation and enhancement learning of the multi-scale features from different perspectives. It comprises three aspects: aggregation of homogeneous neighborhood features of the multi-scale features, a dual-convolutional attention module for enhancement learning between multi-scale features, and interaction learning of heterogeneous information between miRNAs and circRNAs.

Aggregation of homogeneous neighborhood features of multi-scale features

To aggregate the different features of similar nodes, we used a similarity matrix as a homogeneous graph and applied graph convolution to aggregate the features of homogeneous neighbors. Unlike traditional feature aggregation methods, we aggregated each feature type separately on the homogeneous graph and then fused the aggregated features by concatenation and projection. Taking miRNAs as an example, we first constructed a homogeneous graph \({G}_{m}\) by thresholding the similarity matrix \({x}_{m}^{s}\) (for fairness, the threshold was set to 0.5). Then, on \({G}_{m}\), we used graph convolution to aggregate the CTD, K-mer, Doc2Vec, and Role2Vec features. Because the similarity matrix had already been used to construct the homogeneous graph, sequence similarity features were not included in this aggregation. The aggregation of CTD features, for example, proceeds as follows:

$$\begin{aligned} {X}_{ctd}^{(l+1)} = {f}_{conv} \left( {X}_{ctd}^{l}, {A} \right) \end{aligned}$$
(1)
$$\begin{aligned} {f}_{conv} \left( {X}_{ctd}^{l}, {A} \right) = \sigma \left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} {X}_{ctd}^{l} {W}_{ctd}^{l} \right) \end{aligned}$$
(2)

here, A is the adjacency matrix of graph \({G}_{m}\), \({X}_{ctd}^{0} = {x}_{m}^{c}\) is the input CTD feature matrix of miRNAs, \(\tilde{A} = {A}+{I}\) is the adjacency matrix with self-loops added, \(\tilde{D}\) is the diagonal degree matrix of \(\tilde{A}\), \({W}_{ctd}^{l}\) is a learnable weight matrix, and \(\sigma\) is a nonlinear activation function. By default, the number of GCN layers l is set to 1. Similarly, we obtained the homogeneous information features \({X}_{m}^{kmer}\), \({X}_{m}^{doc}\), and \({X}_{m}^{role}\) from the k-mer, Doc2Vec, and Role2Vec features. Finally, these homogeneous features were concatenated and projected to obtain the homogeneous embedding of miRNAs, \({X}_{m}^{homo}\). The homogeneous embedding of circRNAs, \({X}_{c}^{homo}\), was obtained in the same way.
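
A minimal pure-Python sketch of Eqs. (1) and (2) and of the concatenate-then-project fusion step is given below. It assumes an inclusive threshold (similarity of at least 0.5), leaves the diagonal of the thresholded graph empty because Eq. (2) adds self-loops explicitly, and uses ReLU for the unspecified activation \(\sigma\). The function names and toy matrices are hypothetical; this is not the released MFERL code.

```python
import math


def threshold_graph(sim, thr=0.5):
    """Adjacency of the homogeneous graph: edge (i, j) when sim[i][j] >= thr.

    The diagonal is left at 0 because Eq. (2) adds self-loops (assumption).
    """
    n = len(sim)
    return [[1.0 if i != j and sim[i][j] >= thr else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj):
    """D~^{-1/2} (A + I) D~^{-1/2} from Eq. (2)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # always >= 1 thanks to the self-loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(x, adj, w):
    """One GCN layer; ReLU stands in for the unspecified activation sigma."""
    h = matmul(matmul(normalized_adjacency(adj), x), w)
    return [[max(0.0, v) for v in row] for row in h]


def homogeneous_features(feature_sets, adj, weights, w_proj):
    """Aggregate each feature type separately on the same graph, concatenate per node, project."""
    aggregated = [gcn_layer(x, adj, w) for x, w in zip(feature_sets, weights)]
    concatenated = [sum((agg[i] for agg in aggregated), []) for i in range(len(adj))]
    return matmul(concatenated, w_proj)
```

Aggregating each feature type on its own before fusion means that one feature's scale cannot dominate the neighbor averaging of another; the projection matrix then decides how to mix them.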

Dual-convolutional attention module for enhanced learning between multi-scale features

To adjust and balance the weights of different features, we adopted a dual-convolution attention mechanism for enhanced learning of the various features. Taking miRNAs as an example, we first used the five-layer embedding \({X}_{m} = \left\{ {x}_{m}^{k}, {x}_{m}^{c}, {x}_{m}^{d}, {x}_{m}^{r}, {x}_{m}^{s} \right\}\) (\({X}_{m} \in \mathbb {R}^{C_{m} \times N_{m} \times D_{m}}\)) as the input to the channel attention block. This process sequentially produced the channel attention \(\partial _{mc} \in \mathbb {R}^{5 \times 1 \times 1}\) and the spatial attention \(\partial _{ms} \in \mathbb {R}^{5 \times N_{m} \times D_{m}}\), where \(C_{m}\) is the number of channels (embedding layers; 5 by default), \(N_{m}\) is the number of miRNA nodes, and \(D_{m}\) is the embedding dimension. The overall attention process can be summarized as follows:

$$\begin{aligned} {X}'_{m} = \partial _{mc} \otimes {X}_{m} \end{aligned}$$
(3)
$$\begin{aligned} {X}_{m}^{mcam} = \partial _{ms} \otimes {X}'_{m} \end{aligned}$$
(4)

here, \(\otimes\) denotes element-wise multiplication, with the attention values broadcast to the shape of the input, and \({X}_{m}^{mcam}\) is the resulting miRNA embedding. The circRNA embedding \({X}_{c}^{mcam}\) was obtained in the same way. The channel and spatial attention modules are described below and illustrated in Fig. 6.

Fig. 6 Dual-convolutional attention module (miRNAs as an example)

Channel attention module:

We fed the five features into the channel attention block as five layers of feature embeddings, treating each layer as a distinct representation of miRNAs. To capture the influence of each layer on the overall embedding, we used convolutional layers to compress and then restore the input embeddings. In addition, both average pooling and max pooling were used to aggregate information, yielding more accurate channel attention. The channel attention module is computed as follows:

$$\begin{aligned} \partial _{mc} = \sigma \left( {f}_{channel}\left( {X}_{m}^{max} \right) + {f}_{channel}\left( {X}_{m}^{avg} \right) \right) \end{aligned}$$
(5)

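here, \(\sigma\) is the sigmoid function, \({X}_{m}^{max}\) and \({X}_{m}^{avg}\) are the max-pooled and average-pooled descriptors of each channel of \({X}_{m}\), and \({f}_{channel}(\cdot ) = {Conv2d}({ReLU}({Conv2d}(\cdot )))\).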
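
Assuming both Conv2d layers in \({f}_{channel}\) use \(1 \times 1\) kernels (the kernel size is not stated), applying them to the \(C_m \times 1 \times 1\) pooled descriptors is equivalent to a small shared two-layer dense network. The sketch below uses that simplification to illustrate Eqs. (5) and (3); the hidden size, the weights, and all function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(vec, w):
    """vec (length n_in) times w (n_in x n_out); an assumed 1x1 Conv2d on C x 1 x 1 reduces to this."""
    return [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(len(w[0]))]


def f_channel(vec, w_down, w_up):
    """Conv2d -> ReLU -> Conv2d: compress the C channels to a hidden size, then restore C."""
    hidden = [max(0.0, h) for h in dense(vec, w_down)]
    return dense(hidden, w_up)


def channel_attention(x, w_down, w_up):
    """Eq. (5). x is a list of C channels, each an N x D list of lists."""
    x_max = [max(v for row in ch for v in row) for ch in x]
    x_avg = [sum(v for row in ch for v in row) / (len(ch) * len(ch[0])) for ch in x]
    logits = [a + b for a, b in zip(f_channel(x_max, w_down, w_up),
                                    f_channel(x_avg, w_down, w_up))]
    return [sigmoid(z) for z in logits]


def apply_channel_attention(x, attn):
    """Eq. (3): each channel weight is broadcast over its N x D map."""
    return [[[a * v for v in row] for row in ch] for a, ch in zip(attn, x)]
```

The same `f_channel` weights are shared by the max-pooled and average-pooled paths, so the two pooling views vote on a single set of per-channel weights.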

Spatial attention module:

We generated a spatial attention map from the spatial relationships within the feature embeddings. The spatial attention block builds on channel attention by focusing on specific parts of the information within each channel. To compute it, average pooling and max pooling were applied to the input along the channel axis and the results were concatenated; a convolutional layer was then applied to produce the spatial attention map, which indicates the regions where attention should be enhanced or suppressed. The spatial attention block is computed as follows:

$$\begin{aligned} \partial _{ms} = \sigma \left( {f}_{spatial} \left( \left[ {X}'_{mAvg} ; {X}'_{mMax} \right] \right) \right) \end{aligned}$$
(6)

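here, \(\sigma\) is the sigmoid function, \({X}'_{mAvg}\) and \({X}'_{mMax}\) are the average-pooled and max-pooled maps of \({X}'_{m}\) along the channel axis, \([\cdot\,;\cdot]\) denotes concatenation, and \({f}_{spatial}\) is a convolution with a \(3 \times 3\) filter.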
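
The following sketch illustrates Eqs. (6) and (4): it pools along the channel axis, applies a zero-padded \(3 \times 3\) convolution to the two stacked maps, and gates every channel with the result. It produces a single \(N_m \times D_m\) map that is broadcast across all channels, as in CBAM-style spatial attention; this is an assumption, and a convolution with five output channels would instead give one map per channel, matching the stated \(5 \times N_m \times D_m\) shape. Kernel values and function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pool_over_channels(x):
    """Average- and max-pool a C x N x D input along the channel axis -> two N x D maps."""
    n, d = len(x[0]), len(x[0][0])
    avg = [[sum(ch[i][j] for ch in x) / len(x) for j in range(d)] for i in range(n)]
    mx = [[max(ch[i][j] for ch in x) for j in range(d)] for i in range(n)]
    return avg, mx


def conv3x3(maps, kernel, bias=0.0):
    """Multi-input, single-output 3x3 convolution (cross-correlation) with zero padding.

    kernel[c][di][dj] weights input map c at offset (di - 1, dj - 1).
    """
    n, d = len(maps[0]), len(maps[0][0])
    out = [[bias] * d for _ in range(n)]
    for i in range(n):
        for j in range(d):
            for c, m in enumerate(maps):
                for di in range(3):
                    for dj in range(3):
                        r, s = i + di - 1, j + dj - 1
                        if 0 <= r < n and 0 <= s < d:  # zero padding outside the map
                            out[i][j] += kernel[c][di][dj] * m[r][s]
    return out


def spatial_attention(x, kernel):
    """Eq. (6): sigmoid(f_spatial([avg; max]))."""
    avg, mx = pool_over_channels(x)
    return [[sigmoid(v) for v in row] for row in conv3x3([avg, mx], kernel)]


def apply_spatial_attention(x, attn):
    """Eq. (4): the single N x D map is broadcast across every channel."""
    return [[[attn[i][j] * ch[i][j] for j in range(len(ch[0]))] for i in range(len(ch))]
            for ch in x]
```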

Interactive learning of heterogeneous information of miRNAs and circRNAs

To better model the complex relationships between the feature representations of miRNAs and circRNAs, we employed a bilinear interaction mechanism for heterogeneous information, which helps extract joint representations of the two. Specifically, we fused the five types of features into the miRNA representation \({h}_{m}\) and the circRNA representation \({h}_{c}\) and constructed a bilinear interaction mapping, resulting in the interaction feature matrix \({Z} \in \mathbb {R}^{N_{m} \times N_{c}}\). The process is as follows:

$$\begin{aligned} \alpha =\text {norm} \left( {W}_{1} \left( \sigma \left( \left( {h}_{m}\right) ^{T} {W}_{m}\right) \otimes \sigma \left( \left( {W}_{c}\right) ^{T} {h}_{c}\right) \right) + {b}_{1}\right) \end{aligned}$$
(7)

here, \({W}_{m} \in \mathbb {R}^{N_{m} \times d}\) and \({W}_{c} \in \mathbb {R}^{N_{c} \times d}\) are the learnable weight matrices for miRNAs and circRNAs, respectively; \(\text {norm}(\cdot )\) denotes weight normalization; \(\otimes\) here denotes the outer product of two vectors (unlike in Eqs. (3) and (4), where it denotes element-wise multiplication); \({W}_{1}\) is the weight matrix of the linear projection; and \({b}_{1}\) is the bias term.

The bilinear interaction can be understood as mapping the miRNA and circRNA embeddings into a shared feature space using the weight matrices \({W}_{m}\) and \({W}_{c}\). The mapped representations are then combined through the outer product, yielding a high-dimensional interaction feature matrix. This matrix is linearly projected into a low-dimensional representation space, generating the projected feature vectors \({X}_{m}^{heter}\) and \({X}_{c}^{heter}\). This process enables MFERL to capture nonlinear relationships between the input features and thus to model higher-order interactions between miRNA and circRNA features.
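
The sketch below illustrates one way to realize a bilinear interaction of this kind in pure Python. Each pair score is a linear projection of the outer product of the two mapped vectors, which equals the bilinear form \(u_i^{T} W_1 v_j + b_1\). ReLU is assumed for \(\sigma\), weight normalization is omitted, and the softmax-weighted readout that forms \({X}_{m}^{heter}\) and \({X}_{c}^{heter}\) from \({Z}\) is a hypothetical choice, not necessarily the one used in MFERL. All function names are illustrative.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax(row):
    top = max(row)  # shift for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_interaction(h_m, h_c, w_m, w_c, w_1, b_1=0.0):
    """Z[i][j] = u_i^T W_1 v_j + b_1, a linear projection of the outer product of u_i and v_j."""
    u = relu(matmul(h_m, w_m))  # miRNAs mapped into the shared d-dimensional space
    v = relu(matmul(h_c, w_c))  # circRNAs mapped into the same space
    z = matmul(matmul(u, w_1), transpose(v))
    return [[zij + b_1 for zij in row] for row in z], u, v


def heterogeneous_embeddings(z, u, v):
    """Hypothetical readout: each node averages the other side's projections, weighted by softmax(Z)."""
    x_m_heter = matmul([softmax(row) for row in z], v)
    x_c_heter = matmul([softmax(col) for col in transpose(z)], u)
    return x_m_heter, x_c_heter
```

Computing \(u_i^{T} W_1 v_j\) directly avoids materializing the \(d \times d\) outer product for every pair, which is the usual reason bilinear layers are written this way.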

Model optimization and prediction

To capture higher-order relationships between nodes, we used contrastive learning and designed contrastive objectives to obtain high-quality feature embeddings. We treated the fused representations of the original features as the initial node embeddings \({e}_{m}^{initial}\) and \({e}_{c}^{initial}\). The embeddings \({X}_{m}^{homo}\) and \({X}_{c}^{homo}\), obtained by aggregating different features on the homogeneous graph via graph convolution, represented homogeneous embeddings that incorporate information from homogeneous neighbors. The embeddings \({X}_{m}^{mcam}\) and \({X}_{c}^{mcam}\), obtained through dual-convolution enhanced learning of different features, represented the aggregated embeddings of the nodes. The embeddings \({X}_{m}^{heter}\) and \({X}_{c}^{heter}\), derived from bilinear interactions with heterogeneous neighbors, represented the heterogeneous embeddings. We used these embeddings to model high-dimensional relationships between RNAs: each of the three embeddings above was paired with the node's initial embedding as a positive pair, and the InfoNCE loss was used to minimize the distance between positive pairs. Taking miRNAs as an example:

$$\begin{aligned} \ell ^{m} & =\text {InfoNCE}\left( {X}_{m}^{homo},{e}_{m}^{initial}\right) +\text {InfoNCE}\left( {X}_{m}^{mcam},{e}_{m}^{initial}\right) \nonumber \\ & \quad + \text {InfoNCE}\left( {X}_{m}^{heter},{e}_{m}^{initial}\right) \end{aligned}$$
(8)

here, \(\text {InfoNCE}(x, y) = \sum _{n \in N_m} -\log \frac{\exp \left( x_{m_n} \cdot y_{m_n} / \tau \right) }{\sum _{i \in N_m} \exp \left( x_{m_n} \cdot y_{m_i} / \tau \right) }\), \(\tau\) is the temperature hyperparameter of the softmax, and \(\ell ^{m}\), written \(\ell ^{m}_{local}\) below, is the local contrastive loss of miRNAs. The local contrastive loss of circRNAs, \(\ell ^{c}_{local}\), is obtained in the same way. The final local contrastive loss is the weighted sum of \(\ell ^{m}_{local}\) and \(\ell ^{c}_{local}\):

$$\begin{aligned} \ell _{cl} = \ell ^{m}_{local} + \alpha \ell ^{c}_{local} \end{aligned}$$
(9)

here, \(\alpha\) is the weight that balances \(\ell ^{m}_{local}\) and \(\ell ^{c}_{local}\); it is set to 1 by default.
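
A minimal sketch of the InfoNCE term, together with Eqs. (8) and (9), follows. As in the formula, raw dot products are used (no L2 normalization), and row \(n\) of the second argument is the positive for row \(n\) of the first, with all other rows acting as negatives. The default temperature of 0.2 and the function names are hypothetical; the log-sum-exp shift only improves numerical stability and does not change the value.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(x, y, tau=0.2):
    """Sum over nodes n of -log softmax_i(x_n . y_i / tau)[n]; tau = 0.2 is a hypothetical default."""
    loss = 0.0
    for n, x_n in enumerate(x):
        logits = [dot(x_n, y_i) / tau for y_i in y]
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denominator - logits[n]
    return loss


def local_contrast_loss(x_homo, x_mcam, x_heter, e_initial, tau=0.2):
    """Eq. (8): each of the three views is contrasted with the initial embeddings."""
    return sum(info_nce(view, e_initial, tau) for view in (x_homo, x_mcam, x_heter))


def total_contrast_loss(l_m, l_c, alpha=1.0):
    """Eq. (9): weighted sum of the miRNA and circRNA local losses (alpha defaults to 1)."""
    return l_m + alpha * l_c
```

When every view already agrees with the initial embedding, each node's positive logit dominates its row and the loss approaches zero; mismatched pairings raise it.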

After obtaining node representations aggregated from different features and perspectives, we reasoned that this diverse information could enrich the node features and improve prediction. We therefore concatenated the different representations and passed them through a CNN to obtain the final node representations:

$$\begin{aligned} {M}_{i} = \text {CNN}\left( \text {concatenate}\left( {X}_{{m}_{i}}^{homo}, {X}_{{m}_{i}}^{mcam}, {X}_{{m}_{i}}^{heter} \right) \right) \end{aligned}$$
(10)
$$\begin{aligned} {C}_{j} = \text {CNN}\left( \text {concatenate}\left( {X}_{{c}_{j}}^{homo}, {X}_{{c}_{j}}^{mcam}, {X}_{{c}_{j}}^{heter} \right) \right) \end{aligned}$$
(11)

here, \(\text {CNN}(\cdot )\) is a one-dimensional CNN. We then computed the element-wise product of the miRNA and circRNA node embeddings and predicted the interaction probability of each circRNA-miRNA pair with an FNN:

$$\begin{aligned} \hat{{r}}_{{i}{j}} = \text {FNN}\left( {M}_{i} \odot {C}_{j} \right) \end{aligned}$$
(12)

here, \(\odot\) denotes the element-wise product of the miRNA and circRNA node vectors, and \(\text {FNN}(\cdot )\) is a single-layer feedforward network with a sigmoid output activation.
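
The sketch below walks through Eqs. (10) to (12) for a single miRNA-circRNA pair. For illustration it assumes the one-dimensional CNN is a single filter applied as a "valid" convolution with no pooling; the kernel, weights, and function names are hypothetical, and a practical model would use several learned filters.

```python
import math


def conv1d(seq, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation) with a single filter."""
    k = len(kernel)
    return [bias + sum(kernel[t] * seq[i + t] for t in range(k)) for i in range(len(seq) - k + 1)]


def node_representation(x_homo, x_mcam, x_heter, kernel):
    """Eqs. (10)/(11): concatenate the three views of one node, then apply the 1-D CNN."""
    return conv1d(x_homo + x_mcam + x_heter, kernel)


def predict(m_i, c_j, w, b=0.0):
    """Eq. (12): single-layer FNN on the element-wise product M_i * C_j, sigmoid output."""
    z = b + sum(wk * mk * ck for wk, mk, ck in zip(w, m_i, c_j))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with hypothetical values: three one-dimensional views per node.
m = node_representation([1.0], [2.0], [3.0], [1.0, 1.0])
c = node_representation([0.0], [1.0], [0.0], [1.0, 1.0])
score = predict(m, c, [1.0, 1.0], b=-8.0)
```

The element-wise product keeps the score symmetric in how each embedding dimension contributes, so the FNN only has to learn one weight per dimension.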

To optimize the model, we used the binary cross-entropy loss during training to compute the classification loss over circRNA-miRNA pairs:

$$\begin{aligned} \ell = - \sum \limits _{i,j \in \mathcal {Y} \cup \mathcal {Y}^+} \left[ {r}_{{i}{j}} \log \hat{{r}}_{{i}{j}} + (1 - {r}_{{i}{j}}) \log (1 - \hat{{r}}_{{i}{j}}) \right] \end{aligned}$$
(13)

here, \({r}_{{i}{j}}\) is the ground-truth label of the pair \((i, j)\) and \(\hat{{r}}_{{i}{j}}\) is the predicted score. We also introduced the contrastive learning objective as an auxiliary task, so the final loss of the entire model is formulated as follows:

$$\begin{aligned} \text {loss} = \ell + \lambda _1 \ell _{cl} + \Vert \theta \Vert _2^2 \end{aligned}$$
(14)

here, \(\lambda _1\) is a hyperparameter that balances the two loss terms, and \(\Vert \theta \Vert _2^2\) is the L2 regularization term on the model parameters \(\theta\).
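
A sketch of Eqs. (13) and (14) is given below. It follows the equations as written, including the unweighted \(\Vert \theta \Vert _2^2\) term; the probability clipping, the default \(\lambda _1 = 0.1\), and the function names are hypothetical choices for illustration.

```python
import math


def bce_loss(labels, scores, eps=1e-12):
    """Eq. (13): summed binary cross-entropy over labelled circRNA-miRNA pairs."""
    total = 0.0
    for r, p in zip(labels, scores):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= r * math.log(p) + (1 - r) * math.log(1 - p)
    return total


def l2_penalty(params):
    """Squared L2 norm of a flat list of model parameters."""
    return sum(w * w for w in params)


def total_loss(labels, scores, l_cl, params, lambda_1=0.1):
    """Eq. (14): prediction loss + lambda_1 * contrastive loss + ||theta||_2^2 (lambda_1 is hypothetical)."""
    return bce_loss(labels, scores) + lambda_1 * l_cl + l2_penalty(params)
```

In practice the L2 term is usually scaled by its own coefficient or applied as optimizer weight decay; the sketch keeps it unscaled only to mirror Eq. (14).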

Data availability

All data generated or analyzed during this study are included in this published article, its supplementary information files, and publicly available repositories. The code and datasets of MFERL are freely available from Zenodo (https://doi.org/10.5281/zenodo.15265950) [47] and GitHub (https://github.com/biohnuster/MFERL) [48].

Abbreviations

circRNAs:

Circular RNAs

miRNAs:

microRNAs

CMI:

circRNA-miRNA interaction

AUC:

Area under the receiver operating characteristic curve

AUPR:

Area under the precision-recall curve

di:

Embedding dimension

lr:

Learning rate

\(\tau\) :

Temperature hyperparameter in contrastive learning

References

  1. Chen W, Xu J, Wu Y, Liang B, Yan M, Sun C, et al. The potential role and mechanism of circRNA/miRNA axis in cholesterol synthesis. Int J Biol Sci. 2023;19(9):2879.

  2. Hill M, Tran N. miRNA interplay: mechanisms and consequences in cancer. Dis Model Mech. 2021;14(4):dmm047662.

  3. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell. 2011;146(3):353–8.

  4. Dakal TC, Kumar A, Maurya PK. CircRNA-miRNA-mRNA interactome analysis in endometrial cancer. J Biomol Struct Dyn. 2025;43(3):1486–97.

  5. Costa MC, Cortez-Dias N, Gabriel A, De Sousa J, Fiuza M, Gallego J, et al. circRNA-miRNA cross-talk in the transition from paroxysmal to permanent atrial fibrillation. Int J Cardiol. 2019;290:134–7.

  6. Costa MC, Kurc S, Drożdż A, Cortez-Dias N, Enguita FJ, et al. The circulating non-coding RNA landscape for biomarker research: lessons and prospects from cardiovascular diseases. Acta Pharmacol Sin. 2018;39(7):1085–99.

  7. Song Y, Zeng S, Zheng G, Chen D, Li P, Yang M, et al. FOXO3a-driven miRNA signatures suppresses VEGF-A/NRP1 signaling and breast cancer metastasis. Oncogene. 2021;40(4):777–90.

  8. Zhou SY, Chen W, Yang SJ, Li J, Zhang JY, Zhang HD, et al. Circular RNA circVAPA regulates breast cancer cell migration and invasion via sponging miR-130a-5p. Epigenomics. 2020;12(4):303–17.

  9. Tong KL, Tan KE, Lim YY, Tien XY, Wong PF. CircRNA-miRNA interactions in atherogenesis. Mol Cell Biochem. 2022;477(12):2703–33.

  10. Nishita-Hiresha V, Varsha R, Jayasuriya R, Ramkumar KM. The role of circRNA-miRNA-mRNA interaction network in endothelial dysfunction. Gene. 2023;851:146950.

  11. Ma B, Wang S, Wu W, Shan P, Chen Y, Meng J, et al. Mechanisms of circRNA/lncRNA-miRNA interactions and applications in disease and drug research. Biomed Pharmacother. 2023;162:114672.

  12. He J, Xiao P, Chen C, Zhu Z, Zhang J, Deng L. GCNCMI: a graph convolutional neural network approach for predicting circRNA-miRNA interactions. Front Genet. 2022;13:959701.

  13. Yu CQ, Wang XF, Li LP, You ZH, Ren ZH, Chu P, et al. RBNE-CMI: an efficient method for predicting circRNA-miRNA interactions via multiattribute incomplete heterogeneous network embedding. J Chem Inf Model. 2024;64(18):7163–72.

  14. Wang XF, Yu CQ, You ZH, Qiao Y, Li ZW, Huang WZ. An efficient circRNA-miRNA interaction prediction model by combining biological text mining and wavelet diffusion-based sparse network structure embedding. Comput Biol Med. 2023;165:107421.

  15. Li YC, You ZH, Yu CQ, Wang L, Hu L, Hu PW, et al. DeepCMI: a graph-based model for accurate prediction of circRNA-miRNA interactions with multiple information. Brief Funct Genom. 2024;23(3):276–85.

  16. Qian Y, Zheng J, Zhang Z, Jiang Y, Zhang J, Deng L. CMIVGSD: circRNA-miRNA interaction prediction based on variational graph auto-encoder and singular value decomposition. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2021. pp. 205–210.

  17. Lan W, Zhu M, Chen Q, Chen J, Ye J, Liu J, et al. Prediction of circRNA-miRNA associations based on network embedding. Complexity. 2021;2021(1):6659695.

  18. Yao D, Nong L, Qin M, Wu S, Yao S. Identifying circRNA-miRNA interaction based on multi-biological interaction fusion. Front Microbiol. 2022;13:987930.

  19. Qian Y, Zheng J, Jiang Y, Li S, Deng L. Prediction of circRNA-miRNA association using singular value decomposition and graph neural networks. IEEE/ACM Trans Comput Biol Bioinforma. 2022;20(6):3461–8.

  20. Ma Z, Kuang Z, Deng L. NGCICM: a novel deep learning-based method for predicting circRNA-miRNA interactions. IEEE/ACM Trans Comput Biol Bioinforma. 2023;20(5):3080–92.

  21. Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, et al. Biolinguistic graph fusion model for circRNA–miRNA association prediction. Brief Bioinform. 2024;25(2):bbae058.

  22. Guo LX, You ZH, Wang L, Yu CQ, Zhao BW, Ren ZH, et al. A novel circRNA-miRNA association prediction model based on structural deep neural network embedding. Brief Bioinform. 2022;23(5):bbac391.

  23. Yu CQ, Wang XF, Li LP, You ZH, Huang WZ, Li YC, et al. SGCNCMI: a new model combining multi-modal information to predict circRNA-related miRNAs, diseases and genes. Biology. 2022;11(9):1350.

  24. Wang XF, Yu CQ, You ZH, Li LP, Huang WZ, Ren ZH, et al. A feature extraction method based on noise reduction for circRNA-miRNA interaction prediction combining multi-structure features in the association networks. Brief Bioinform. 2023;24(3):bbad111.

  25. Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, et al. Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA–miRNA associations. Brief Bioinform. 2024;25(2):bbae020.

  26. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):2579–605.

  27. Islam MT, Zhou Z, Ren H, Khuzani MB, Kapp D, Zou J, et al. Revealing hidden patterns in deep neural network feature space continuum via manifold learning. Nat Commun. 2023;14(1):8506.

  28. Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. Nat Mach Intell. 2023;5(2):126–36.

  29. Long S, Tang X, Si X, Kong T, Zhu Y, Wang C, et al. TriFusion enables accurate prediction of miRNA-disease association by a tri-channel fusion neural network. Commun Biol. 2024;7(1):1067.

  30. Zhou J, Wang X, Niu R, Shang X, Wen J. Predicting circRNA-miRNA interactions utilizing transformer-based RNA sequential learning and high-order proximity preserved embedding. iScience. 2024;27(1):108592.

  31. Wang Z, Liang S, Liu S, Meng Z, Wang J, Liang S. Sequence pre-training-based graph neural network for predicting lncRNA-miRNA associations. Brief Bioinform. 2023;24(5):bbad317.

  32. Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol. 2023;14:1170559.

  33. Xuan P, Gu J, Cui H, Wang S, Toshiya N, Liu C, et al. Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes. Bioinformatics. 2024;40(2):btae025.

  34. Li H, Wu B, Sun M, Ye Y, Zhu Z, Chen K. Multi-view graph neural network with cascaded attention for lncRNA-miRNA interaction prediction. Knowl-Based Syst. 2023;268:110492.

  35. Ning Q, Zhao Y, Gao J, Chen C, Li X, Li T, et al. AMHMDA: attention aware multi-view similarity networks and hypergraph learning for miRNA–disease associations identification. Brief Bioinform. 2023;24(2):bbad094.

  36. Peng L, Li H, Yuan S, Meng T, Chen Y, Fu X, Cao D. metaCDA: a novel framework for circRNA-driven drug discovery utilizing adaptive aggregation and meta-knowledge learning. J Chem Inf Model. 2025;65(4):2129–44.

  37. Peng L, Yang C, Yang J, Tu Y, Yu Q, Li Z, et al. Drug repositioning via multi-view representation learning with heterogeneous graph neural network. IEEE J Biomed Health Inform. 2025;29(3):1668–79.

  38. Peng L, Wang W, Yang C, Xiao W, Fu X, Chen Y. Dual-stream heterogeneous graph neural network based on zero-shot embeddings for predicting miRNA-drug sensitivity. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2024. pp. 1122–1128.

  39. Liu M, Wang Q, Shen J, Yang BB, Ding X. Circbank: a comprehensive database for circRNA with standard nomenclature. RNA Biol. 2019;16(7):899–905.

  40. Peng L, Yang C, Chen Y, Liu W. Predicting CircRNA-disease associations via feature convolution learning with heterogeneous graph attention network. IEEE J Biomed Health Inform. 2023;27(6):3072–82.

  41. Yang S, Wang Y, Lin Y, Shao D, He K, Huang L. LncMirNet: predicting LncRNA-miRNA interaction based on deep learning of ribonucleic acid sequences. Molecules. 2020;25(19):4372.

  42. Levenshtein VI, et al. Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet physics doklady, vol. 10. Soviet Union; 1966. pp. 707–710.

  43. Tong X, Liu S. CPPred: coding potential prediction based on the global description of RNA sequence. Nucleic Acids Res. 2019;47(8):e43–e43.

  44. Lee D, Karchin R, Beer MA. Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011;21(12):2167–80.

  45. Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, et al. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013;41(17):e166–e166.

  46. Lau JH, Baldwin T. An empirical evaluation of doc2vec with practical insights into document embedding generation. 2016. arXiv preprint arXiv:1607.05368.

  47. Peng L, Wang W. MFERL. Zenodo. 2025. https://doi.org/10.5281/zenodo.15265950

  48. Peng L, Wang W. MFERL. 2025. GitHub https://github.com/biohnuster/MFERL.

Acknowledgements

We would like to thank the National Natural Science Foundation of China (Nos. 62372158, 62402533), Natural Science Foundation of Hunan Province (No. 2023JJ30264), and Scientific Research Project of Hunan Education Department (Nos. 22A0350, 23B0237, 22C0267). We also appreciate the contributions of all authors involved in this study, whose collaboration made this research possible.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62372158, 62402533), Natural Science Foundation of Hunan Province (No. 2023JJ30264), and Scientific Research Project of Hunan Education Department (Nos. 22A0350, 23B0237, 22C0267).

Author information

Contributions

L.P. and W.W. contributed to the initial draft and the design and implementation of the experiments; Z.Y. and X.F. were responsible for data collection and reference preparation; W.L. and D.C. provided experimental guidance and revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Li Peng or Xiangzheng Fu.

Ethics declarations

Ethics approval and consent to participate

This study consists solely of non-invasive computational experiments and involves no direct intervention on human or animal subjects. Therefore, ethical approval from an institutional review board is not required. Participant consent and the protection of personal information are not applicable, as the experimental data are sourced from publicly available datasets.

Consent for publication

All authors have provided their consent for publication of this study. No identifiable individuals or personal data are included in this manuscript, ensuring compliance with publication ethics.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

12915_2025_2227_MOESM1_ESM.pdf

Additional file 1: Tables S1–S3 and Figures S1–S3. This document provides supplementary experiments: independent tests, 5-fold CV experiments for circRNAs and miRNAs, and a case study. Table S1: Performance of the MFERL model in independent tests on three datasets. Table S2: Performance of the MFERL model under 5-fold CV for circRNAs and miRNAs on three datasets. Table S3: The top 50 prediction scores among unknown interactions. Figure S1: ROC and PR curves of the MFERL model under 5-fold CV for circRNAs on Dataset1, Dataset2, and Dataset3. Figure S2: ROC and PR curves of the MFERL model under 5-fold CV for miRNAs on Dataset1, Dataset2, and Dataset3. Figure S3: Sample score distribution.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Peng, L., Wang, W., Yang, Z. et al. Leveraging explainable multi-scale features for fine-grained circRNA-miRNA interaction prediction. BMC Biol 23, 121 (2025). https://doi.org/10.1186/s12915-025-02227-6

  • DOI: https://doi.org/10.1186/s12915-025-02227-6

Keywords