HNF-DDA: subgraph contrastive-driven transformer-style heterogeneous network embedding for drug–disease association prediction

Shang, Yifan; Wang, Zixu; Chen, Yangyang; Yang, Xinyu; Ren, Zhonghao; Zeng, Xiangxiang; Xu, Lei

doi:10.1186/s12915-025-02206-x

Research
Open access
Published: 16 April 2025



BMC Biology volume 23, Article number: 101 (2025)


Abstract

Background

Drug–disease association (DDA) prediction aims to identify potential links between drugs and diseases, facilitating the discovery of new therapeutic uses and reducing the cost and time associated with traditional drug development. However, existing DDA prediction methods often overlook the global relational information provided by other biological entities and the complex association structure between drugs and diseases, limiting the potential correlations of drug and disease embeddings.

Results

In this study, we propose HNF-DDA, a subgraph contrastive-driven transformer-style heterogeneous network embedding model for DDA prediction. Specifically, HNF-DDA adopts an all-pairs message passing strategy to capture the global structure of the network, fully integrating multi-omics information. HNF-DDA also introduces subgraph contrastive learning to capture the local structure of drug-disease subgraphs, learning the high-order semantic information of nodes. Experimental results on two benchmark datasets demonstrate that HNF-DDA outperforms several state-of-the-art methods. Additionally, it shows superior performance across different dataset splitting schemes, indicating HNF-DDA’s capability to generalize to novel drug and disease categories. Case studies for breast cancer and prostate cancer reveal that 9 out of the top 10 predicted candidate drugs for breast cancer and 8 out of the top 10 for prostate cancer have documented therapeutic effects.

Conclusions

HNF-DDA incorporates all-pairs message passing and subgraph capture strategies into heterogeneous network embedding, enabling effective learning of drug and disease representations enriched with heterogeneous information, while also demonstrating significant potential for applications in drug repositioning.

Background

The development of a small molecule drug from design to market approval typically takes an average of 15 years and approximately $2 billion in investment [1]. In addition to the high costs of research and development, the clinical trial phase for new drugs has a failure rate as high as 90% [2]. Consequently, the number of newly approved drugs is insufficient to meet the needs of treating increasingly complex diseases. In 2022, the FDA approved only 37 novel drugs [3, 4]. New strategies are urgently needed to reduce costs and shorten the development cycle to enhance drug discovery efficiency. Compared to traditional drug discovery methods, drug repositioning identifies new indications for already approved clinical drugs, thereby avoiding the complex and costly drug design process and the high failure rates associated with clinical trials. Drug repositioning significantly improves drug discovery efficiency [5,6,7,8,9,10,11] and has been successfully applied in treating diseases such as COVID-19 [12] and Alzheimer’s disease [13]. With the advancement of computer technology and the massive accumulation of biomedical data, many computational methods have been applied in the field of biomedicine [14,15,16,17,18,19,20], among which computational virtual screening for new drug indications has gradually gained attention [21, 22]. Utilizing machine learning models to predict reliable potential drug–disease associations (DDAs) can substantially reduce the human and material costs associated with traditional experiments [23,24,25]. Therefore, computational methods for predicting DDAs have become crucial for accelerating drug discovery.

Computational methods for predicting DDAs can be categorized into two types: drug-based and disease-based methods and multi-source heterogeneous data-based methods [26]. The first type predicts potential DDAs by constructing a drug-disease bipartite network and leveraging known drug–disease association patterns [27,28,29,30,31]. For instance, NCH-DDA [27] employs single-neighborhood and multi-neighborhood feature extraction modules to extract critical features of drugs and diseases from both the drug-disease bipartite network and drug/disease similarity networks in parallel, utilizing contrastive learning to obtain common features. DRAGNN [28] uses a graph attention mechanism to obtain dynamically allocated attention coefficients for nodes, enhancing the effectiveness of information gathering for target nodes. However, these methods have a limitation: the mechanisms of drug action and disease pathology involve multiple biomolecules and signaling pathways. By focusing solely on the direct associations between drugs and diseases, these methods neglect the biological mechanisms involving other entities, such as proteins, in DDAs.

The second category, multi-source heterogeneous data-based methods, integrates data from various biological entities to capture potential associations between drugs and diseases. These methods can be divided into three types based on data integration strategies: path-based, network embedding-based, and knowledge graph embedding-based. Path-based methods use walk strategies, such as random walks, to generate node sequences that capture relationships between different types of nodes and edges, thereby learning the representations of drug and disease nodes [32,33,34]. For example, DREAMwalk [32] proposed a “semantic multi-layer guilt-by-association” method, which uses random walks guided by semantic information to generate node sequences populated by drugs and diseases. FuHLDR [25] obtains low-order features based on graph convolutional networks and high-order features based on meta-paths, then integrates these high-order and low-order representations to determine a comprehensive representation of drugs and diseases. However, these meta-path-based features often rely on local information and have limited ability to extract higher-order structures, making it difficult to capture the complex interaction mechanisms between drugs and diseases. Network embedding-based methods construct a heterogeneous network containing various biological entities and then use graph representation learning techniques to capture the network structure and learn node feature representations [35,36,37,38,39]. For example, PSGCN [35] proposed an end-to-end specific partner drug repositioning method based on graph convolutional networks. DDAGDL [24] incorporates complex biological information into the topology of heterogeneous networks, effectively learning smooth representations of drugs and diseases through an attention mechanism. These methods use graph convolutional networks (GCN) or graph attention networks (GAT) to integrate information from neighboring nodes but overlook the all-pairs message passing between nodes [40]. Knowledge graph embedding-based methods view associations in the knowledge graph as transformations from source entities to target entities [41,42,43]. For example, RotatE [41] models relationships between entities as rotations in the complex plane. Although knowledge graph embedding techniques can map entities and relationships in the graph into a low-dimensional vector space, this representation method may lose some structural and semantic information.

Considering the limitations of the existing methods, we propose a subgraph contrastive-driven transformer-style heterogeneous network embedding model (HNF-DDA) for DDA prediction (Fig. 1). First, we construct a heterogeneous network encompassing various biological entities and employ the attribute information of these entities to obtain initial node embeddings using a biological large language model. Second, to learn the embeddings of drugs and diseases, HNF-DDA employs an all-pairs message passing heterogeneous network embedding model to capture global signal transmission between any nodes. A subgraph capture strategy is proposed to extract high-order semantic structures within the heterogeneous network. Finally, an eXtreme Gradient Boosting (XGBoost) classifier [44, 45] is employed to predict the association probabilities between drugs and diseases. Experiments conducted on real-world datasets demonstrate that HNF-DDA outperforms existing methods in AUROC, AUPR, and Accuracy. Results from experiments with different dataset splitting schemes indicate that HNF-DDA has superior generalization capability for new drug and disease categories. Therefore, HNF-DDA not only effectively learns the representations of drugs and diseases that contain heterogeneous information but also shows greater potential for application in drug repositioning. This study makes the following contributions:

To obtain multi-source biological entity information, we employ a large-scale biological language model to generate initial embeddings for drug structures, protein sequences, diseases, and other biological entity attributes.
To achieve global information transmission in heterogeneous networks, we utilize an all-pairs message-passing Transformer-style network embedding model that simulates signal transmission between any nodes, enabling adaptive integration of various biological entity information.
To better capture the complex association mechanisms between drugs and diseases, we propose a drug-disease subgraph contrastive strategy that ensures better connections between drugs and diseases in the embedding space.
Experimental results demonstrate that HNF-DDA outperforms state-of-the-art methods. Additionally, split experiment results and case studies on breast cancer and prostate cancer confirm the model’s generalization and reliability, offering new insights for drug repositioning.

Results and discussions

Datasets

We evaluated our model on two benchmark datasets: KEGG [46] and HetioNet [34]. Both datasets contain drug, protein, disease, and pathway entities as well as multiple types of association information. The statistics of the two datasets are shown in Table 1.

Table 1 The statistics of KEGG and HetioNet


Baselines

In this study, we compared HNF-DDA with 10 state-of-the-art DDA prediction methods:

RotatE [41]: This model introduces a new knowledge graph embedding method capable of modeling and inferring various relational patterns, including symmetric/antisymmetric, inversion, and composition, for learning drug and disease embeddings.
QuatE [47]: This method introduces a more expressive hypercomplex representation to model entities and relationships in knowledge graph embeddings, learning node embeddings.
WalkPool [48]: This algorithm combines the expressive power of topology-based heuristic algorithms with the feature learning capabilities of neural networks.
SEAL [49]: The model proposes a novel decaying heuristic theory that unifies a broad range of heuristic algorithms within a single framework. It demonstrates that all these heuristic algorithms can be well-approximated from local subgraphs, which retain rich information about the existence of links.
ComplEx [50]: This method demonstrates that using the asymmetric Hermitian product as a relational operation can automatically understand the structural knowledge of large knowledge bases and address the link prediction problem.
DTi2vec [51]: The model constructs a heterogeneous network and employs node embedding techniques to automatically generate features for each drug and target, subsequently using ensemble learning techniques to identify drug-target interactions.
NEWMIN [33]: This method proposes a network embedding framework within multiple networks to predict synergistic drug combinations.
DDAGDL [24]: This method incorporates complex biological information into the topology of heterogeneous networks, effectively learning smooth representations of drugs and diseases through an attention mechanism.
DREAMwalk [32]: This model proposes a “semantic multi-layer guilt-by-association” method, which predicts DDAs at the drug-gene-disease level using the relational guilt principle “similar genes share similar functions.”
FuHLDR [25]: This method proposes a novel graph representation learning model for drug repositioning by fusing higher and lower-order biological information.

The baselines we selected can be categorized into random walk-based, graph neural network-based, and knowledge graph-based link prediction models. The random walk-based models include FuHLDR, DREAMwalk, NEWMIN, and DTi2vec; the graph neural network-based models include DDAGDL, WalkPool, and SEAL; the knowledge graph embedding models include ComplEx, RotatE, and QuatE.

Experimental setting and evaluation metrics

We use known DDAs as positive samples and randomly sample an equal number of drug-disease pairs without known associations as negative samples. We then evaluate HNF-DDA and the other methods on the two datasets using tenfold cross-validation repeated 10 times, with a different dataset split for each repetition. Because the initial features of different biological entities have different dimensions, we first project the initial embeddings to a common input embedding size of 512 and set the hidden layer embedding size to 32. The all-pair message-passing encoder has 2 layers for the KEGG dataset and 1 layer for the HetioNet dataset. The weights of the learning objectives, $\alpha$ and $\beta$, are both 0.01 for KEGG, and 1.0 and 0.1, respectively, for HetioNet.
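The sampling and evaluation protocol described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `sample_negatives` and `repeated_kfold`, the seeds, and the toy drug and disease identifiers are all assumptions made for the example.

```python
import random

def sample_negatives(drugs, diseases, positives, seed=0):
    """Draw as many unknown drug-disease pairs as there are known DDAs."""
    rng = random.Random(seed)
    known = set(positives)
    # There must be at least as many unknown pairs as known ones.
    if 2 * len(known) > len(drugs) * len(diseases):
        raise ValueError("not enough unknown pairs to match the positives")
    negatives = set()
    while len(negatives) < len(known):
        pair = (rng.choice(drugs), rng.choice(diseases))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)

def repeated_kfold(samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train, test), reshuffling before every repeat."""
    for r in range(repeats):
        order = list(samples)
        random.Random(seed + r).shuffle(order)  # a different split per repeat
        for f in range(k):
            test = order[f::k]
            train = [s for i, s in enumerate(order) if i % k != f]
            yield r, f, train, test

# Toy data (hypothetical identifiers, not taken from KEGG or HetioNet).
drugs = [f"drug{i}" for i in range(20)]
diseases = [f"dis{i}" for i in range(10)]
positives = [(drugs[i], diseases[i % 10]) for i in range(20)]
negatives = sample_negatives(drugs, diseases, positives)
samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
splits = list(repeated_kfold(samples))
```

Because the negatives are drawn at random from unlabeled pairs, some of them may be true but undiscovered associations, which is the false-negative risk the authors raise in the Discussion.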

The evaluation metrics include AUROC, AUPR, and Accuracy.

$$TPR=Recall=\frac{TP}{TP+FN}$$

(1)

$$FPR=\frac{FP}{FP+TN}$$

(2)

$$Precision=\frac{TP}{TP+FP}$$

(3)

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$

(4)

where TP is the number of samples correctly classified as positive, TN is the number of samples correctly classified as negative, FP is the number of samples incorrectly classified as positive, and FN is the number of samples incorrectly classified as negative. Accuracy is the proportion of all samples that are correctly predicted. AUROC is the Area Under the TPR-FPR Curve plotted at different thresholds, and AUPR is the Area Under the Precision-Recall Curve plotted at different thresholds. We comprehensively evaluate the performance of HNF-DDA using AUROC, AUPR, and Accuracy.
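As a worked illustration of Eqs. (1) to (4) and the two area metrics, the sketch below computes them in plain Python on a six-sample toy example. The function names are hypothetical. The AUPR here uses a step-wise sum (the average-precision convention), which is one common choice and an assumption on our part, since the text does not say how the precision-recall curve is integrated.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, TN, FP, FN when scores >= threshold are called positive."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def basic_metrics(tp, tn, fp, fn):
    """Eqs. (1)-(4): TPR (recall), FPR, precision, accuracy."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "Precision": tp / (tp + fp),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def roc_pr_areas(labels, scores):
    """Sweep thresholds over the observed scores and integrate both curves."""
    pos = sum(labels)
    neg = len(labels) - pos
    auroc = aupr = 0.0
    prev_fpr = prev_tpr = 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        tpr, fpr = tp / pos, fp / neg
        precision = tp / (tp + fp)
        auroc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid rule
        aupr += (tpr - prev_tpr) * precision              # step-wise sum
        prev_fpr, prev_tpr = fpr, tpr
    return auroc, aupr

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
tp, tn, fp, fn = confusion_counts(labels, scores)
metrics = basic_metrics(tp, tn, fp, fn)
auroc, aupr = roc_pr_areas(labels, scores)
```

On this toy input the threshold 0.5 gives TP = 3, TN = 2, FP = 1, FN = 0, so Accuracy = 5/6, while AUROC = 8/9 because 8 of the 9 positive-negative pairs are ranked correctly.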

Performance comparison

To evaluate the performance of HNF-DDA, we compared it with baselines on two datasets. Figure 2 shows the results of HNF-DDA and other baselines using tenfold cross-validation repeated 10 times. Across the two biomedical heterogeneous network datasets, HNF-DDA achieved an average accuracy of 0.8897, AUROC of 0.9507, and AUPR of 0.9491, outperforming the best-performing baselines, DREAMwalk (average accuracy of 0.8704, AUROC of 0.9382, AUPR of 0.9353) and FuHLDR (average accuracy of 0.8823, AUROC of 0.9462, AUPR of 0.9445). HNF-DDA therefore outperformed all compared baselines on every evaluation metric. It is noteworthy that the KEGG dataset contains more drug, disease, and DDA data than the HetioNet dataset. Relative to DREAMwalk and FuHLDR, respectively, HNF-DDA improved accuracy by 3.5% and 0.13%, AUROC by 2.2% and 0.185%, and AUPR by 2.0% and 0.195% on KEGG, and improved accuracy by 0.8% and 1.59%, AUROC by 1.1% and 0.14%, and AUPR by 0.9% and 0.13% on HetioNet.

The best baselines DREAMwalk and FuHLDR both generate meta-paths based on the idea of random walks, thereby capturing the topological information of nodes in the network. However, since random walks tend to frequently visit nodes that are close to each other, while the probability of visiting distant nodes is low, the captured topological structure is more biased towards locality. Moreover, random walks only rely on the topological structure of the network and cannot directly capture high-order semantics. The all-pair message-passing mechanism proposed by HNF-DDA can capture the potential relationship between any nodes and learn the global information of nodes in the network; the subgraph contrastive learning module proposed by HNF-DDA can capture the high-order semantic information of the drug-disease subgraph and learn the local information of nodes in the network; in addition, the individual attribute information of nodes is learned using a biological large language model. Therefore, HNF-DDA integrates multi-source heterogeneous information from multiple perspectives, captures the potential association between drugs and diseases, and improves the prediction performance of DDA.

Predictive potential for unknown drug/disease classes

To evaluate the effectiveness of the model in real-world drug repositioning scenarios, we compare HNF-DDA with DDAGDL, DREAMwalk, and FuHLDR, the best-performing baselines, through DDA split prediction experiments on the KEGG dataset (as shown in Fig. 3). First, we classified all drug and disease entities: drugs are classified according to their ATC codes, and diseases are classified according to the highest MeSH (Medical Subject Headings) term category. Then, based on the categories of drugs or diseases, we divide the DDAs into training, validation, and test sets in an approximate ratio of 8:1:1. This forces the model to predict the DDA probability for unknown drug or disease categories. We repeated the division 10 times to ensure that the datasets differed in each split.
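A category-based split of this kind can be sketched as below. This is an illustrative assumption of how such a split might be implemented, not the authors' procedure: the function `category_split`, the greedy fill order, the use of the first letter of the ATC code as the drug category, and the toy drug names and codes are all hypothetical.

```python
import random
from collections import defaultdict

def category_split(pairs, category_of, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole categories to train/valid/test so that every
    held-out category is unseen during training."""
    by_cat = defaultdict(list)
    for pair in pairs:
        by_cat[category_of(pair)].append(pair)
    cats = sorted(by_cat)
    random.Random(seed).shuffle(cats)
    targets = [round(r * len(pairs)) for r in ratios]
    splits = ([], [], [])
    idx = 0
    for cat in cats:
        # Move on once the current split has reached its target size;
        # the last split takes whatever remains.
        while idx < 2 and len(splits[idx]) >= targets[idx]:
            idx += 1
        splits[idx].extend(by_cat[cat])
    return splits

# Toy DDAs: 10 hypothetical drug classes, 2 drugs each, 5 diseases per drug.
atc = {}
pairs = []
for letter in "ABCDGHJLMN":
    for i in range(2):
        drug = f"{letter}_drug{i}"
        atc[drug] = f"{letter}01"  # made-up ATC-like code
        pairs += [(drug, f"disease{j}") for j in range(5)]

train, valid, test = category_split(pairs, lambda p: atc[p[0]][0])
```

Since categories differ in size in real data, the resulting ratio is only approximately 8:1:1, which matches the wording of the text. A disease-side split would use the same function with a `category_of` that looks up the disease's top-level MeSH category.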

As shown in Fig. 3, DREAMwalk outperforms DDAGDL and FuHLDR in the split experiments. In the drug split experiment, DREAMwalk achieves an average accuracy of 0.7818, AUROC of 0.8868, and AUPR of 0.8976, while HNF-DDA achieves an average accuracy of 0.7961, AUROC of 0.9014, and AUPR of 0.8935. In the disease split experiment, DREAMwalk achieves an average accuracy of 0.6190, AUROC of 0.6900, and AUPR of 0.7009, while HNF-DDA achieves an average accuracy of 0.6648, AUROC of 0.7955, and AUPR of 0.7829. These results demonstrate that HNF-DDA has greater potential to accurately predict unknown drug or disease categories in real-world scenarios compared to DREAMwalk. Additionally, as shown in Fig. 3, the distribution of prediction results across 10 experiments indicates that HNF-DDA has better stability in prediction performance. Because the HetioNet dataset contains too few DDAs to support category-based splits, no such experiments were conducted on it.

Ablation experiments

To comprehensively validate the predictive performance of the HNF-DDA model, we conducted ablation experiments. We created the following variants targeting the heterogeneous network encoder and learning objective modules of the HNF-DDA model:

w/o link: Remove the edge-level learning objective.
w/o sub: Remove the subgraph-level learning objective.
w/o class: Remove the node-level learning objective.
w/o init_feat: Replace the initial embeddings learned by the large language model with one-hot encoding.
GAT: Replace the all-pair message passing encoder with a GAT.
GCN: Replace the all-pair message passing encoder with a GCN.

From the results in Fig. 4, HNF-DDA performs the best on the KEGG and HetioNet datasets. From the overall trend shown in Fig. 4, the changes in KEGG are relatively minor. As seen in Table 1, this is because the KEGG dataset contains a larger number of known DDA samples for training, which allows the model to fully learn the drug–disease association patterns. Therefore, the effects of different experimental conditions are minimal, and the performance metrics are higher than those of HetioNet.

Comparing the results of w/o link, w/o sub, w/o class, and w/o init_feat, it can be observed that the results of w/o sub and w/o init_feat are relatively worse. The w/o sub results indicate that the subgraph capture module proposed by HNF-DDA effectively mines the potential associations between drugs and diseases. The w/o init_feat results suggest that HNF-DDA effectively integrates semantic information of biological entities and heterogeneous network structure, and replacing the semantic features learned by the large language model may degrade the predictive performance. The comparison of GAT and GCN with HNF-DDA indicates that the all-pair message passing encoder used by HNF-DDA can effectively capture signals between any pair of nodes in the heterogeneous network, integrating multiple sources of heterogeneous information comprehensively and enhancing the prediction performance. These experimental results demonstrate the effectiveness of the all-pair message passing and subgraph capture modules proposed by HNF-DDA.

Visualization of embeddings

We visualize the learned heterogeneous network node embeddings using t-SNE [52]. Figure 5A and B show the visualization results of node embeddings on KEGG and HetioNet, respectively. Figure 5C and D show the corresponding results on KEGG and HetioNet after removing the subgraph capture module. In Fig. 5A, drug clusters (red) are relatively close to disease clusters (blue) and relatively far from pathway clusters (yellow). In Fig. 5C, after removing the subgraph module, the pathway clusters (yellow) are situated between the drug clusters (red) and disease clusters (blue). Similarly, compared with Fig. 5B, in Fig. 5D, after removing the subgraph module, drug clusters (red) are relatively close to pathway clusters (yellow) and relatively far from disease clusters (blue). These results indicate that the subgraph capture module can uncover potential associations between drugs and diseases, bringing them closer in the embedding space, which benefits downstream DDA prediction performance.

Case study

To further validate the reliability of DDA predictions by HNF-DDA in drug repositioning, we conduct literature verification on candidate drugs for breast cancer and prostate cancer from the KEGG dataset. Firstly, we average the prediction scores of all DDAs obtained from tenfold cross-validation repeated 10 times, ensuring different data splits for each tenfold cross-validation. Then, we sort the predicted scores of all unknown DDAs. Finally, we select the top 10 candidate drugs with the highest predicted scores for breast cancer and prostate cancer for literature validation analysis. Table 2 lists candidate drugs and the corresponding literature reports.
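The ranking procedure above (average the scores over runs, keep only unknown pairs, take the top candidates for a disease) can be sketched as follows. The function name `top_candidates`, the dictionary layout of the per-run scores, and the toy drug identifiers are assumptions made for illustration only.

```python
from collections import defaultdict

def top_candidates(score_runs, known_pairs, disease, k=10):
    """Average each pair's score over all runs, drop known DDAs,
    and return the k highest-scoring drugs for one disease."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for run in score_runs:                 # one dict per cross-validation run
        for pair, score in run.items():
            sums[pair] += score
            counts[pair] += 1
    mean = {pair: sums[pair] / counts[pair] for pair in sums}
    candidates = [
        (drug, score)
        for (drug, dis), score in mean.items()
        if dis == disease and (drug, dis) not in known_pairs
    ]
    # Highest score first; ties broken by drug name for reproducibility.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:k]

# Toy scores from two runs (hypothetical drugs, not the paper's predictions).
runs = [
    {("drug_a", "breast cancer"): 0.9, ("drug_b", "breast cancer"): 0.2,
     ("drug_c", "breast cancer"): 0.6},
    {("drug_a", "breast cancer"): 0.7, ("drug_b", "breast cancer"): 0.4,
     ("drug_c", "breast cancer"): 0.8},
]
known = {("drug_a", "breast cancer")}
ranked = top_candidates(runs, known, "breast cancer", k=2)
```

Here `drug_a` has the highest mean score but is excluded because its association is already known, so the ranking starts with `drug_c`.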

Table 2 Top 10 candidates of HNF-DDA for breast cancer and prostate cancer


As shown in Table 2, among the top 10 candidate drugs for breast cancer predicted by HNF-DDA, 9 have supporting literature evidence. For prostate cancer, 8 out of the top 10 candidate drugs have supporting literature evidence. Among the candidate drugs for breast cancer and prostate cancer, seven drugs overlap (including drugs for which there is no literature evidence). These drugs are either chemotherapy drugs that can treat various types of cancer (such as Etoposide and Vincristine sulfate) or drugs that can play a supportive role in managing cancer or the symptoms and side effects related to its treatment. For example, Ephedrine is mainly used as a bronchodilator and decongestant and can sometimes be used in supportive care to manage low blood pressure during surgery or chemotherapy; Desmopressin is primarily used to treat diabetes insipidus and certain bleeding disorders but can also be used to manage bleeding complications in cancer patients; Prednisone, a corticosteroid, is used to treat various diseases, including inflammation and autoimmune diseases, and as part of certain chemotherapy regimens. The other three non-overlapping drugs also have specific related literature reports. These results further demonstrate the reliability of HNF-DDA in practical disease applications.

Discussion

HNF-DDA shows multiple improvements over existing SOTA models. It outperforms other models in prediction accuracy in different scenarios on both datasets, including predictions for new drug or disease categories. This study highlights the importance of using large language models and capturing both global and local structures of heterogeneous networks for DDA prediction. Although contrastive learning methods have shown creativity, the quality of negative samples limits the prediction performance.

Although we use a subgraph capture strategy to preserve the local structure of nodes and learn the high-level semantic information of nodes, the subgraph negative samples obtained through the random replacement strategy have relatively low interference and discriminability. Additionally, the classifier used in HNF-DDA is also trained on drug-disease negative pairs generated by random sampling, which may result in false negative pairs. Therefore, in our future work, we plan to investigate sampling strategies for negative samples to obtain more realistic and reliable negative samples for more accurate DDA prediction.

Conclusions

We propose HNF-DDA, a subgraph contrast-driven transFormer-based Heterogeneous Network embedding model for predicting drug–disease associations (DDAs). HNF-DDA utilizes an all-pairs message passing strategy to preserve the global information of heterogeneous network nodes, enabling the integration of multi-omics data. It also proposes a subgraph capture module to retain the local structure of drug-disease subgraphs and learn high-level semantic information. Experimental results on two benchmark datasets demonstrate that HNF-DDA outperforms 10 state-of-the-art methods. Dataset split experiments reveal HNF-DDA’s potential in predicting DDAs for novel drug or disease categories. Ablation and visualization experiments indicate that the proposed all-pairs message passing and subgraph capturing strategies effectively reveal latent associations between drugs and diseases, enhancing DDA prediction performance. Finally, a literature validation analysis of the top 10 candidate drugs for breast and prostate cancer confirms the reliability of HNF-DDA in identifying candidate drugs. In summary, our model, HNF-DDA, offers a powerful tool for drug–disease association prediction.

Methods

The main objective of this paper is to predict associations between drugs and diseases. We propose a subgraph contrastive-driven transFormer-style Heterogeneous Network embedding model, HNF-DDA, as illustrated in Fig. 1. First, we construct a biomedical heterogeneous network and utilize a biological language model to learn the initial embeddings of the heterogeneous network nodes. Next, we learn the embeddings of drugs and diseases using all-pairs message passing and subgraph capture strategies. Finally, we employ an XGBoost classifier to predict the association probabilities between drugs and diseases. Table 3 is a summary of all notations used in the “Methods” section.

Table 3 Summary of all notations


Biomedical heterogeneous network

In this study, we construct a biomedical heterogeneous network using the interactions between biological entities. The nodes represent drugs, diseases, proteins, and other biological entities, while the edges represent the relationships between these entities. A schematic diagram of this network is shown in Fig. 1.

A biomedical heterogeneous network is defined as an undirected network $G=(V, E, \mathcal{A},\mathcal{R})$, where $N=|V|$ is the number of nodes. $V=\{{v}_{1}, {v}_{2}, \dots , {v}_{N}\}$ is the set of nodes in the network, where ${v}_{i}\in V$ represents the $i$-th node in the network. $E=\{\left({v}_{i}, {v}_{j}\right)|{v}_{i}, {v}_{j}\in V\}\subseteq V\times V$ is the set of edges in the network, and each edge represents the interaction or association that exists between two nodes, where $\left({v}_{i}, {v}_{j}\right)$ represents the connection between nodes $i$ and $j$. $\mathcal{A}=\{{a}_{1}, {a}_{2}, \dots , {a}_{N}\}$ is the set of node attributes, including the SMILES structure of drugs, protein sequences, and biological text descriptions of diseases and other biological entities, where ${a}_{i}\in \mathcal{A}$ represents the attribute feature associated with node ${v}_{i}$. $\mathcal{R}$ is the set of node types. We describe the type of each node by a mapping function $\tau :V\to \mathcal{R}$, namely $\tau \left({v}_{i}\right)\in \{Drug, Protein, Disease, Others\}$, where $\tau \left({v}_{i}\right)$ represents the type of node ${v}_{i}$. Additionally, we removed all DDAs from the biomedical heterogeneous network, allowing the integration of network structure and biological entity semantic information during the heterogeneous network embedding process without relying on drug-disease treatment information. These DDAs serve as supervisory information for the DDA prediction task using the XGBoost classifier.
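To make the definition concrete, the sketch below stores $V$, $E$, $\mathcal{A}$, and $\tau$ in a small container and shows the step of removing DDAs from the network so they can be used as labels. The class `HeteroNetwork`, its method names, and the toy node identifiers and attribute strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

NODE_TYPES = ("Drug", "Protein", "Disease", "Others")  # the type set R

@dataclass
class HeteroNetwork:
    node_type: dict = field(default_factory=dict)  # tau: node -> type
    attribute: dict = field(default_factory=dict)  # A: node -> raw attribute
    edges: set = field(default_factory=set)        # E: undirected pairs

    def add_node(self, node, ntype, attr):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node] = ntype
        self.attribute[node] = attr

    def add_edge(self, u, v):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        # frozenset makes (u, v) and (v, u) the same undirected edge.
        self.edges.add(frozenset((u, v)))

    def remove_ddas(self):
        """Remove every drug-disease edge and return them as labels."""
        ddas = {
            e for e in self.edges
            if {self.node_type[n] for n in e} == {"Drug", "Disease"}
        }
        self.edges -= ddas
        return ddas

# Toy network with placeholder attributes (not real KEGG entries).
g = HeteroNetwork()
g.add_node("drug_1", "Drug", "CCO")                      # a SMILES string
g.add_node("protein_1", "Protein", "MKTAYIAK")           # toy sequence
g.add_node("disease_1", "Disease", "toy disease description")
g.add_edge("drug_1", "protein_1")
g.add_edge("protein_1", "disease_1")
g.add_edge("drug_1", "disease_1")
labels = g.remove_ddas()
```

After `remove_ddas`, the drug and disease remain connected only through the protein, so the embedding step sees no direct treatment information.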

Computing initial embeddings

The biomedical heterogeneous network contains various types of biological entities. We employ a specific model for each type of entity, chosen according to its attributes, to extract semantic information, which serves as the initial embeddings for the nodes in the heterogeneous network. This approach integrates external knowledge with the structure of the heterogeneous network. The models used to compute the initial embeddings are outlined below:

We utilize the SMILES as the attribute information for drugs. SMILES encodes the structure of a molecule into a string of characters, with each character representing information about atoms, bonds, and rings [92, 93]. This encoding provides a comprehensive description of the molecular structure, including the connections between atoms, ring structures, and stereochemistry. We employ a language model for drug molecules, MolFormer [94], to obtain embeddings from the drug’s SMILES. MolFormer employs masked language modeling and combines linear attention Transformers with rotary embeddings.
We utilize amino acid sequence data as the attribute information for proteins. This data comprises a sequence of characters that represent the specific amino acids constituting a protein and their sequential arrangement. Each amino acid is represented by a letter, and the sequence can reflect the protein’s structure, function, and activity. We employ a pre-trained protein model, ProtBert [95], to obtain initial embeddings from the protein sequence data. ProtBert is based on the BERT [96] architecture and encodes amino acid sequences into token-level or sentence-level representations, which can be used for downstream protein tasks, such as contact prediction.
We utilize biological text descriptions as attribute information for diseases and other biological entities. We employ a biomedical text language model, BiomedBERT [97], to obtain initial embeddings. BiomedBERT is based on the BERT architecture and is pre-trained from scratch using text abstracts from PubMed and full-text articles from PubMed Central as its corpus.

Finally, we obtain the initial embeddings ${H}^{init}=\{{H}_{t}^{init}\in {\mathbb{R}}^{\left|{V}_{t}\right|\times {d}_{t}}\}$, where $t\in \{Drug, Protein, Disease, Others\}$. Details of these models are in the Additional file 1.

Heterogeneous network embedding

This section introduces the heterogeneous network embedding (HNFormer) module of HNF-DDA. In this module, we employ a Transformer-style graph embedding method and design a subgraph capture strategy to learn the embeddings of heterogeneous network nodes.

All-pair message passing encoder

The mechanisms of drug action and disease pathology involve various types of biomolecules and signaling pathways. Additionally, the known edges in the biological heterogeneous network are incomplete, and many potential associations exist between nodes. Therefore, signal transmission in a heterogeneous network should not be limited to entities of the same type or local entity relationships. Inspired by NodeFormer [40], we employ an all-pairs message passing encoder to enable signal transmission between any pair of entities in the heterogeneous network, ensuring the full integration of multi-source heterogeneous information.

First, we utilize one MLP per node type to map the initial embeddings ${H}^{init}$ of the different node types into the same space:

$${H}_{t}^{0}=\sigma \left({W}_{t}^{init}{H}_{t}^{init}+{B}_{t}^{init}\right),$$

(5)

where ${H}_{t}^{0}\in {\mathbb{R}}^{\left|{V}_{t}\right|\times d}$, $t\in \{Drug, Protein, Disease, Others\}$, and $\sigma (\bullet )$, ${W}_{t}^{init}$, and ${B}_{t}^{init}$ denote the Exponential Linear Unit (ELU) activation function, the weight matrix, and the bias, respectively. We concatenate the embeddings of the different types to form the complete node embeddings ${H}^{0}\in {\mathbb{R}}^{N\times d}$:

$${H}^{0}=\left[\begin{array}{c}\begin{array}{c}{H}_{Drug}^{0}\\ {H}_{Protein}^{0}\end{array}\\ \begin{array}{c}{H}_{Disease}^{0}\\ {H}_{Others}^{0}\end{array}\end{array}\right],$$

(6)

where ${z}_{u}^{0}\in {H}^{0}$ is the representation vector of the $u$ th node at layer 0, i.e., its initial representation in the heterogeneous network.
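Equations (5) and (6) can be illustrated with a minimal NumPy sketch. The dimensions, the random stand-in weights, and the helper name `project_and_stack` are assumptions made for illustration only; the sketch uses the row-vector convention ($HW$ rather than $WH$) and a single linear layer per type rather than a full MLP.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Exponential Linear Unit: x for x > 0, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def project_and_stack(h_init, d, seed=0):
    """Map each type-specific block (|V_t|, d_t) to (|V_t|, d) and stack the blocks."""
    rng = np.random.default_rng(seed)
    blocks = []
    for t in ("Drug", "Protein", "Disease", "Others"):
        h_t = h_init[t]
        # Random stand-ins for the learnable W_t^init and B_t^init (assumption).
        w_t = rng.normal(scale=0.1, size=(h_t.shape[1], d))
        b_t = np.zeros(d)
        blocks.append(elu(h_t @ w_t + b_t))      # Eq. (5), one block per type
    return np.vstack(blocks)                     # Eq. (6), shape (N, d)

# Toy input: each node type has its own count and feature width (hypothetical values).
rng = np.random.default_rng(1)
h_init = {
    "Drug": rng.normal(size=(3, 8)),
    "Protein": rng.normal(size=(4, 16)),
    "Disease": rng.normal(size=(2, 12)),
    "Others": rng.normal(size=(5, 12)),
}
h0 = project_and_stack(h_init, d=6)
```

The point of the sketch is only that every node type ends up in the same $d$-dimensional space, so that the stacked matrix can be fed to a single attention layer.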

Next, for any node $u$, we use ${z}_{u}^{(l)}$ to represent its corresponding representation vector at layer $l$. Thus, the update for the next layer ${z}_{u}^{(l+1)}$ is:

$${z}_{u}^{(l+1)}=\sum_{s=1}^{N}\frac{\text{exp}({({q}_{u})}^{T}{k}_{s})}{{\sum }_{w=1}^{N}\text{exp}({({q}_{u})}^{T}{k}_{w})}\bullet {v}_{s},$$

(7)

where ${k}_{u}={W}_{K}^{(l)}{z}_{u}^{(l)}$, ${q}_{u}={W}_{Q}^{(l)}{z}_{u}^{(l)}$, ${v}_{u}={W}_{V}^{(l)}{z}_{u}^{(l)}$ are obtained from the feature transformation of the $l$ th layer, and ${W}_{K}^{(l)}$, ${W}_{Q}^{(l)}$, and ${W}_{V}^{(l)}$ are learnable parameters in $l$ th layer. Equation (7) can be viewed as a graph attention mechanism defined on a fully connected graph where all nodes are pairwise connected.
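As a small illustration of Eq. (7), the following NumPy sketch computes exact all-pair attention for one layer. The function name and the random weights are assumptions for illustration; rows are nodes, so the projections are written as right-multiplications.

```python
import numpy as np

def all_pair_attention(z, w_q, w_k, w_v):
    """One exact all-pair attention layer; cost is O(N^2) in the number of nodes."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = q @ k.T                                  # (N, N): q_u^T k_s for every pair
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise exp; softmax unchanged
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)     # each row is a softmax over all N nodes
    return attn @ v, attn

rng = np.random.default_rng(0)
n_nodes, dim = 5, 4
z = rng.normal(size=(n_nodes, dim))
w_q, w_k, w_v = (rng.normal(scale=0.5, size=(dim, dim)) for _ in range(3))
z_next, attn = all_pair_attention(z, w_q, w_k, w_v)
```

Every node attends to every other node here, regardless of whether an edge is observed between them.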

For any node, Eq. (7) requires computing the attention against all $N$ nodes separately, which costs $O({N}^{2})$ overall. We therefore use a kernel method to approximate the exponential-then-dot operation, $\text{exp}\left({a}^{T}b\right)=\kappa (a,b)\approx \phi {\left(a\right)}^{T}\phi (b)$, where $\phi : {\mathbb{R}}^{d}\to {\mathbb{R}}^{m}$ is a low-dimensional random feature (RF) map. For example, the commonly used Positive Random Feature (PRF) [98] can be defined as:

$$\phi \left(x\right)=\frac{\text{exp}(\frac{-{\Vert x\Vert }_{2}^{2}}{2})}{\sqrt{m}}\left[\text{exp}\left({w}_{1}^{T}x\right),\dots ,\text{exp}\left({w}_{m}^{T}x\right)\right],$$

(8)

This enables us to rewrite Eq. (7):

$${z}_{u}^{(l+1)}=\sum_{s=1}^{N}\frac{\phi {\left({q}_{u}\right)}^{T}\phi ({k}_{s})}{{\sum }_{w=1}^{N}\phi {\left({q}_{u}\right)}^{T}\phi ({k}_{w})}\bullet {v}_{s}=\frac{\phi {\left({q}_{u}\right)}^{T}{\sum }_{s=1}^{N}\phi ({k}_{s}){v}_{s}^{T}}{\phi {\left({q}_{u}\right)}^{T}{\sum }_{w=1}^{N}\phi ({k}_{w})}$$

(9)

In this way, the two summations over all nodes do not depend on $u$; they are computed only once and shared by every node, so the total complexity is kept within $O(N)$.
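The kernel trick of Eqs. (8) and (9) can be checked numerically with a short NumPy sketch. The function names, the toy sizes, and the scaling of the queries and keys are assumptions chosen so that the random-feature estimate is tight; the random directions ${w}_{i}$ are drawn from a standard normal distribution, as in PRF [98].

```python
import numpy as np

def prf(x, w):
    """Positive random features, Eq. (8). x: (N, d), w: (m, d) with rows ~ N(0, I)."""
    m = w.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=1, keepdims=True)
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)

def linear_attention(q, k, v, w):
    """Eq. (9): the two sums over nodes are computed once and reused by every query."""
    phi_q, phi_k = prf(q, w), prf(k, w)
    kv = phi_k.T @ v                 # (m, d_v) = sum_s phi(k_s) v_s^T
    k_sum = phi_k.sum(axis=0)        # (m,)     = sum_w phi(k_w)
    return (phi_q @ kv) / (phi_q @ k_sum)[:, None]

def softmax_attention(q, k, v):
    """Exact Eq. (7), used here only as a reference."""
    a = np.exp(q @ k.T)
    return (a / a.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
n_nodes, dim, n_features = 10, 4, 20000
q = 0.2 * rng.normal(size=(n_nodes, dim))
k = 0.2 * rng.normal(size=(n_nodes, dim))
v = rng.normal(size=(n_nodes, dim))
w = rng.normal(size=(n_features, dim))

approx = linear_attention(q, k, v, w)
exact = softmax_attention(q, k, v)
max_gap = float(np.abs(approx - exact).max())
ones_out = linear_attention(q, k, np.ones((n_nodes, 1)), w)
```

In this sketch `max_gap` shrinks as `n_features` grows, while the cost of `linear_attention` stays linear in the number of nodes because the $N\times N$ attention matrix is never formed.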

The above process assumes that each edge has a continuous attention weight. To further account for the “discretization” of edges, each node $u$ needs to find an “optimal” set of neighbors in each layer for information passing. We therefore treat the attention weights over the $N$ nodes as a categorical distribution and sample the neighbor set from it. Specifically, we replace the Softmax in Eq. (7) with Gumbel-Softmax:

$${z}_{u}^{(l+1)}=\sum_{s=1}^{N}\frac{\text{exp}(({q}_{u}^{T}{k}_{s}+{g}_{s})/\mathcal{T})}{{\sum }_{w=1}^{N}\text{exp}(({q}_{u}^{T}{k}_{w}+{g}_{w})/\mathcal{T})}\bullet {v}_{s}, \quad {g}_{s}\sim Gumbel\left(0,1\right),$$

(10)

where $\mathcal{T}$ is the temperature of the Gumbel-Softmax. As before, we approximate this expression using RF to retain linear complexity:

$${z}_{u}^{(l+1)}\approx \sum_{s=1}^{N}\frac{\phi {\left({q}_{u}/\sqrt{\mathcal{T}}\right)}^{T}\phi \left({k}_{s}/\sqrt{\mathcal{T}}\right){e}^{{g}_{s}/\mathcal{T}}}{{\sum }_{w=1}^{N}\phi {\left({q}_{u}/\sqrt{\mathcal{T}}\right)}^{T}\phi \left({k}_{w}/\sqrt{\mathcal{T}}\right){e}^{{g}_{w}/\mathcal{T}}}\bullet {v}_{s}=\frac{\phi {\left({q}_{u}/\sqrt{\mathcal{T}}\right)}^{T}{\sum }_{s=1}^{N}{e}^{{g}_{s}/\mathcal{T}}\phi ({k}_{s}/\sqrt{\mathcal{T}}){v}_{s}^{T}}{\phi {\left({q}_{u}/\sqrt{\mathcal{T}}\right)}^{T}{\sum }_{w=1}^{N}{e}^{{g}_{w}/\mathcal{T}}\phi ({k}_{w}/\sqrt{\mathcal{T}})}$$

(11)
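The effect of the Gumbel noise and the temperature can be seen in a small NumPy sketch of the exact (quadratic) Gumbel-Softmax attention. The function name and toy sizes are assumptions; one noise value is drawn per source node $s$, which is what allows the factorisation used in Eq. (11).

```python
import numpy as np

def gumbel_softmax_attention(q, k, v, tau, rng):
    """Exact Gumbel-Softmax attention with one Gumbel(0, 1) sample per source node."""
    n = k.shape[0]
    u = rng.uniform(low=1e-10, high=1.0, size=n)
    g = -np.log(-np.log(u))                       # inverse-CDF sampling of Gumbel(0, 1)
    logits = (q @ k.T + g[None, :]) / tau         # (q_u^T k_s + g_s) / T
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng_data = np.random.default_rng(0)
n_nodes, dim = 6, 4
q = rng_data.normal(size=(n_nodes, dim))
k = rng_data.normal(size=(n_nodes, dim))
v = rng_data.normal(size=(n_nodes, dim))

# Same seed for both calls, so both temperatures see identical noise.
_, w_hot = gumbel_softmax_attention(q, k, v, tau=5.0, rng=np.random.default_rng(42))
_, w_cold = gumbel_softmax_attention(q, k, v, tau=0.1, rng=np.random.default_rng(42))
```

With identical noise, lowering the temperature concentrates each row of weights on fewer nodes, which is the sense in which the sampled neighbor set becomes more "discrete".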

In addition to the message passing between all node pairs in the network, the topology of the heterogeneous network itself contains a lot of useful information. To strengthen the weights of the observed edges during each layer of message passing, we assign a shared learnable weight to each edge, referred to as the relational bias, and update each layer as follows:

$${z}_{u}^{(l+1)}\leftarrow {z}_{u}^{(l+1)}+\sum_{s:{a}_{us}=1}\sigma ({b}^{(l)})\bullet {v}_{s},$$

(12)

where ${b}^{(l)}$ is the learnable weight parameter corresponding to layer $l$, and ${a}_{us}=1$ indicates that there is an association between nodes $u$ and $s$. After the last layer of all-pair message passing, we obtain the embedding ${z}_{u}\in H$ of each node $u$.
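The relational-bias update of Eq. (12) amounts to one extra sparse aggregation over observed edges. In the NumPy sketch below, the function names are hypothetical and $\sigma$ is taken to be a sigmoid, which is an assumption: the text defines $\sigma$ as ELU for Eq. (5) and does not restate it for Eq. (12).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def add_relational_bias(z_next, v, adj, b):
    """Add sigma(b) * v_s to node u for every observed edge (u, s).

    adj is an (N, N) 0/1 matrix with adj[u, s] = 1 when a_us = 1;
    b is the single scalar shared by all edges in this layer.
    """
    return z_next + sigmoid(b) * (adj @ v)

# Toy example: node 0 is linked to nodes 1 and 2, node 1 to node 0, node 2 to nothing.
z_next = np.zeros((3, 2))
v = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
adj = np.array([[0, 1, 1], [1, 0, 0], [0, 0, 0]])
out = add_relational_bias(z_next, v, adj, b=0.0)   # sigmoid(0) = 0.5
```

A dense adjacency matrix is used only for brevity; with a sparse matrix the extra cost is proportional to the number of observed edges.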

Finally, we employ an MLP to predict the labels of the nodes:

$$\widehat{Y}=MLP(H)$$

(13)

where $\widehat{Y}\in {\mathbb{R}}^{N\times \left|\mathcal{R}\right|}$ contains the predicted label probabilities of the nodes.

Subgraph structure capture

Drugs act on multiple target proteins and participate in various functional pathways, working together to treat diseases. Therefore, in a heterogeneous network, the relationships between drugs, diseases, and other biological entities collectively form higher-order subgraph structures. In addition to considering signal transmission between nodes in the heterogeneous network, it is crucial to preserve the contextual semantic information contained in the high-order structures of the heterogeneous network. While existing methods often rely on meta-path approaches to explore high-order structures, they may fall short in capturing rich semantics and extracting high-order patterns [99]: (1) Meta paths often focus on single relationships, ignoring the multiple associations between different entities; (2) Starting from a source node, the number of nodes that a meta path can reach is too large, resulting in the extracted structure lacking restrictions and containing insufficient semantic information.

Inspired by CPT-HG [99], we recognize that the mechanisms of drug action and disease pathology involve various types of biomolecules and signaling pathways. Consequently, drugs, diseases, and other biological entities collectively form subgraphs with intricate high-order structures. To address this, we construct positive and negative subgraphs and leverage contrastive learning to capture the intricate subgraph structures and subtle contextual semantic information within the heterogeneous network.

Specifically, given a drug (disease) node as the starting node $u$ and a disease (drug) node as the ending node $s$, we take the common first-order neighbors between the drug and the disease as the intermediate node set $C$. We construct subgraphs using only the first-order common neighbors of drugs and diseases as intermediate nodes, capturing the strong associations between drugs and diseases, and enhancing the structural constraints of the subgraph to include rich high-level semantic information. Therefore, the positive subgraph corresponding to node $u$ is:

$${M}_{u}^{+}=\{s\}\cup C$$

(14)

Then, we randomly replace half of the elements in the intermediate node set $C$ to obtain a new set ${C}^{-}$, thus we can obtain the negative subgraph for node $u$:

$${M}_{u}^{-}=\{s\}\cup {C}^{-}$$

(15)
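The construction of Eqs. (14) and (15) can be sketched with plain Python sets. The function name, the toy node identifiers, the choice of replacement nodes (drawn uniformly from nodes outside $C$), and rounding "half" down for odd-sized $C$ are all assumptions made for illustration.

```python
import random

def build_subgraphs(adj, u, s, all_nodes, rng):
    """Return (positive, negative) node sets for start node u and end node s.

    adj maps each node to the set of its first-order neighbours.
    """
    common = (adj[u] & adj[s]) - {u, s}            # intermediate node set C
    positive = {s} | common                        # Eq. (14)

    ordered = sorted(common)
    n_swap = len(ordered) // 2                     # "half", rounded down (assumption)
    dropped = set(rng.sample(ordered, n_swap))
    # Replacement candidates: nodes that are not already in C (assumption).
    # rng.sample raises ValueError if fewer than n_swap candidates exist.
    pool = sorted(set(all_nodes) - common - {u, s})
    added = set(rng.sample(pool, n_swap))
    negative = {s} | (common - dropped) | added    # Eq. (15)
    return positive, negative

# Hypothetical toy network: one drug, one disease, proteins p1..p5, genes g1..g3.
adj = {
    "drug1": {"p1", "p2", "p3", "p4", "p5"},
    "dis1": {"p1", "p2", "p3", "p4", "g1"},
}
all_nodes = ["drug1", "dis1", "p1", "p2", "p3", "p4", "p5", "g1", "g2", "g3"]
positive, negative = build_subgraphs(adj, "drug1", "dis1", all_nodes, random.Random(0))
common_nodes = {"p1", "p2", "p3", "p4"}
```

The negative subgraph keeps the same end node and the same size as the positive one, so the two differ only in the corrupted half of the intermediate nodes.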

Finally, we apply the concept of contrastive learning to ensure that node $u$ is closer to its positive subgraph embedding and farther from its negative subgraph embedding. The subgraph-level loss objective is defined as:

$${\mathcal{L}}_{sub}=-\frac{1}{\left|{V}^{dd}\right|}\sum_{u\in {V}^{dd}}\frac{\text{exp}\left({H}_{u}f\left({M}_{u}^{+}\right)\right)}{\text{exp}\left({H}_{u}f\left({M}_{u}^{+}\right)\right)+\text{exp}\left({H}_{u}f\left({M}_{u}^{-}\right)\right)},$$

(16)

where ${V}^{dd}$ is the set of drug and disease nodes, and $f(\bullet )$ denotes the pooling function (e.g., average pooling) that gets the subgraph embeddings.
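A minimal NumPy sketch of Eq. (16) with average pooling follows. It implements the expression as printed, that is, the negative mean of the softmax ratio without a logarithm; the function name and the toy embeddings are assumptions.

```python
import numpy as np

def subgraph_contrastive_loss(h, pos_sets, neg_sets):
    """Eq. (16) as printed, with f = average pooling over the subgraph's node embeddings.

    h: (N, d) node embeddings; pos_sets / neg_sets map a node index u to the
    set of node indices in its positive / negative subgraph.
    """
    total = 0.0
    for u in pos_sets:
        pos_emb = h[sorted(pos_sets[u])].mean(axis=0)
        neg_emb = h[sorted(neg_sets[u])].mean(axis=0)
        s_pos, s_neg = h[u] @ pos_emb, h[u] @ neg_emb
        shift = max(s_pos, s_neg)                  # stabilise the exponentials
        e_pos, e_neg = np.exp(s_pos - shift), np.exp(s_neg - shift)
        total += e_pos / (e_pos + e_neg)
    return -total / len(pos_sets)

# Node 0 is aligned with its positive subgraph {1} and opposed to its negative {2}.
h = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
loss = subgraph_contrastive_loss(h, pos_sets={0: {1}}, neg_sets={0: {2}})
```

In this toy case the ratio is $1/(1+{e}^{-2})$, so the loss is close to its lower bound of $-1$; swapping the positive and negative sets would move it towards 0.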

Learning objectives

Given the node labels $Y$ and the predicted labels $\widehat{Y}$, the node-level supervised loss is defined as:

$${\mathcal{L}}_{n}=-\frac{1}{N}\sum_{v\in V}\sum_{r=1}^{\left|\mathcal{R}\right|}{\mathbb{I}}\left[{y}_{v}=r\right]log\widehat{{y}_{v,r}},$$

(17)

where ${\mathbb{I}}[\bullet ]$ is the indicator function and $\widehat{{y}_{v,r}}$ is the predicted probability that the $v$ th node belongs to class $r$.
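Equation (17) is the standard cross-entropy over nodes, as the following NumPy sketch shows. The function name, the 0-based class labels, and the small constant added inside the logarithm are assumptions.

```python
import numpy as np

def node_loss(y_hat, y, eps=1e-12):
    """Eq. (17): mean negative log-probability of each node's true class.

    y_hat: (N, |R|) predicted class probabilities; y: (N,) integer labels in [0, |R|).
    """
    n = y_hat.shape[0]
    picked = y_hat[np.arange(n), y]     # the indicator selects one probability per node
    return float(-np.mean(np.log(picked + eps)))

y_hat = np.array([[0.5, 0.5], [0.25, 0.75]])
y = np.array([0, 1])
loss_n = node_loss(y_hat, y)            # -(ln 0.5 + ln 0.75) / 2
```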

We treat the attention estimates of each layer in the model as a categorical distribution, with the observed edges as samples from it. Thus, we define an edge-level loss objective using maximum likelihood estimation:

$$\begin{array}{c}{\mathcal{L}}_{e}=-\frac{1}{NL}\sum_{l=1}^{L}\sum_{\left(u,s\right)\in E}\frac{1}{{d}_{u}}log{\pi }_{us}^{\left(l\right)}\\ {\pi }_{us}^{\left(l\right)}=\frac{\phi {\left({q}_{u}\right)}^{T}\phi ({k}_{s})}{\phi {\left({q}_{u}\right)}^{T}{\sum }_{w=1}^{N}\phi ({k}_{w})}\end{array},$$

(18)

where ${d}_{u}$ represents the in-degree of node $u$, and ${\pi }_{us}^{\left(l\right)}$ represents the predicted probability for edge $(u, s)$ at the $l$ th layer of the model.
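The edge-level objective of Eq. (18) can be sketched as follows, given one attention matrix per layer. The function name is hypothetical, and ${d}_{u}$ is computed here as the number of observed edges $(u, s)$ listed for node $u$, which is an assumption about how the edge list is oriented.

```python
import numpy as np

def edge_loss(pis, edges, n_nodes):
    """Eq. (18): degree-normalised negative log-likelihood of observed edges.

    pis: list of L arrays of shape (N, N), where pis[l][u, s] is the attention
    probability that layer l assigns to edge (u, s); edges: list of (u, s) pairs.
    """
    degree = np.zeros(n_nodes)
    for u, _ in edges:
        degree[u] += 1
    total = 0.0
    for pi in pis:
        for u, s in edges:
            total += np.log(pi[u, s]) / degree[u]
    return float(-total / (n_nodes * len(pis)))

# Uniform attention over 4 nodes in 2 layers, with three observed edges.
n_nodes = 4
uniform = np.full((n_nodes, n_nodes), 1.0 / n_nodes)
edges = [(0, 1), (0, 2), (1, 0)]
loss_e = edge_loss([uniform, uniform], edges, n_nodes)
```

Minimising this term pushes each layer to place more attention mass on pairs that are actually connected in the observed network.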

The final objective is the sum of the node-level, edge-level, and subgraph-level loss:

$$\mathcal{L}={\mathcal{L}}_{n}+\alpha {\mathcal{L}}_{e}+\beta {\mathcal{L}}_{sub},$$

(19)

where $\alpha$ and $\beta$ are weight parameters.

Drug–disease association prediction

After obtaining the embeddings of the heterogeneous network nodes, we utilize the known DDAs as supervision information and predict DDA scores based on drug and disease embeddings using the XGBoost model. To enhance the stability of the prediction results, we conduct multiple independent training sessions with XGBoost and average the resulting prediction scores.

XGBoost (eXtreme Gradient Boosting) is a powerful and widely used machine learning algorithm, primarily designed for supervised learning tasks such as classification and regression [100]. XGBoost is an implementation of gradient boosting machines (GBM), which are a type of ensemble learning method. Ensemble learning methods combine multiple base learners (in this case, decision trees) to improve overall performance. Gradient boosting, specifically, builds models sequentially, where each new model attempts to correct the errors made by the previous models. XGBoost has gained popularity due to its high performance, speed, and scalability. Details of the training procedure and the parameters of XGBoost are in the Additional file 1.

Data availability

All data generated or analyzed during this study are included in this published article, its supplementary information files, and publicly available repositories. The Python and Torch implementation of the HNF-DDA model is accessible at https://doi.org/10.5281/zenodo.15117258 or https://github.com/ShangCS/HNF-DDA. The sources of all analyzed datasets are as follows: the KEGG dataset can be downloaded from https://www.kegg.jp/kegg/rest/keggapi.html, and the HetioNet dataset can be downloaded from https://github.com/hetio/hetionet. Additionally, the dataset used in this study is available at Zenodo: https://doi.org/10.5281/zenodo.15117258. The specific data of Figs. 2, 3 and 4 can be found in Additional file 2.

Abbreviations

AUROC: The area under the receiver operating characteristics curve
AUPR: The area under the precision and recall curve
DDA: Drug–disease association
GAT: Graph attention networks
GCN: Graph convolutional networks
XGBoost: EXtreme Gradient Boosting

References

Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature. 2023;616(7958):673–85.
Sun D, Gao W, Hu H, Zhou S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm Sin B. 2022;12(7):3049–62.
Mullard A. 2021 FDA approvals. Nat Rev Drug Discov. 2022;21(2):83–8.
Qi R, Zou Q. Trends and potential of machine learning and deep learning in drug study at single-cell level. Research. 2023;6:0050.
Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58.
Jourdan J-P, Bureau R, Rochais C, Dallemagne P. Drug repositioning: a brief overview. J Pharm Pharmacol. 2020;72(9):1145–51.
Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3(8):673–83.
Ru X, Ye X, Sakurai T, Zou Q. NerLTR-DTA: drug–target binding affinity prediction based on neighbor relationship and learning to rank. Bioinformatics. 2022;38(7):1964–71.
Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLoS Comput Biol. 2023;19(6):e1011214.
Ai C, Yang H, Ding Y, Tang J, Guo F. Low rank matrix factorization algorithm based on multi-graph regularization for detecting drug-disease association. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(5):3033–43.
Zhao BW, Su XR, Hu PW, Huang YA, You ZH, Hu L. iGRLDTI: an improved graph representation learning method for predicting drug–target interactions over heterogeneous biological information network. Bioinformatics. 2023;39(8):btad451.
von Delft A, Hall MD, Kwong AD, Purcell LA, Saikatendu KS, Schmitz U, et al. Accelerating antiviral drug discovery: lessons from COVID-19. Nat Rev Drug Discov. 2023;22(7):585–603.
Ballard C, Aarsland D, Cummings J, O’Brien J, Mills R, Molinuevo JL, et al. Drug repositioning and repurposing for Alzheimer disease. Nat Rev Neurol. 2020;16(12):661–73.
Liu T, Qiao H, Wang Z, Yang X, Pan X, Yang Y, et al. CodLncScape provides a self-enriching framework for the systematic collection and exploration of coding LncRNAs. Adv Sci. 2024;11:2400009.
Ru X, Zou Q, Lin C. Optimization of drug–target affinity prediction methods through feature processing schemes. Bioinformatics. 2023;39(11):btad615.
Li H, Pang Y, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA, and protein sequences based on biological language models. Nucleic Acids Res. 2021;49(22):e129.
Li X, Ma S, Xu J, Tang J, He S, Guo F. TranSiam: Aggregating multi-modal visual features with locality for medical image segmentation. Expert Syst Appl. 2024;237:121574.
Guo X, Huang Z, Ju F, Zhao C, Yu L. Highly accurate estimation of cell type abundance in bulk tissues based on single-cell reference and domain adaptive matching. Adv Sci. 2024;11(7):2306329.
Su R, Wu H, Xu B, Liu X, Wei L. Developing a multi-dose computational model for drug-induced hepatotoxicity prediction based on toxicogenomics data. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(4):1231–9.
Su R, Liu X, Wei L, Zou Q. Deep-resp-forest: a deep forest model to predict anti-cancer drug response. Methods. 2019;166:91–102.
Luo H, Li M, Yang M, Wu F-X, Li Y, Wang J. Biomedical data and computational models for drug repositioning: a comprehensive review. Brief Bioinform. 2021;22(2):1604–19.
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep generative models in de novo drug molecule generation. J Chem Inf Model. 2023;64(7):2174–94.
Wang Y, Zhai Y, Ding Y, Zou Q. SBSM-Pro: support bio-sequence machine for proteins. Sci China Inform Sci. 2024;67(11):212106.
Zhao BW, Su XR, Hu PW, Ma YP, Zhou X, Hu L. A geometric deep learning framework for drug repositioning over heterogeneous information networks. Brief Bioinform. 2022;23(6):bbac384.
Zhao B-W, Wang L, Hu P-W, Wong L, Su X-R, Wang B-Q, et al. Fusing higher and lower-order biological information for drug repositioning via graph representation learning. IEEE Trans Emerg Top Comput. 2023;12(1):163–76.
Yang X, Niu Z, Liu Y, Song B, Lu W, Zeng L, et al. Modality-DTA: multimodality fusion strategy for drug–target affinity prediction. IEEE/ACM Trans Comput Biol Bioinform. 2022;20(2):1200–10.
Zhang P, Che C, Jin B, Yuan J, Li R, Zhu Y. NCH-DDA: Neighborhood contrastive learning heterogeneous network for drug–disease association prediction. Expert Syst Appl. 2024;238:121855.
Meng Y, Wang Y, Xu J, Lu C, Tang X, Peng T, et al. Drug repositioning based on weighted local information augmented graph neural network. Brief Bioinform. 2024;25(1):bbad431.
Yang K, Yang Y, Fan S, Xia J, Zheng Q, Dong X, et al. DRONet: effectiveness-driven drug repositioning framework using network embedding and ranking learning. Brief Bioinform. 2023;24(1):bbac518.
Gao Z, Ma H, Zhang X, Wang Y, Wu Z. Similarity measures-based graph co-contrastive learning for drug–disease association prediction. Bioinformatics. 2023;39(6):btad357.
Yang R, Fu Y, Zhang Q, Zhang L. GCNGAT: Drug-disease association prediction based on graph convolution neural network and graph attention network. Artif Intell Med. 2024:102805.
Bang D, Lim S, Lee S, Kim S. Biomedical knowledge graph learning for drug repurposing by extending guilt-by-association to multiple layers. Nat Commun. 2023;14(1):3570.
Yu L, Xia M, An Q. A network embedding framework based on integrating multiplex network for drug combination prediction. Brief Bioinform. 2022;23(1):bbab364.
Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, et al. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife. 2017;6:e26726.
Sun X, Wang B, Zhang J, Li M. Partner-specific drug repositioning approach based on graph convolutional network. IEEE J Biomed Health Inform. 2022;26(11):5757–65.
Yu Z, Huang F, Zhao X, Xiao W, Zhang W. Predicting drug–disease associations through layer attention graph convolutional network. Brief Bioinform. 2021;22(4):bbaa243.
Wang Z, Zhou M, Arnold C. Toward heterogeneous information fusion: bipartite graph convolutional networks for in silico drug repurposing. Bioinformatics. 2020;36(Supplement_1):i525–33.
Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics. 2023;39(1):btac715.
Chen Y, Wang J, Wang C, Zou Q. AutoEdge-CCP: a novel approach for predicting cancer-associated CircRNAs and drugs based on automated edge embedding. PLoS Comput Biol. 2024;20(1):e1011851.
Wu Q, Zhao W, Li Z, Wipf DP, Yan J. Nodeformer: a scalable graph structure learning transformer for node classification. Adv Neural Inf Process Syst. 2022;35:27387–401.
Sun Z, Deng Z-H, Nie J-Y, Tang J. Rotate: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:190210197. 2019.
Yang J, Li Z, Wu WKK, Yu S, Xu Z, Chu Q, et al. Deep learning identifies explainable reasoning paths of mechanism of action for drug repurposing from multilayer biological network. Brief Bioinform. 2022;23(6):bbac469.
Su X, Hu L, You Z, Hu P, Zhao B. Attention-based knowledge graph representation learning for predicting drug-drug interactions. Brief Bioinform. 2022;23(3):bbac140.
Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, et al. Xgboost: extreme gradient boosting. R package version 04-2. 2015;1(4):1–4.
Zhu H, Hao H, Yu L. Identifying disease-related microbes based on multi-scale variational graph autoencoder embedding Wasserstein distance. BMC Biol. 2023;21(1):294.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Zhang S, Tay Y, Yao L, Liu Q. Quaternion knowledge graph embeddings. Adv Neural Inf Process Syst. 2019;32.
Pan L, Shi C, Dokmanić I. Neural link prediction with walk pooling. arXiv preprint arXiv:211004375. 2021.
Zhang Z, Tang J, Guo F. Complex detection in PPI network using genes expression information. Curr Proteomics. 2018;15(2):119–27.
Trouillon T, Welbl J, Riedel S, Gaussier É, Bouchard G, editors. Complex embeddings for simple link prediction. PMLR. 2016;2071–2080.
Thafar MA, Olayan RS, Albaradei S, Bajic VB, Gojobori T, Essack M, et al. DTi2Vec: drug–target interaction prediction using network embedding and ensemble learning. J Cheminform. 2021;13:1–18.
Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):339–51.
Chen D, Ma F, Liu XH, Cao R, Wu XZ. Anti-tumor effects of ephedrine and Anisodamine on Skbr3 human breast cancer cell line. Afr J Tradit Complement Altern Med. 2016;13(1):25–32.
Sioud F, Amor S, Toumia IB, Lahmar A, Aires V, Chekir-Ghedira L, et al. A new highlight of ephedra alata decne properties as potential adjuvant in combination with cisplatin to induce cell death of 4T1 breast cancer cells in vitro and in vivo. Cells. 2020;9(2):362.
Mohammed L, Mohammed R. Cytotoxic activity of ephedra alata extracts on human lymphocytes and breast cancer cell line in vitro. Iraqi J Sci. 2023;30:4210–22.
Fuhrman B, Barba M, Schünemann HJ, Hurd T, Quattrin T, Cartagena R, et al. Basal growth hormone concentrations in blood and the risk for prostate cancer: a case-control study. Prostate. 2005;64(2):109–15.
Stangelberger A, Schally AV, Djavan B. New treatment approaches for prostate cancer based on peptide analogues. Eur Urol. 2008;53(5):890–900.
Sledge GW Jr. Etoposide in the management of metastatic breast cancer. Cancer. 1991;67(S1):266–70.
Alpsoy A, Yasa S, Gündüz U. Etoposide resistance in MCF-7 breast cancer cell line is marked by multiple mechanisms. Biomed Pharmacother. 2014;68(3):351–5.
Atienza DM, Vogel CL, Trock B, Swain SM. Phase II study of oral etoposide for patients with advanced breast cancer. Cancer. 1995;76(12):2485–90.
Auchus RJ, Yu MK, Nguyen S, Mundle SD. Use of prednisone with abiraterone acetate in metastatic castration-resistant prostate cancer. Oncologist. 2014;19(12):1231–40.
Fizazi K, Tran N, Fein L, Matsubara N, Rodriguez-Antolin A, Alekseev BY, et al. Abiraterone plus prednisone in metastatic, castration-sensitive prostate cancer. N Engl J Med. 2017;377(4):352–60.
Sartor O, Weinberger M, Moore A, Li A, Figg WD. Effect of prednisone on prostate-specific antigen in patients with hormone-refractory prostate cancer. Urology. 1998;52(2):252–6.
Ripoll GV, Garona J, Pifano M, Farina HG, Gomez DE, Alonso DF. Reduction of tumor angiogenesis induced by desmopressin in a breast cancer model. Breast Cancer Res Treat. 2013;142:9–18.
Garona J, Pifano M, Orlando UD, Pastrian MB, Iannucci NB, Ortega HH, et al. The novel desmopressin analogue [V4Q5] dDAVP inhibits angiogenesis, tumour growth and metastases in vasopressin type 2 receptor-expressing breast cancer models. Int J Oncol. 2015;46(6):2335–45.
Weinberg RS, Grecco MO, Ferro GS, Seigelshifer DJ, Perroni NV, Terrier FJ, et al. A phase II dose-escalation trial of perioperative desmopressin (1-desamino-8-d-arginine vasopressin) in breast cancer patients. Springerplus. 2015;4:1–8.
David-Beabes GL, Overman MJ, Petrofski JA, Campbell PA, de Marzo AM, Nelson WG. Doxorubicin-resistant variants of human prostate cancer cell lines DU 145, PC-3, PPC-1, and TSU-PR1: characterization of biochemical determinants of antineoplastic drug sensitivity. Int J Oncol. 2000;17(6):1077–163.
Newling D. The use of adriamycin and its derivatives in the treatment of prostatic cancer. Cancer Chemother Pharmacol. 1992;30:S90–4.
Dagsuyu E, Yanardag R. Purification and characterization of thioredoxin reductase enzyme from commercial Spirulina platensis tablets by affinity chromatography and investigation of the effects of some chemicals and drugs on enzyme activity. Biotechnol Appl Biochem. 2024;71(1):176–92.
Ghosh S, Lalani R, Maiti K, Banerjee S, Bhatt H, Bobde YS, et al. Synergistic co-loading of vincristine improved chemotherapeutic potential of pegylated liposomal doxorubicin against triple negative breast cancer and non-small cell lung cancer. Nanomedicine. 2021;31:102320.
Katsumata K, Tomioka H, Kusama M, Aoki T, Koyanagi Y. Clinical effects of combination therapy with mitoxantrone, vincristine, and prednisolone in breast cancer. Cancer Chemother Pharmacol. 2003;52:86–8.
Chen J, Li S, Shen Q, He H, Zhang Y. Enhanced cellular uptake of folic acid–conjugated PLGA–PEG nanoparticles loaded with vincristine sulfate in human breast cancer. Drug Dev Ind Pharm. 2011;37(11):1339–46.
Sasaki H, Klotz LH, Sugar LM, Kiss A, Venkateswaran V. A combination of desmopressin and docetaxel inhibit cell proliferation and invasion mediated by urokinase-type plasminogen activator (uPA) in human prostate cancer cells. Biochem Biophys Res Commun. 2015;464(3):848–54.
Hoffman A, Sasaki H, Roberto D, Mayer MJ, Klotz LH, Venkateswaran V. Effect of combination therapy of desmopressin and docetaxel on prostate cancer cell du145 proliferation, migration and growth: MP83-17. J Urol. 2017;197(4):e1112–3.
Bass R, Roberto D, Wang DZ, Cantu FP, Mohamadi RM, Kelley SO, et al. Combining desmopressin and docetaxel for the treatment of castration-resistant prostate cancer in an orthotopic model. Anticancer Res. 2019;39(1):113–8.
Brady SF, Pawluczyk JM, Lumma PK, Feng D-M, Wai JM, Jones R, et al. Design and synthesis of a pro-drug of vinblastine targeted at treatment of prostate cancer with enhanced efficacy and reduced systemic toxicity. J Med Chem. 2002;45(21):4706–15.
Collins D, Gaynor N, Conlon N, Gullo G, Eustace A, Crown J. Abstract P4–07–08: Budesonide and loperamide do not impact the cytotoxicity of neratinib or HER2-directed monoclonal antibodies in HER2+ breast cancer cell lines. Cancer Res. 2019;79(4_Supplement):P4–07–8-P4--8.
Lundgren S, Gundersen S, Klepp R, Lønning P, Lund E, Kvinnsland S. Megestrol acetate versus aminoglutethimide for metastatic breast cancer. Breast Cancer Res Treat. 1989;14:201–6.
Kamradt JM, Pienta KJ. Etoposide in prostate cancer. Expert Opin Pharmacother. 2000;1(2):271–5.
Pienta KJ, Lehr JE. Inhibition of prostate cancer growth by estramustine and etoposide: evidence for interaction at the nuclear matrix. J Urol. 1993;149(6):1622–5.
Cattrini C, Capaia M, Boccardo F, Barboro P. Etoposide and topoisomerase II inhibition for aggressive prostate cancer: data from a translational study. Cancer Treat Res Commun. 2020;25:100221.
Carie A, Sebti S. A chemical biology approach identifies a beta-2 adrenergic receptor agonist that causes human tumor regression by blocking the Raf-1/Mek-1/Erk1/2 pathway. Oncogene. 2007;26(26):3777–88.
Obasaju C, Hudes GR. Paclitaxel and docetaxel in prostate cancer. Hematology/Oncol Clin. 2001;15(3):525–45.
Hua M-Y, Yang H-W, Chuang C-K, Tsai R-Y, Chen W-J, Chuang K-L, et al. Magnetic-nanoparticle-modified paclitaxel for targeted therapy for prostate cancer. Biomaterials. 2010;31(28):7355–63.
Kelly WK, Curley T, Slovin S, Heller G, McCaffrey J, Bajorin D, et al. Paclitaxel, estramustine phosphate, and carboplatin in patients with advanced prostate cancer. J Clin Oncol. 2001;19(1):44–53.
Bonnefoi H, Grellety T, Tredan O, Saghatchian M, Dalenc F, Mailliez A, et al. A phase II trial of abiraterone acetate plus prednisone in patients with triple-negative androgen receptor positive locally advanced or metastatic breast cancer (UCBG 12–1). Ann Oncol. 2016;27(5):812–8.
Marini G, Murray S, Goldhirsch A, Gelber R, Castiglione-Gertsch M, Price K, et al. The effect of adjuvant prednisone combined with CMF on patterns of relapse and occurrence of second malignancies in patients with breast cancer. Ann Oncol. 1996;7(3):245–50.
Wong NS, Buckman RA, Clemons M, Verma S, Dent S, Trudeau ME, et al. Phase I/II trial of metronomic chemotherapy with daily dalteparin and cyclophosphamide, twice-weekly methotrexate, and daily prednisone as therapy for metastatic breast cancer using vascular endothelial growth factor and soluble vascular endothelial growth factor receptor levels as markers of response. J Clin Oncol. 2010;28(5):723–30.
Atkins JN, Muss HB, Case LD, Frederick Richards I, Grote T, McFarland J. Leucovorin and high-dose fluorouracil in metastatic prostate cancer: a phase II trial of the Piedmont oncology association. Am J Clin Oncol. 1996;19(1):23–5.
Dewys WD, Begg CB, Brodovsky H, Creech R, Khandekar J. A comparative clinical trial of adriamycin and 5-fluorouracil in advanced prostatic cancer: prognostic factors and response. Prostate. 1983;4(1):1–11.
Swanson GP, Faulkner J, Smalley SR, Noble MJ, Stephens RL, O’Rourke TJ, et al. Locally advanced prostate cancer treated with concomitant radiation and 5-fluorouracil: Southwest oncology group study 9024. J Urol. 2006;176(2):548–53.
Liu XW, Shi TY, Gao D, Ma CY, Lin H, Yan D, et al. iPADD: a computational tool for predicting potential antidiabetic drugs using machine learning algorithms. J Chem Inf Model. 2023;63(15):4960–9.
Yang Y, Gao D, Xie X, Qin J, Li J, Lin H, et al. DeepIDC: a prediction framework of injectable drug combination based on heterogeneous information and deep learning. Clin Pharmacokinet. 2022;61(12):1749–59.
Ross J, Belgodere B, Chenthamarakshan V, Padhi I, Mroueh Y, Das P. Large-scale chemical language representations capture molecular structure and properties. Nat Mach Intell. 2022;4(12):1256–64.
Elnaggar A, Heinzinger M, Dallago C, Rihawi G, Wang Y, Jones L, et al. ProtTrans: towards cracking the language of life’s code through self-supervised learning. IEEE Trans Pattern Anal Mach Intell. 2021;44:7112–27.
Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc. 2021;3(1):1–23.
Choromanski K, Likhosherstov V, Dohan D, Song X, Gane A, Sarlos T, et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794. 2020.
Jiang X, Lu Y, Fang Y, Shi C. Contrastive pre-training of GNNs on heterogeneous graphs. CIKM. 2021;803–12.
Ma CY, Luo YM, Zhang TY, Hao YD, Xie XQ, Liu XW, et al. Predicting coronary heart disease in Chinese diabetics using machine learning. Comput Biol Med. 2024;169:107952.

Acknowledgements

We are indebted to the anonymous reviewers, whose constructive comments greatly improved this paper.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62422113, 62271329, 62402166); Shenzhen Science and Technology Program (20231129091450002); Key Field of Department of Education of Guangdong Province (2022ZDZX2082); Natural Science Foundation of Hunan Province (2024JJ6158).

Author information

Yifan Shang and Zixu Wang contributed equally to this work.

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Yifan Shang, Yangyang Chen, Xinyu Yang, Zhonghao Ren & Xiangxiang Zeng
Department of Computer Science, University of Tsukuba, Tsukuba, 305-8577, Japan
Zixu Wang
School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
Lei Xu

Contributions

Yifan Shang and Zixu Wang designed the experiments and wrote the manuscript. Yangyang Chen, Xinyu Yang and Zhonghao Ren drew the figures and analysed the results. Xiangxiang Zeng and Lei Xu revised the manuscript. Lei Xu and Yifan Shang provided financial support for the experiments and revised the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lei Xu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12915_2025_2206_MOESM1_ESM.docx

Additional file 1. Biological large language model details, Algorithm S1, supplementary details, Table S1, and Figure S1. Biological large language model details: details of the biological large language model used to extract initial features of biological entities. Algorithm S1: training procedure and complexity analysis of the algorithm. Supplementary details: details of the compared methods. Fig. S1: 1st-order vs 2nd-order prediction performance. Table S1: parameter settings of XGBoost on the HetioNet and KEGG datasets, respectively.

12915_2025_2206_MOESM2_ESM.xlsx

Additional file 2. Experimental data for the figures. This file contains the underlying data for Figs. 2, 3 and 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shang, Y., Wang, Z., Chen, Y. et al. HNF-DDA: subgraph contrastive-driven transformer-style heterogeneous network embedding for drug–disease association prediction. BMC Biol 23, 101 (2025). https://doi.org/10.1186/s12915-025-02206-x

Received: 15 November 2024
Accepted: 03 April 2025
Published: 16 April 2025
DOI: https://doi.org/10.1186/s12915-025-02206-x
