Fig. 4

Comparison of different methods on four comprehensive metrics across 100 partitioned independent test sets. The Wilcoxon rank-sum test is used to assess the statistical significance of the difference between two groups of results. Comparisons with p values < 0.05 are marked with *, p values < 0.01 with **, and p values < 0.001 with ***; “ns” indicates no significant difference.
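
The marking scheme in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names `rank_sum_p_value` and `significance_label` are hypothetical, the scores are made-up placeholders rather than results from the figure, and the p value uses the two-sided normal approximation without tie correction, which may differ from the exact procedure the authors applied.

```python
import math


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    # Mean and standard deviation of the rank sum under the null hypothesis.
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def significance_label(p):
    """Map a p value to the markers used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Hypothetical per-test-set scores for two methods (placeholders, not real data).
method_a = [0.80 + 0.001 * k for k in range(20)]
method_b = [0.90 + 0.001 * k for k in range(20)]
print(significance_label(rank_sum_p_value(method_a, method_b)))
```

The thresholds are checked from the strictest downward so that each comparison receives only its strongest marker.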