Fig. 1
From: PBertKla: a protein large language model for predicting human lysine lactylation sites

Determination of the optimal sample length and sequence similarity threshold for defining the human Kla benchmark datasets. A Line chart of the AUC value for each dataset, where each dataset is defined by a specific sample length and sequence similarity threshold; the x axis is sample length, and differently colored lines correspond to different sequence similarity thresholds. B–C Violin plots of the AUC values of the datasets, grouped by the CD-HIT sequence similarity threshold (B) and by sample length (C); colors distinguish the groups within each panel. D Sequence characteristics of Kla and non-Kla sites in the training data.
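To make "sample length" concrete, the following is a minimal sketch of how fixed-length samples centered on a lysine (K) might be cut from a protein sequence. The caption does not describe the actual procedure, so the function name `extract_lysine_windows`, the requirement of an odd length, and the `X` padding character are all assumptions made for illustration, not the authors' method.

```python
def extract_lysine_windows(sequence, length, pad="X"):
    """Return (1-based position, window) pairs for every lysine in `sequence`.

    Hypothetical helper: each window has `length` residues with the lysine
    in the middle. Windows near the termini are padded with `pad`
    (an assumed convention, not taken from the source).
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")

    half = length // 2
    # Pad both ends so that terminal lysines still yield full-length windows.
    padded = pad * half + sequence + pad * half

    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` sits at index i + half in `padded`,
            # so the window starting at i is centered on the lysine.
            windows.append((i + 1, padded[i:i + length]))
    return windows


if __name__ == "__main__":
    for position, window in extract_lysine_windows("MKAK", 5):
        print(position, window)
```

Under these assumptions, repeating the extraction for several values of `length` and then filtering each set of windows at different CD-HIT similarity thresholds would give the grid of datasets whose AUC values are compared in panels A to C.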