Comparison with SOTA methods (Features)

LPI-deepGBDT [1] was designed for lncRNA-protein interaction prediction. Its inputs are RNA and protein sequences, whereas our input contains no protein sequences. However, their GitHub repository provides a sequence feature extractor. We therefore used their extractor to generate sequence features and applied these features both in our framework and in GAE-LGA [2] for further evaluation.
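As a hedged illustration of what a sequence feature extractor computes, the sketch below turns an RNA sequence into normalized k-mer frequencies. It is not LPI-deepGBDT's extractor, whose exact feature set is not described here; the function name `kmer_features`, the default `k=3`, and the ACGU alphabet are hypothetical choices.

```python
from itertools import product


def kmer_features(seq: str, k: int = 3, alphabet: str = "ACGU") -> list[float]:
    """Normalized k-mer frequencies of an RNA sequence.

    Hypothetical generic extractor for illustration only; not the LPI-deepGBDT code.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input by mapping T to U
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]
```

For example, `kmer_features("ACGUAC", k=2)` returns a 16-dimensional vector in which the entry for "AC" is 0.4, because "AC" occurs in 2 of the 5 sliding windows.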

GAE-LGA [2] is a recently published method (27 October 2022). Its authors collected multi-omics features for lncRNAs and PCGs and fed them into a framework that first calculates similarities between these nodes and then passes the resulting networks to a graph autoencoder to predict potential lncRNA-PCG associations.
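The similarity-then-autoencoder idea can be sketched as follows for a single node set. This is a generic graph autoencoder (one GCN layer as encoder, inner-product decoder), not GAE-LGA's actual architecture; the cosine similarity, the edge threshold `tau`, the hidden size, and the untrained random weights are all assumptions made for illustration, and all function names are hypothetical.

```python
import math
import random


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between node feature vectors (zero vectors give 0)."""
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def threshold_adjacency(sim, tau=0.5):
    """Hypothetical rule: connect two distinct nodes if their similarity exceeds tau."""
    n = len(sim)
    return [[1.0 if i != j and sim[i][j] > tau else 0.0 for j in range(n)] for i in range(n)]


def normalize_adjacency(adj):
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gae_forward(features, hidden_dim=4, tau=0.5, seed=0):
    """Encoder Z = ReLU(A_norm X W) and decoder sigmoid(z_i . z_j); weights are random, not trained."""
    rng = random.Random(seed)
    a_norm = normalize_adjacency(threshold_adjacency(cosine_similarity_matrix(features), tau))
    weights = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(len(features[0]))]
    z = [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, features), weights)]
    n = len(z)
    # Reconstructed association probabilities for every node pair
    return [[1.0 / (1.0 + math.exp(-sum(p * q for p, q in zip(z[i], z[j])))) for j in range(n)]
            for i in range(n)]
```

In a real GAE the weights are trained so that the decoded probabilities reconstruct known edges; the untrained forward pass here only shows how features flow from similarities to pairwise scores.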

For the comparison, since LPI-deepGBDT provides a feature extractor, we used it to generate features for our framework. For GAE-LGA, we replaced its original features with ours and with those generated by the LPI-deepGBDT extractor, and then ran its graph-autoencoder-based framework for performance evaluation. Because GAE-LGA uses fewer PCGs than our dataset, we first restricted the data to the lncRNAs and PCGs shared by both methods.
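The overlap filtering step amounts to keeping only the shared lncRNAs/PCGs and dropping the corresponding rows and columns of the association matrix. A minimal sketch, assuming the associations are stored as a list-of-lists matrix with named rows (lncRNAs) and columns (PCGs); `restrict_to_overlap` and its arguments are hypothetical names.

```python
def restrict_to_overlap(assoc, lnc_names, pcg_names, our_lncs, our_pcgs):
    """Keep only lncRNAs/PCGs present in both datasets.

    assoc: association matrix, rows ordered as lnc_names, columns as pcg_names.
    Returns the sub-matrix plus the surviving row and column names.
    """
    our_lncs, our_pcgs = set(our_lncs), set(our_pcgs)
    keep_rows = [i for i, name in enumerate(lnc_names) if name in our_lncs]
    keep_cols = [j for j, name in enumerate(pcg_names) if name in our_pcgs]
    sub = [[assoc[i][j] for j in keep_cols] for i in keep_rows]
    return sub, [lnc_names[i] for i in keep_rows], [pcg_names[j] for j in keep_cols]
```

For example, with rows `["L1", "L2"]`, columns `["G1", "G2", "G3"]`, and shared sets `{"L2"}` and `{"G1", "G3"}`, only row L2 and columns G1 and G3 remain.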

Methods | Year | Journal | Code | Prediction type | Comparison | How (compared)
LPI-deepGBDT [1] | 2021 | BMC Bioinformatics | https://github.plhhnu/LPI-deepGBDT | Protein | Yes | Their features / our framework
GAE-LGA [2] | 2022.10 | BIB | https://github.com/meihonggao/GAE-LGA | PCG (protein-coding gene) | Yes | Our features / LPI [1] features / their framework
Details of the latest methods

Part I Using different features in our framework

Here we used the LPI-deepGBDT feature extractor to generate features from the same set of sequences, fed these features into our downstream framework, and calculated AUC, AUPR, F1, and MCC for both feature generators. Details are shown in Fig 1. Features generated by lncRNA-Top outperform those generated by LPI-deepGBDT on all metrics (p-value ≤ 0.05), indicating that our features are better than those of the SOTA feature generator.
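For reference, the four metrics can be computed as below. This is a plain-Python sketch, not the code used for Fig 1: it assumes a 0.5 decision threshold for F1 and MCC and uses average precision as the AUPR estimator; libraries such as scikit-learn provide equivalent functions.

```python
import math


def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as average precision over the ranking (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at each recall step
    return ap / sum(labels)


def f1_and_mcc(labels, scores, threshold=0.5):
    """F1 score and Matthews correlation coefficient after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, AUC is 0.75, AUPR is 5/6, F1 is 0.5, and MCC is 0.0.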

Feature comparison of lncRNA-Top and LPI-deepGBDT. The negative sampling rate was set to 1. Five-fold cross-validation was repeated three times, and negative sampling was likewise repeated with three random seeds. The mean AUC, AUPR, F1, and MCC were then calculated for the downstream RF, CNN, and RF-CNN models. The dataset used for testing is LCIT.
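The evaluation protocol above can be sketched as a generator of train/test splits. This assumes that a negative sampling rate of 1 means one randomly drawn unobserved pair per positive pair, and that the three negative-sampling seeds and three cross-validation repeats are nested (3 x 3 x 5 = 45 runs); both are assumptions, and all names are hypothetical.

```python
import random


def sample_negatives(positives, lncs, targets, rate=1, seed=0):
    """Draw rate * len(positives) unobserved (lncRNA, target) pairs as negatives."""
    rng = random.Random(seed)
    pos_set = set(positives)
    candidates = [(l, t) for l in lncs for t in targets if (l, t) not in pos_set]
    return rng.sample(candidates, min(len(candidates), rate * len(positives)))


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def evaluation_runs(positives, lncs, targets, n_neg_seeds=3, n_cv_repeats=3, k=5):
    """Yield (neg_seed, cv_repeat, fold, train, test); metrics are then averaged over all runs."""
    for neg_seed in range(n_neg_seeds):
        negatives = sample_negatives(positives, lncs, targets, rate=1, seed=neg_seed)
        data = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
        for rep in range(n_cv_repeats):
            for fold, test_idx in enumerate(kfold_indices(len(data), k, seed=rep)):
                test_set = set(test_idx)
                train = [data[i] for i in range(len(data)) if i not in test_set]
                test = [data[i] for i in test_idx]
                yield neg_seed, rep, fold, train, test
```

Each yielded split would be used to train and test the downstream RF, CNN, or RF-CNN model, and the reported value is the mean over all splits.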

Part II Using different features (LPI's, ours, and the original multi-omics features) in the GAE-LGA framework

Results of GAE-LGA with different features as input

Method | D1_AUC | D1_AUPR | D1_F1-score | D1_MCC | D2_AUC | D2_AUPR | D2_F1-score | D2_MCC | D3_AUC | D3_AUPR | D3_F1-score | D3_MCC
GAE-LGA (Adjust_original features) | 0.9522 | 0.5479 | 0.7691 | 0.6071 | 0.923 | 0.5984 | 0.8265 | 0.6907 | 0.8149 | 0.4169 | 0.7761 | 0.6007
GAE-LGA (LPI_Features) | 0.9482 | 0.5407 | 0.7692 | 0.6072 | 0.918 | 0.5833 | 0.7901 | 0.633 | 0.8404 | 0.5312 | 0.8172 | 0.6524
GAE-LGA (Our_Features) | 0.9489 | 0.5453 | 0.7726 | 0.6124 | 0.9242 | 0.6176 | 0.834 | 0.7019 | 0.8568 | 0.5683 | 0.8359 | 0.6809
Increment (%) of Our_Features over Adjust_original features | -0.346 | -0.476 | 0.4551 | 0.8730 | 0.1300 | 3.2085 | 0.9074 | 1.6215 | 5.142 | 36.315 | 7.7051 | 13.351

We reran GAE-LGA on the adjusted network (with all non-overlapping lncRNAs/genes and their corresponding rows and columns removed) to obtain a benchmark, denoted GAE-LGA (Adjust_original features). We then replaced the original multi-omics features with LPI [1] features and with our features (lncRNA-Top features). Our features outperformed the original features on 10 of the 12 metrics and outperformed the LPI features on all 12. Notably, as more lncRNAs are included in the dataset (117, 155, and 193 overlapping lncRNAs for Datasets 1, 2, and 3), the performance increment increases correspondingly.
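The Increment (%) row in the table above matches the relative change of the Our_Features row over the Adjust_original features row, (ours - baseline) / baseline x 100. A minimal check of two entries, using values copied from the table (the helper name `relative_increment` is hypothetical):

```python
def relative_increment(ours, baseline):
    """Percentage change of our metric over the baseline metric."""
    return (ours - baseline) / baseline * 100.0


# Values copied from the GAE-LGA results table
baseline = {"D2_AUPR": 0.5984, "D3_MCC": 0.6007}  # Adjust_original features
ours = {"D2_AUPR": 0.6176, "D3_MCC": 0.6809}      # Our_Features
increments = {m: round(relative_increment(ours[m], baseline[m]), 3) for m in baseline}
print(increments)  # {'D2_AUPR': 3.209, 'D3_MCC': 13.351}
```

Because the increment is relative to the baseline, the same absolute gain counts for more on Dataset 3, whose baseline AUPR (0.4169) is the lowest.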

Dataset details after filtering

Dataset | lncRNAs | Overlapping lncRNAs | PCGs (protein-coding genes) | Overlapping PCGs
Dataset1 | 208 | 117 | 256 | 211
Dataset2 | 238 | 155 | 716 | 617
Dataset3 | 263 | 193 | 498 | 425

Ref:

[1] Zhou L, Wang Z, Tian X, et al. LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification[J]. BMC Bioinformatics, 2021, 22(1): 1-24.

[2] Gao M, Liu S, Qi Y, et al. GAE-LGA: integration of multi-omics data with graph autoencoders to identify lncRNA–PCG associations[J]. Briefings in Bioinformatics, 2022, 23(6): bbac452.