Features Download

Feature types:

If you want to generate the features from scratch, here are some hints:

Gene features:

Put all gene feature files into one folder and run folder_pca.py to generate the KPCA features. Here is a demo: because the k-mer features we generated range from 3-mer to 8-mer and the files are very large, you can use the code together with the gene_max_sequence16127.csv file to generate them yourself, and then put them into the gene folder to run KPCA.
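The k-mer features can be sketched as follows. This is a minimal sketch, not the code behind the demo: it assumes each k-mer feature is the normalized frequency of every possible k-mer over the ACGT alphabet (4**k values per k), and the function names kmer_vector and multi_kmer_features are hypothetical.

```python
from itertools import product

def kmer_vector(seq, k, alphabet="ACGT"):
    """Normalized frequency of every possible k-mer (4**k entries, lexicographic order)."""
    seq = seq.upper().replace("U", "T")  # treat RNA input with the DNA alphabet
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts

def multi_kmer_features(seq, ks=range(3, 9)):
    """Concatenate the k-mer vectors for k = 3..8 (hypothetical layout)."""
    feats = []
    for k in ks:
        feats.extend(kmer_vector(seq, k))
    return feats

print(len(multi_kmer_features("ACGTACGTAC")))  # 87360
```

Concatenating k = 3 to 8 gives 64 + 256 + 1024 + 4096 + 16384 + 65536 = 87,360 values per sequence before KPCA, which is why the raw k-mer files are so large.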

iLearn [1] is a platform that generates sequence-based features. Input the FASTA file, generate the corresponding features on the platform, and then apply KPCA to those features to obtain the features we used. (Note: the iLearn features are only one part of our method; the 3-mer to 8-mer k-mer data are still required for mix-KPCA.)
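KPCA itself can be illustrated with a small NumPy sketch. It assumes an RBF kernel with gamma = 1 / n_features (the kernel and gamma actually used in folder_pca.py are not stated here), and it assumes that mix-KPCA means concatenating the k-mer and iLearn feature blocks column-wise before a single KPCA. The names kernel_pca and mix_kpca are hypothetical; scikit-learn's sklearn.decomposition.KernelPCA is an established alternative.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from round-off

def kernel_pca(X, n_components, gamma=None):
    """Project the training samples onto the top kernel principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n_components > n:
        raise ValueError("n_components cannot exceed the number of samples")
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = rbf_kernel(X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

def mix_kpca(blocks, n_components, gamma=None):
    """Assumed mix-KPCA: stack feature blocks side by side, then run one KPCA."""
    mixed = np.hstack([np.asarray(b, dtype=float) for b in blocks])
    return kernel_pca(mixed, n_components, gamma)

# Toy usage with random stand-ins for k-mer and iLearn features (same sample order).
rng = np.random.default_rng(0)
kmer_feats = rng.random((12, 30))
ilearn_feats = rng.random((12, 8))
Z = mix_kpca([kmer_feats, ilearn_feats], n_components=4)
print(Z.shape)  # (12, 4)
```

The projections of the training samples are the top eigenvectors of the centered kernel matrix scaled by the square roots of their eigenvalues. Because the kernel matrix is n × n, this formulation can keep at most n components, so a 4096-dimensional output needs at least 4096 samples.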

[1] Chen Z, Zhao P, Li F, et al. iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Briefings in Bioinformatics, 2020, 21(3): 1047-1057.

Gene KPCA: Examples

lncRNA FASTA, iLearn KPCA: Examples

The features below are the transformed features and are ready to use (to build the constructed embedding dataset for machine learning and deep learning models).

LncRNA: LncRNA Features

Gene: Gene Features

If the download is slow, or you only want to run downstream experiments, you can directly download the embeddings we used for downstream tasks:

lncRNA mix kpca 4096

Gene mix kpca 4096
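If you download the ready-made 4096-dimensional embeddings, a loader along these lines can map each ID to its vector. The file layout (one row per lncRNA or gene, the ID in the first column followed by 4096 values, optional header row) is an assumption, so check it against the downloaded files; parse_embeddings and load_embeddings are hypothetical helpers.

```python
import csv

def parse_embeddings(lines, dim=4096, has_header=False):
    """Return {id: [float, ...]} from CSV lines laid out as id,v1,...,v_dim (assumed format)."""
    embeddings = {}
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        vec = [float(x) for x in row[1:]]
        if len(vec) != dim:
            raise ValueError(f"{row[0]}: expected {dim} values, got {len(vec)}")
        embeddings[row[0]] = vec
    return embeddings

def load_embeddings(path, dim=4096, has_header=False):
    """Open a CSV file on disk and parse it with parse_embeddings."""
    with open(path, newline="") as f:
        return parse_embeddings(f, dim, has_header)
```

Checking the vector length on every row catches truncated or mismatched downloads early instead of failing later inside a model.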

Using the provided features, we build the constructed embedding datasets, which serve as the input to the machine learning methods.

First, generate the constructed embedding dataset from the original dataset:

Original dataset: here

Constructed dataset: here

Constructed embedding dataset generation samples (please unzip lncRNA mix kpca 4096 and Gene mix kpca 4096 into the same root folder to run this demo)
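Building the constructed embedding dataset can be sketched as a join between the original dataset and the two embedding tables. This sketch assumes the original dataset is a list of (lncRNA ID, gene ID, label) pairs and that each pair is represented by concatenating the lncRNA and gene embeddings (8192 values for two 4096-dimensional embeddings); both assumptions and the helper name build_embedding_dataset are hypothetical.

```python
import numpy as np

def build_embedding_dataset(pairs, lnc_emb, gene_emb):
    """Concatenate lncRNA and gene embeddings for each labeled pair (assumed layout)."""
    rows, labels, skipped = [], [], []
    for lnc_id, gene_id, label in pairs:
        if lnc_id not in lnc_emb or gene_id not in gene_emb:
            skipped.append((lnc_id, gene_id))  # no embedding available for this pair
            continue
        rows.append(np.concatenate([lnc_emb[lnc_id], gene_emb[gene_id]]))
        labels.append(int(label))
    X = np.vstack(rows) if rows else np.empty((0, 0))
    y = np.asarray(labels, dtype=int)
    return X, y, skipped

# Toy usage with 2-dimensional stand-ins for the 4096-dimensional embeddings.
lnc = {"L1": [1.0, 2.0], "L2": [3.0, 4.0]}
gene = {"G1": [5.0, 6.0]}
pairs = [("L1", "G1", 1), ("L2", "G1", 0), ("L3", "G1", 1)]
X, y, skipped = build_embedding_dataset(pairs, lnc, gene)
print(X.shape, y.tolist(), skipped)  # (2, 4) [1, 0] [('L3', 'G1')]
```

Pairs whose lncRNA or gene has no embedding are returned in skipped rather than silently dropped, which makes gaps in the downloaded files easy to spot.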