About BChemRF-CPPred
BChemRF-CPPred (Beyond Chemical Rules-based Framework for CPP Prediction) is a machine-learning-based framework to predict cell-penetrating peptides (CPPs). This tool is based on twelve structure-based (physico-chemical) and sixty-four sequence-based descriptors.
The two supported input file formats are:
- FASTA: input file containing only sequences of natural peptides; and
- PDB: input file containing information on natural and non-natural peptides.
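As a rough illustration of the FASTA restriction, the sketch below reads FASTA-formatted text and checks that each sequence uses only the twenty natural amino acids. It is not the tool's own parser: the function names `parse_fasta` and `is_natural_peptide`, the record names, and the use of `X` to mark a non-natural residue are assumptions made for this example.

```python
# Hypothetical helper names; this is an illustrative sketch, not BChemRF-CPPred code.
NATURAL_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 natural amino acids


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_natural_peptide(sequence):
    """True if the sequence is non-empty and uses only natural residues."""
    return bool(sequence) and set(sequence) <= NATURAL_RESIDUES


example = """>peptide_a
RQIKIWFQNRRMKWKK
>peptide_b
RQIKXWFQ
"""

for name, seq in parse_fasta(example):
    print(name, seq, is_natural_peptide(seq))
```

Under these assumptions, a peptide that fails the check would need to be supplied as a PDB file instead.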
The twelve structure-based properties are:
- Molecular weight (MW);
- Topological polar surface area (tPSA);
- 1-octanol/water partition coefficient (cLogP);
- Fraction of sp3-hybridized carbon atoms (Fsp3);
- Number of aromatic rings (NAR);
- Number of rotatable bonds (NRB);
- Hydrogen bond acceptors (HBA);
- Hydrogen bond donors (HBD);
- Number of primary amino groups (NPA);
- Number of guanidinium groups (NG);
- Net charge (NetC); and
- Number of negatively charged amino acids (NNCAA) at pH = 7.4.
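To give a feel for the charge-related properties, the sketch below approximates three of them directly from a one-letter sequence of a natural peptide. This is only an approximation and not how BChemRF-CPPred derives them. It assumes that each arginine contributes one guanidinium group, that Asp and Glu side chains are deprotonated and Lys and Arg side chains protonated at pH 7.4, and it ignores histidine, the termini, and any non-natural residue. The function name `charge_properties` is hypothetical.

```python
# Illustrative approximation from sequence alone; names and simplifications are assumptions.
def charge_properties(sequence):
    """Approximate NG, NNCAA, and side-chain net charge at pH 7.4 for a natural peptide."""
    sequence = sequence.upper()
    n_arg = sequence.count("R")
    n_lys = sequence.count("K")
    n_acidic = sequence.count("D") + sequence.count("E")
    return {
        "NG": n_arg,                          # one guanidinium group per arginine
        "NNCAA": n_acidic,                    # Asp and Glu assumed deprotonated at pH 7.4
        "NetC": n_arg + n_lys - n_acidic,     # side chains only; His and termini ignored
    }


print(charge_properties("RQIKIWFQNRRMKWKK"))
```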
The sixty-four sequence-based descriptors are divided into three sub-groups:
- Amino acid composition (AAC), with two descriptors;
- Pseudo amino acid composition (PseAAC), with twenty-two descriptors; and
- Dipeptide composition (DPC), with forty descriptors.
The two AAC properties are:
- Fraction of arginine (f[Arg]); and
- Fraction of lysine (f[Lys]).
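The two AAC descriptors can be sketched as simple residue fractions: the number of occurrences of a residue divided by the peptide length. The helper names `residue_fraction` and `aac_descriptors` are assumptions made for this example, and the sketch covers one-letter sequences of natural peptides only.

```python
# Hypothetical helper names; a minimal sketch of the two AAC descriptors.
def residue_fraction(sequence, residue):
    """Fraction of positions in `sequence` occupied by `residue` (one-letter code)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sequence.count(residue) / len(sequence)


def aac_descriptors(sequence):
    """Return f[Arg] and f[Lys] for a one-letter peptide sequence."""
    return {
        "f[Arg]": residue_fraction(sequence, "R"),
        "f[Lys]": residue_fraction(sequence, "K"),
    }


# 16 residues with 3 Arg and 4 Lys
print(aac_descriptors("RQIKIWFQNRRMKWKK"))
# → {'f[Arg]': 0.1875, 'f[Lys]': 0.25}
```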
The twenty-two PseAAC properties are:
- PAAC1;
- PAAC2;
- PAAC3;
- PAAC4;
- PAAC5;
- PAAC6;
- PAAC7;
- PAAC8;
- PAAC9;
- PAAC10;
- PAAC11;
- PAAC12;
- PAAC13;
- PAAC14;
- PAAC15;
- PAAC16;
- PAAC17;
- PAAC18;
- PAAC19;
- PAAC20;
- PAAC21; and
- PAAC22.
The forty DPC properties are:
- RR;
- KK;
- KR;
- RQ;
- RK;
- WR;
- WK;
- NR;
- KW;
- WF;
- RS;
- FQ;
- RW;
- RI;
- QR;
- GR;
- RM;
- IW;
- RL;
- QN;
- ET;
- CN;
- PG;
- PL;
- GI;
- TV;
- FC;
- FG;
- GP;
- LS;
- SE;
- CV;
- GT;
- FL;
- CC;
- VC;
- GA;
- LG;
- GF; and
- GL.
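The forty DPC descriptors listed above can be sketched as dipeptide frequencies. The example assumes the common definition of dipeptide composition: the count of each adjacent residue pair, taken over overlapping windows, divided by the total number of pairs (sequence length minus one). The page does not state the normalization the tool uses, so treat that choice, and the names `DPC_DIPEPTIDES` and `dipeptide_composition`, as assumptions.

```python
from collections import Counter

# The forty dipeptides, in the order listed above.
DPC_DIPEPTIDES = [
    "RR", "KK", "KR", "RQ", "RK", "WR", "WK", "NR", "KW", "WF",
    "RS", "FQ", "RW", "RI", "QR", "GR", "RM", "IW", "RL", "QN",
    "ET", "CN", "PG", "PL", "GI", "TV", "FC", "FG", "GP", "LS",
    "SE", "CV", "GT", "FL", "CC", "VC", "GA", "LG", "GF", "GL",
]


def dipeptide_composition(sequence, dipeptides=DPC_DIPEPTIDES):
    """Frequency of each selected dipeptide among all overlapping residue pairs."""
    sequence = sequence.upper()
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = Counter(sequence[i:i + 2] for i in range(n_pairs))
    # Assumed normalization: divide by the number of adjacent pairs.
    return {dp: counts[dp] / n_pairs for dp in dipeptides}


dpc = dipeptide_composition("RQIKIWFQNRRMKWKK")
print(len(dpc), dpc["RR"], dpc["GL"])
```

Restricting the dictionary to a chosen list of dipeptides keeps the feature vector at a fixed length regardless of which pairs actually occur in a given peptide.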
BChemRF-CPPred is composed of an artificial neural network (ANN), a support vector machine (SVM), and a Gaussian process classifier (GPC).
The peptide descriptors are the input to BChemRF-CPPred, and they are grouped into feature compositions (FCs). Below is the distribution of the aforementioned structure- and sequence-based descriptors across the four FCs offered.
| Feature composition | Structure-based | Sequence-based: AAC | Sequence-based: PseAAC | Sequence-based: DPC |
|---|---|---|---|---|
| FC-1 | none | 2 properties | 22 properties | 40 properties |
| FC-2 | 12 properties | none | none | none |
| FC-3 | 12 properties | 2 properties | 22 properties | 40 properties |
| FC-4 | 9 properties¹ | 2 properties | 22 properties | 10 properties² |

¹ MW, cLogP, Fsp3, NAR, HBA, NPA, NG, NetC, and NNCAA.
² RR, KK, KR, RQ, RK, GL, GF, LG, GA, and VC.
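
The table can also be read as the number of input descriptors each feature composition passes to the classifiers. The sketch below encodes the per-group counts from the table and sums them. The dictionary layout and the name `feature_count` are assumptions made for illustration, not part of the tool.

```python
# Descriptor counts per property group, copied from the table above.
FEATURE_COMPOSITIONS = {
    "FC-1": {"structure": 0, "AAC": 2, "PseAAC": 22, "DPC": 40},
    "FC-2": {"structure": 12, "AAC": 0, "PseAAC": 0, "DPC": 0},
    "FC-3": {"structure": 12, "AAC": 2, "PseAAC": 22, "DPC": 40},
    "FC-4": {"structure": 9, "AAC": 2, "PseAAC": 22, "DPC": 10},
}


def feature_count(name):
    """Total number of descriptors in the named feature composition."""
    return sum(FEATURE_COMPOSITIONS[name].values())


for name in FEATURE_COMPOSITIONS:
    print(name, feature_count(name))
# → FC-1 64
# → FC-2 12
# → FC-3 76
# → FC-4 43
```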