About BChemRF-CPPred

BChemRF-CPPred (Beyond Chemical Rules-based Framework for CPP Prediction) is a machine-learning-based framework to predict cell-penetrating peptides (CPPs). This tool is based on twelve structure-based (physico-chemical) and sixty-four sequence-based descriptors.

The two supported input file formats are:

  • FASTA: input file containing only sequences of natural peptides; and
  • PDB: input file containing information on natural and non-natural peptides.
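As a rough illustration of the FASTA input, the sketch below reads FASTA-formatted lines into a dictionary and checks that a sequence uses only the twenty natural amino acid one-letter codes. The function names `read_fasta` and `is_natural` and the record names in the usage note are hypothetical and are not part of BChemRF-CPPred; the tool's own parser may behave differently.

```python
from typing import Dict, Iterable

# One-letter codes of the twenty natural amino acids.
NATURAL_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; join them in upper case.
            records[header] += line.upper()
    return records


def is_natural(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only natural residues."""
    return bool(sequence) and set(sequence) <= NATURAL_RESIDUES
```

For example, `read_fasta([">pep1", "GRKKRRQRRRPPQ"])` yields a single record named `pep1`, and `is_natural` accepts its sequence.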

The twelve structure-based properties are:

  1. Molecular weight (MW);
  2. Topological polar surface area (tPSA);
  3. 1-octanol/water partition coefficient (cLogP);
  4. Fraction of sp3-hybridized carbon atoms (Fsp3);
  5. Number of aromatic rings (NAR);
  6. Number of rotatable bonds (NRB);
  7. Hydrogen bond acceptors (HBA);
  8. Hydrogen bond donors (HBD);
  9. Number of primary amino groups (NPA);
  10. Number of guanidinium groups (NG);
  11. Net charge (NetC); and
  12. Number of negatively charged amino acids (NNCAA) at pH = 7.4.
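For natural peptides given as one-letter sequences, a few of the charge-related properties above can be approximated by counting residues. The sketch below is only that: it assumes each arginine contributes one guanidinium group, that Asp and Glu are the negatively charged residues at pH 7.4, and that net charge can be estimated as (Arg + Lys) minus (Asp + Glu), ignoring histidine and the termini. The function name and the `NetC_approx` key are hypothetical, and the tool itself may derive these values differently (for example, from the structure).

```python
def charge_related_counts(sequence: str) -> dict:
    """Approximate NG, NNCAA and net charge from a one-letter sequence."""
    seq = sequence.upper()
    n_arg = seq.count("R")                   # one guanidinium group per Arg
    n_lys = seq.count("K")                   # side-chain amine, protonated at pH 7.4
    n_neg = seq.count("D") + seq.count("E")  # Asp/Glu, deprotonated at pH 7.4
    return {
        "NG": n_arg,
        "NNCAA": n_neg,
        # Crude estimate: ignores His, termini and non-natural residues.
        "NetC_approx": n_arg + n_lys - n_neg,
    }
```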

The sixty-four sequence-based descriptors are divided into three sub-groups:

  1. Amino acid composition (AAC), with two descriptors;
  2. Pseudo amino acid composition (PseAAC), with twenty-two descriptors; and
  3. Dipeptide composition (DPC), with forty descriptors.

The two AAC properties are:

  1. Fraction of arginine (f[Arg]); and
  2. Fraction of lysine (f[Lys]).
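The two AAC descriptors can be sketched as residue fractions, assuming that "fraction" means the residue count divided by the sequence length. The function name `aac_fractions` is hypothetical.

```python
def aac_fractions(sequence: str) -> dict:
    """Return the fractions of arginine and lysine in a peptide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return {
        "f[Arg]": seq.count("R") / n,
        "f[Lys]": seq.count("K") / n,
    }
```

For the four-residue sequence `RRKA`, this gives f[Arg] = 0.5 and f[Lys] = 0.25.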

The twenty-two PseAAC properties are:

  1. PAAC1;
  2. PAAC2;
  3. PAAC3;
  4. PAAC4;
  5. PAAC5;
  6. PAAC6;
  7. PAAC7;
  8. PAAC8;
  9. PAAC9;
  10. PAAC10;
  11. PAAC11;
  12. PAAC12;
  13. PAAC13;
  14. PAAC14;
  15. PAAC15;
  16. PAAC16;
  17. PAAC17;
  18. PAAC18;
  19. PAAC19;
  20. PAAC20;
  21. PAAC21; and
  22. PAAC22.

The forty DPC properties are:

  1. RR;
  2. KK;
  3. KR;
  4. RQ;
  5. RK;
  6. WR;
  7. WK;
  8. NR;
  9. KW;
  10. WF;
  11. RS;
  12. FQ;
  13. RW;
  14. RI;
  15. QR;
  16. GR;
  17. RM;
  18. IW;
  19. RL;
  20. QN;
  21. ET;
  22. CN;
  23. PG;
  24. PL;
  25. GI;
  26. TV;
  27. FC;
  28. FG;
  29. GP;
  30. LS;
  31. SE;
  32. CV;
  33. GT;
  34. FL;
  35. CC;
  36. VC;
  37. GA;
  38. LG;
  39. GF; and
  40. GL.
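A minimal sketch of how the forty DPC descriptors listed above might be computed is given below. It assumes the common definition of dipeptide composition, that is, the number of overlapping occurrences of a dipeptide divided by the total number of dipeptides in the sequence (length minus one); the normalization used by BChemRF-CPPred is not stated here and may differ. The names `DPC_PAIRS` and `dipeptide_composition` are hypothetical.

```python
# The forty dipeptides, in the order listed above.
DPC_PAIRS = (
    "RR KK KR RQ RK WR WK NR KW WF "
    "RS FQ RW RI QR GR RM IW RL QN "
    "ET CN PG PL GI TV FC FG GP LS "
    "SE CV GT FL CC VC GA LG GF GL"
).split()


def dipeptide_composition(sequence: str, pairs=DPC_PAIRS) -> dict:
    """Fraction of each selected dipeptide among all overlapping dipeptides."""
    seq = sequence.upper()
    total = len(seq) - 1  # number of overlapping two-residue windows
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {pair: 0 for pair in pairs}
    for i in range(total):
        window = seq[i:i + 2]
        if window in counts:
            counts[window] += 1
    return {pair: count / total for pair, count in counts.items()}
```

For `RRKK`, the three overlapping dipeptides are RR, RK, and KK, so each of those descriptors is 1/3 and all others are 0.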

BChemRF-CPPred is composed of an artificial neural network (ANN), a support vector machine (SVM), and a Gaussian process classifier (GPC).

Framework scheme

The peptide descriptors are the input of BChemRF-CPPred and are grouped into feature compositions (FCs). Below is the distribution of the structure- and sequence-based descriptors described above across the four FCs offered.

Feature composition   Structure-based    Sequence-based
                                         AAC            PseAAC          DPC
FC-1                  -                  2 properties   22 properties   40 properties
FC-2                  12 properties      -              -               -
FC-3                  12 properties      2 properties   22 properties   40 properties
FC-4                  9 properties [1]   2 properties   22 properties   10 properties [2]

[1] MW, cLogP, Fsp3, NAR, HBA, NPA, NG, NetC, and NNCAA.
[2] RR, KK, KR, RQ, RK, GL, GF, LG, GA, and VC.
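The four feature compositions can also be written down as a small lookup table, which makes it easy to check how many descriptors each one feeds to the classifiers. The sketch below only restates the counts given above; the dictionary layout and the names `FEATURE_COMPOSITIONS` and `n_descriptors` are hypothetical and do not reflect the tool's internal representation.

```python
# Reduced descriptor subsets used by FC-4, as listed in the footnotes above.
FC4_STRUCTURE = ["MW", "cLogP", "Fsp3", "NAR", "HBA", "NPA", "NG", "NetC", "NNCAA"]
FC4_DPC = ["RR", "KK", "KR", "RQ", "RK", "GL", "GF", "LG", "GA", "VC"]

# Number of descriptors per property group in each feature composition.
FEATURE_COMPOSITIONS = {
    "FC-1": {"AAC": 2, "PseAAC": 22, "DPC": 40},
    "FC-2": {"structure": 12},
    "FC-3": {"structure": 12, "AAC": 2, "PseAAC": 22, "DPC": 40},
    "FC-4": {
        "structure": len(FC4_STRUCTURE),
        "AAC": 2,
        "PseAAC": 22,
        "DPC": len(FC4_DPC),
    },
}


def n_descriptors(fc_name: str) -> int:
    """Total number of descriptors in the given feature composition."""
    return sum(FEATURE_COMPOSITIONS[fc_name].values())
```

Under these counts, FC-1 uses 64 descriptors, FC-2 uses 12, FC-3 uses 76, and FC-4 uses 43.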