Section 1: An Overview of DrugGen ——— ligands


Section 2: Chemical Statistics


The Vina result for GraphBP has been excluded from the analysis due to its poor performance.

Section 3: Bond Statistics


Note: In each figure title, lowercase atom symbols denote aromatic atoms and uppercase atom symbols denote non-aromatic atoms.

Bond Angle Distribution

Dihedral Angle Distribution

Section 4: RMSD of Fragments


We employ the MMFF94s force field implemented in RDKit to optimize molecular fragments of varying sizes and subsequently compute their RMSD values.
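The relaxation-and-compare step can be sketched with RDKit as follows. This is a minimal illustration, not necessarily the exact procedure used for the figures: the helper name `mmff94s_relaxation_rmsd`, the iteration limit, the heavy-atom-only comparison, and the example fragment (embedded with ETKDG as a stand-in for a generated 3D fragment) are all assumptions made for the example.

```python
import math

from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign


def mmff94s_relaxation_rmsd(mol, max_iters=500):
    """Heavy-atom RMSD (in angstroms) between a 3D molecule and its
    MMFF94s-optimized copy. Returns None if MMFF lacks parameters.
    (Hypothetical helper, written for illustration.)"""
    # MMFF needs explicit hydrogens; addCoords=True places them in 3D.
    ref = Chem.AddHs(mol, addCoords=True)
    if not AllChem.MMFFHasAllMoleculeParams(ref):
        return None
    opt = Chem.Mol(ref)  # independent copy, conformer included
    AllChem.MMFFOptimizeMolecule(opt, mmffVariant="MMFF94s", maxIters=max_iters)
    # Compare heavy atoms only. GetBestRMS aligns the probe onto the
    # reference and takes symmetry-equivalent atom orderings into account.
    return rdMolAlign.GetBestRMS(Chem.RemoveHs(opt), Chem.RemoveHs(ref))


# Stand-in for a generated fragment: embed a small molecule in 3D, then
# drop hydrogens to mimic heavy-atom-only model output.
frag = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1CCO"))
embed_status = AllChem.EmbedMolecule(frag, randomSeed=42)
frag = Chem.RemoveHs(frag)

rmsd = mmff94s_relaxation_rmsd(frag)
print(f"RMSD after MMFF94s relaxation: {rmsd:.3f} A")
```

A small RMSD means the input geometry was already close to a force-field minimum. Larger values suggest strained bond lengths, angles, or torsions in the input conformation.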

Section 5: Detailed Information about the Target Proteins


Target Introduction
Current protein sets: We selected 100 protein targets from the cross-docking dataset as our test protein set. These are the same targets that most of the articles currently included in our database use as their test set.
PDBbind 2020: We are integrating all proteins from PDBbind, a comprehensive database of experimentally measured binding affinity data for the biomolecular complexes deposited in the Protein Data Bank (PDB), into our own database. We expect this to substantially increase the value and utility of our resources for users.
3CL: The protease 3CLpro is a potential drug target for coronavirus infections because of its essential role in processing the polyproteins translated from the viral RNA. The corresponding PDB IDs are: 7lkx, 7mbi, 7r7h, 7ujg, 8acd, 7bz5, 7k9i, 7pr0, 7vnb, 8dma.
Aspirin: Aspirin, also known as acetylsalicylic acid, is a salicylate drug commonly used as an analgesic, antipyretic, and anti-inflammatory agent. It can also be used to treat certain inflammatory diseases, such as Kawasaki disease, pericarditis, and rheumatic fever. The corresponding PDB IDs are: 1tgm, 4nsb, 6mqf.
Ibuprofen: Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) used to treat pain, fever, and inflammation. The corresponding PDB IDs are: 3p6h, 4jtr.
Naproxen: Naproxen, sold under the brand name Aleve among others, is a nonsteroidal anti-inflammatory drug (NSAID) used to treat pain, menstrual cramps, fever, and inflammatory diseases such as rheumatoid arthritis and gout. The corresponding PDB ID is: 3r58.
Paracetamol: Paracetamol (acetaminophen or para-hydroxyacetanilide) is a medication used to treat fever and mild to moderate pain. Common brand names include Tylenol and Panadol. The corresponding PDB IDs are: 2dpz, 4a9j, 4yji.
Penicillin: Penicillins (P, PCN or PEN) are a group of β-lactam antibiotics originally obtained from Penicillium moulds, principally P. chrysogenum and P. rubens. The corresponding PDB ID is: 1fxv.

Section 6: Download Links for All Trained Models


Method | GitHub | Trained models
AlphaDrug | codes | trained model
SBDD | codes | trained model
Pocket2Mol | codes | trained model
GraphBP | codes | trained model
TargetDiff | codes | trained model
DiffSBDD | codes | trained model

Section 7: Guidelines for Model Selection


Based on the six models' performance on Vina score, QED, SA, the proportion of different ring sizes, and the distributions of bond angles, dihedral angles, and RMSD values, we recommend primarily using molecules generated by AlphaDrug, Pocket2Mol, and TargetDiff. These three methods represent, respectively, the three current types of molecular generation: sequence-based autoregressive models, 3D graph-based autoregressive models, and diffusion models.

Among the three models, AlphaDrug typically generates molecules with greater molecular weight and higher lipophilicity, which may enhance their potential to permeate the skin and cross the blood-brain barrier. Pocket2Mol often produces molecules of lower molecular weight, making them well suited to binding smaller protein pockets. In terms of synthetic feasibility, molecules from AlphaDrug and Pocket2Mol are generally easier to synthesize than those from TargetDiff. However, TargetDiff stands out by yielding molecules that span a broader range of QED, LogP, and molecular weight, reflecting its ability to generate a more diverse set of molecules. Running AlphaDrug demands more CPU resources, while Pocket2Mol and TargetDiff rely more heavily on GPU resources. If you want to run these models yourself, choose a model according to the hardware you have available.