Authors:
P. Therese Lang (UCSF)
Demetri Moustakas (UCSF/Harvard)
Scott Brozell (The Scripps Research Institute)
Noel Carrascal (SUNY-Stony Brook)
Sudipto Mukherjee (SUNY-Stony Brook)
Scott Pegg (UCSF)
Kaushik Raha (UCSF)
Devleena Shivakumar (The Scripps Research Institute)
Robert Rizzo (SUNY-Stony Brook)
David Case (The Scripps Research Institute)
Brian Shoichet (UCSF)
Irwin Kuntz (UCSF)
Copyright © 2006-2008
Regents of the University of California
All Rights Reserved
Last updated April 22, 2008
2.7.2. Identification of Rigid Segments
2.7.3. Manual Specification of Non-rotatable Bonds
2.7.4. Identification of Flexible Layers
2.7.5. Pruning the Conformation Search Tree
DOCK addresses the problem of "docking" molecules to each other. In general, "docking" is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known. A compound that interacts strongly with, or binds, a receptor associated with a disease may inhibit its function and thus act as a drug. Solving the docking problem computationally requires an accurate representation of the molecular energetics as well as an efficient algorithm to search the potential binding modes.
Historically, the DOCK algorithm addressed rigid body docking using a geometric matching algorithm to superimpose the ligand onto a negative image of the binding pocket. Important features that improved the algorithm's ability to find the lowest-energy binding mode, including force-field based scoring, on-the-fly optimization, an improved matching algorithm for rigid body docking and an algorithm for flexible ligand docking, have been added over the years. For more information on past versions of DOCK, click here.
With the release of DOCK 6, we continue to improve the algorithm's ability to predict binding poses by adding new features like force-field scoring enhanced by solvation and receptor flexibility. For more information about the current release of DOCK, click here.
We and others have used DOCK for the following applications:
- predict binding modes of small molecule-protein complexes
- search databases of ligands for compounds that inhibit enzyme activity
- search databases of ligands for compounds that bind a particular protein
- search databases of ligands for compounds that bind nucleic acid targets
- examine possible binding orientations of protein-protein and protein-DNA complexes
- help guide synthetic efforts by examining small molecules that are computationally derivatized
- many more...
DOCK is Unix based scientific software and follows a common installation recipe: download, unpack, configure, build, and test. The simple configuration scheme of DOCK is based on plain text files. Building and testing employ the make command. DOCK installation is so simple and transparent that users have a reasonable chance of correcting problems themselves.
We recommend starting with a plain serial installation; the appropriate configuration options are either gnu or sgi. Follow the detailed steps (1. through 5.) enumerated below. Subsequently, additional executables can be installed for parallel, pbsa, etc; see step 6. (Here is the quick start for an example gnu parallel installation: cd install; ./configure gnu; make install; make dockclean; ./configure gnu parallel; setenv MPICH_HOME /bla; make dock; make test;).
If problems occur then read the diagnostics carefully and apply the scientific method. To observe what's under the hood, view the configuration file (install/config.h) that is created by configure, and execute make -n for a dryrun. Consult the FAQ. Search the DOCK-Fans mailing list archive.
NOTE FOR WINDOWS USERS: DOCK and its accessories must be run using a Unix-like environment such as Cygwin ( http://www.cygwin.com/ ). We recommend a full Unix installation. In particular, when you install your emulator, make sure to also install compilers, Unix shells, and perl ( Devel for Cygwin ). All steps below should be performed using Cygwin or another Unix emulator for Windows.
(1) Unpack the distribution using the following command:
[user@dock ~] tar -zxvf dock.6.2.tar.gz
(2) Enter the installation directory:
[user@dock ~] cd dock6/install
(3) Configure the Makefile for the appropriate operating system:
[user@dock ~] ./configure [configuration file]
AUTHOR: Scott Brozell
USAGE: configure [-help] [configuration file]
OPTIONS
-help #emit the usage statement
configuration file #input file containing operating system appropriate variables
Configuration Files Target gnu gnu-style compiler gnu.parallel gnu-style compiler with parallel processing library capability gnu.pbsa gnu-style compiler with PB/SA (ZAP library) capability gnu.parallel.pbsa gnu-style compiler with parallel processing library and PB/SA (ZAP library) capabilities sgi sgi-style compiler sgi.parallel sgi-style compiler with parallel processing library capability DESCRIPTION:
Create the DOCK configuration file, config.h, by copying an existing configuration file that is selected using the arguments. When invoked without arguments, print this usage statement and if the configuration file exists then print its creation stamp. Some configuration files require that environment variables be defined; these requirements are listed in the files and emitted by configure.(4) Build the desired DOCK executable(s) via one of the following commands:
[user@dock ~] make all # builds all the DOCK programs
[user@dock ~] make dock # builds only the dock program
[user@dock ~] make utils # builds only the accessory programs(5) Test the built executable(s) via these commands:
[user@dock ~] cd test; make test
The test directory contains the DOCK quality control (QC) suite. It produces pass/fail results via fast regression tests. The suite should complete in less than ten minutes; un-passed tests should be examined to determine their significance. The make check command executed from the test directory emits all the differences uncovered during testing; this command is automatically executed by the main make test command above.
NOTE: Some failures are not significant. For example, differences in the tails of floating point numbers may not be significant. The sources of such differences are frequently platform dependencies from computer hardware, operating systems, and compilers that impact arithmetic precision and random number generators. We are working on increasing DOCK's resilience to these issues. For now, apply common sense and good judgment to determine the significance of a possible failure.
Some features of DOCK (DOCK3.5 Score aka ChemGrid Score) require an electrostatic potential map which is usually generated by DelPhi. Testing of these features requires that the environment variable DELPHI_PATH be defined to the full path of DelPhi. DelPhi is not distributed with DOCK.
(6) OPTIONAL: Alternate executables.
(i) DOCK with parallel processing functionality requires a Message Passing Interface (MPI) library. Because of the vagaries of MPI libraries, building parallel DOCK has more pitfalls than installing the serial version. The MPI library must be installed and running on the system if the parallel features of DOCK are to be used.
Currently, the DOCK installation mechanism only directly supports the MPICH2 and MPICH implementations. The MPICH2 library is freely available from Argonne National Labs ( http://www-unix.mcs.anl.gov/mpi/mpich/ ). Once MPI is installed, define the environment variable MPICH_HOME to the top level MPICH2 directory. See the Parallel DOCK section for execution information.WARNING: The parallel configuration files have been tailored to a typical MPICH2 build. Linking problems, such as undefined references and cannot find libbla_bla, can occur due to idiosyncrasies in the MPI installation. One corrective approach is to use manual linking; add to the LIBS definition in config.h the link flags (-L and -l) from the command: $MPICH_HOME/mpicc -show; in general, the LIBS should contain those link flags in the same order.
(ii) DOCK with PB/SA scoring requires the OpenEye ZAP library. The ZAP library can be obtained from OpenEye (http://www.eyesopen.com/). Once ZAP is installed, define the environment variable ZAP_HOME to the directory that contains the ZAP library.
Version 6.0
The new features of DOCK 6 include: additional scoring options during minimization; DOCK 3.5 scoring-including Delphi electrostatics, ligand conformational entropy corrections, ligand desolvation, receptor desolvation; Hawkins-Cramer-Truhlar GB/SA solvation scoring with optional salt screening; PB/SA solvation scoring; AMBER scoring-including receptor flexibility; the full AMBER molecular mechanics scoring function with implicit solvent; conjugate gradient minimization and molecular dynamics simulation capabilities.
Version 6.1
The newly added features for this incremental release of DOCK 6 include a new pruning algorithm during the anchor-and-grow algorithm, a distance-based movable region and a mildly performance optimized nothing movable region for AMBER score, cleaner output and more complete output files for AMBER score, the ability to perform ranking and/or clustering on ligands between primary and secondary scoring, and more dynamic output when secondary scoring is employed.
Version 6.2
The newly added features for this incremental release of DOCK 6 include greater control over the output of conformations, improved memory efficiency for grid reading, a distance dependent dielectric control for continuous score, and for AMBER score better error reporting and robustness of the preparation scripts, a metal ions library, a cofactor library, a hook for a user library, support for RNA receptors, a minimization convergence criterion control, and the ability to skip inadequately prepped ligands.
1.5. Overview of the DOCK Suite of Programs
1.5.1. Programs
The relationship between the main programs in the dock suite is depicted in Figure 1. These routines will be described below.
Main programs in DOCK suite
The program sphgen identifies the active site, and other sites of interest, and generates the sphere centers that fill the site. It has been described in the original paper (Kuntz et al. J. Mol. Biol. 1982). The program grid generates the scoring grids (Shoichet et al. J. Comp. Chem. 1992 and Meng et al. J. Comp. Chem. 1992). Within the DOCK suite of programs, the program DOCK matches spheres (generated by sphgen) with ligand atoms and uses scoring grids (from grid) to evaluate ligand orientations (Kuntz et al. J. Mol. Biol. 1982 and Shoichet et al. J. Comp. Chem. 1992). Program DOCK also minimizes energy based scores (Meng et al. Proteins. 1993).
1.5.2. General Concepts
The DOCK suite of programs is designed to find favorable orientations of a ligand in a “receptor.” It can be subdivided into
(i) those programs related directly to docking of ligands and
(ii) accessory programs.We limit the discussion in this section to only those programs and methods related to docking a ligand in a receptor. A typical receptor might be an enzyme with a well-defined active site, though any macromolecule may be used (e.g. a structural protein, a nucleic acid strand, a “true” receptor). We’ll use an enzyme as an example in the rest of this discussion.
The starting point of all docking calculations is generally the crystal or NMR structure of an enzyme from an enzyme-ligand complex. The ligand structure may be taken from the crystal structure of the enzyme-ligand complex or from a database of compounds, such as the ZINC database (Irwin, et. al. J. Chem. Inf. Model. 2005). The primary consideration in the design of our docking programs has been to develop methods which are both rapid and reasonably accurate. These programs can be separated functionally into roughly two parts, each somewhat independent of the other:
(i) Routines which determine the orientation of a ligand relative to the receptor and
(ii) Routines which evaluate (score) a ligand orientation.There is a lot of flexibility. You can generate orientations outside of DOCK and score them with the DOCK evaluation functions. Alternatively, you can develop your own scoring routines to replace the functions supplied with DOCK.
The ligand orientation in a receptor site is broken down into a series of steps, in different programs. First, a potential site of interest on the receptor is identified. (Often, the active site is the site of interest and is known a priori.) Within this site, points are identified where ligand atoms may be located. A routine from the DOCK suite of programs identifies these points, called sphere centers, by generating a set of overlapping spheres which fill the site. Rather than using DOCK to generate these sphere centers, important positions within the active site may be identified by some other mechanism and used by DOCK as sphere centers. For example, the positions of atoms from the bound ligand may be used as these sphere centers. Or, a grid may be generated within the site and each grid point may be considered as a sphere center. Our sphere centers, however, attempt to capture shape characteristics of the active site (or site of interest) with a minimum number of points and without the bias of previously known ligand binding modes.
To orient a ligand within the active site, some of the sphere centers are “matched” with ligand atoms. That is, a sphere center is “paired” with an ligand atom. Many sets of these atom-sphere pairs are generated, each set containing only a small number of sphere-atom pairs. In order to limit the number of possible sets of atom-sphere pairs, a longest distance heuristic is used; (long) inter-sphere distances are roughly equal to the corresponding (long) inter-atomic ligand distances. A set of atom-sphere pairs is used to calculate an orientation of the ligand within the site of interest. The set of sphere-atom pairs which are used to generate an orientation is often referred to as a match. The translation vector and rotation matrix which minimizes the rmsd of (transformed) ligand atoms and matching sphere centers of the sphere-atom set are calculated and used to orient the entire ligand within the active site.
The orientation of the ligand is evaluated with a shape scoring function and/or a function approximating the ligand-enzyme binding energy. Most evaluations are done on (scoring) grids in order to minimize the overall computational time. At each grid point, the enzyme contributions to the score are stored. That is, receptor contributions to the score, potentially repetitive and time consuming, are calculated only once; the appropriate terms are then simply fetched from memory.
The ligand-enzyme binding energy is taken to be approximately the sum of the van der Waal attractive, van der Waal dispersive, and Coulombic electrostatic energies. Approximations are made to the usual molecular mechanics attractive and dispersive terms for use on a grid. To generate the energy score, the ligand atom terms are combined with the receptor terms from the nearest grid point, or combined with receptor terms from a “virtual” grid point with interpolated receptor values. The score is the sum of over all ligand atoms for these combined terms. In this case, the energy score is determined by both ligand atom types and ligand atom positions on the energy grids.
As a final step, in the energy scoring scheme, the orientation of the ligand may be varied slightly to minimize the energy score. That is, after the initial orientation and evaluation (scoring) of the ligand, a simplex minimization is used to locate the nearest local energy minimum. The sphere centers themselves are simply approximations to possible atom locations; the orientations generated by the sphere-atom pairing, although reasonable, may not be minimal in energy.
1.5.3. Specific Concepts
(A) Sphere Centers
Spheres are generated to fill the target site. The sphere centers are putative ligand atom positions. Their use is an attempt to limit the enormous number of possible orientations within the active site. Like ligand atoms, these spheres touch the surface of the molecule and do not intersect the molecule. The spheres are allowed to intersect other spheres; i.e., they have volumes which overlap. Each sphere is represented by the coordinates of its center and its radius. Only the coordinates of the sphere centers are used to orient ligands within the active site (see above). Sphere radii are used in clustering.
The number of orientations of the ligand in free space is vast. The number of orientations possible from all sets of sphere-atom pairings is smaller but still large and cannot be generated and evaluated (scored) in a reasonable length of time. Consequently, various filters are used to eliminate from consideration, before evaluation, sets of sphere-atoms pairs, which will generate poorly scoring orientations. That is, only a small subset of the number of possible ligand orientations are actually generated and scored. The distance tolerance is one filter. Sphere “coloring” and identification of “critical” spheres are other filters.
Sphere-sphere distances are compared to atom-atom distances. Sets of sphere-atom pairs are generated in the following manner: sphere i is paired with atom I if and only if for every sphere j in the set and for every atom J in the set,
where dij is the distance between sphere i and sphere j, dIJ is the distance between atom I and atom J, and epsilon is a somewhat small user-defined value.
(B) Chemical Matching
DOCK spheres are generated without regard to the chemical properties of the nearby receptor atoms. Sphere “chemical matching” or “coloring” associates a chemical property to spheres and a sphere of one “color” can only be matched with a ligand atom of complementary color. These chemical properties may be things such as “hydrogen-bond donor,” “hydrogen-bond acceptor,” “hydrophobe,” “electro-positive,” “electro-negative,” “neutral,” etc. Neither the colors themselves, nor the complementarity of the colors, are determined by the DOCK suite of programs; DOCK simply uses these labels. With the inclusion of coloring, only ligand atoms with the appropriate chemical properties are matched to the complementary colored spheres. It is probably more likely, then, that the orientation generated will produce a favorable score. Conversely, by excluding colored spheres from pairing with certain ligand atoms, the number of (probably) unfavorable orientations which are generated and evaluated can be reduced. Note that requiring complementarity in matching does not mean that all ligand atoms will lie in chemically complementary regions of the enzyme. Rather, only those ligand atoms, when paired with a colored sphere which is part of the sphere-atom match, will be guaranteed to be in the chemically complementary region of the enzyme (provided chirality of the spheres is the same as that of the matching ligand atoms).
(C) Critical Points
The "critical point" filter requires that certain spheres be part of the set of sphere-atom pairs used to orient the ligand (DesJarlais et al. J. Comput-Aided Molec. Design. 1994). Designating spheres as critical points forces the ligand to have at least one atom in that area of the enzyme, where that sphere is located. This filter may be useful, for example, when it is known that a ligand must occupy a particular area of an active site. This filter removes from consideration any orientation that does not guarantee at least one ligand atom in critical areas of the enzyme (provided chirality of the spheres is the same as that of the matching ligand atom).
(D) Bump Filter
After a ligand is oriented within the active site, the orientation is evaluated. In an attempt to reduce the total computational time, after the ligand is oriented in the site, it is possible to first check whether or not ligand atoms occupy space already occupied by the receptor. If too many of such “bumps” are found, then the ligand is likely to intersect the receptor even after minimization; consequently, the ligand orientation is discarded before evaluation.
This section is intended as a reference manual for the features of the DOCK Suite of Programs. It is intended to give an overview of the ideas which form the basis of the DOCK suite of programs and to detail the available user parameters. It is not intended to be a substitute for all the papers written on DOCK.
In general, this document is geared towards the experienced user and introduces new features and concepts in version 6. If you are new to DOCK, we strongly recommend you look at the tutorials on the DOCK web site at http://dock.compbio.ucsf.edu/DOCK_6/index.htm, which go into much greater practical detail.
Version 1.0/1.1
Authors: Robert Sheridan, Renee DesJarlais, Irwin Kuntz
The program DOCK is an automatic procedure for docking a molecule into a receptor site. The receptor site is characterized by centers, which may come from sphgen or any other source. The molecule being docked is characterized by ligand centers, which may be its non-hydrogen atoms or volume-filling spheres calculated in sphgen. The ligand centers and receptor centers are matched based on comparison of ligand-center/ligand-center and receptor-center/receptor-center distances. Sets of ligand centers match sets of receptor centers if all the internal distances match, within a value of distance_tolerance. Ligand-receptor pairs are added to the set until at least nodes_minimum pairs have been found. At least three pairs must be found to uniquely determine a rotation/translation matrix that will orient the ligand in the receptor site. A least-squares fitting procedure is used (Ferro et al. Act. Cryst. A. 1977). Once an orientation has been found, it is evaluated by any of several scoring functions. DOCK may be used to explore the binding modes of an individual molecule, or be used to screen a database of molecules to identify potential ligands.
Version 2.0
Authors: Brian Shoichet, Dale Bodian, Irwin Kuntz
DOCK version 2.0 was written to give the user greater control over the thoroughness of the matching procedure, and thus over the number of orientations found and the CPU time required (Shoichet et al. J. Comp. Chem. 1992). In addition, certain algorithmic shortcomings of earlier versions were overcome. Versions 2.0 and higher are particularly useful for macromolecular docking (Shoichet et al. J. Mol. Biol. 1991) and applications which demand detailed exploration of ligand binding modes. In these cases, users are encouraged to run CLUSTER in conjunction with sphgen and DOCK.
To allow for greater control over searches of orientation space, the ligand and receptor centers are pre-organized according to their internal distances. Starting with any given center, all the other centers are presorted into “bins” based on their distance to the first center. All centers are tried in turn as “first” positions, and all the points in a bin which has been chosen for matching are tried sequentially. Ligand and receptor bins are chosen for matching when they have the same distance limits from their respective “first” points. The number of centers in each bin determines how many sets of points in the receptor and the ligand will ultimately be compared. In general, the wider the bins, the greater the number of orientations generated. Thus, the thoroughness of the search is under user control.
Version 3.0
Authors: Elaine Meng, Brian Shoichet, Irwin Kuntz
Version 3.0 retained the matching features of version 2.0, and introduced options for scoring (Meng et al. J. Comp. Chem., 1992). Besides the simple contact scores mentioned above, one can also obtain molecular mechanics interaction energies using grid files calculated by CHEMGRID (which is now superseded by GRID in version 4.0). More information about the ligand and receptor molecules is required to perform these higher-level kinds of scoring. Point charges on the receptor and ligand atoms are needed for electrostatic scoring, and atom-type information is needed for the van der Waals portion of the force field score. Input formats (some of them new in version 3.5) are discussed in various parts of the documentation; one example of a “complete format” (including point charges and atom type information) is SYBYL MOL2 format. Parameterization of the receptor is discussed in the documentation for CHEMGRID. In DOCK, ligand parameters are read in along with the coordinates; input formats are described below. Currently, the options are: contact scoring only, contact scoring plus Delphi electrostatic scoring, and contact scoring plus force field scoring. Atom-type information and point charges are not required for contact scoring only.
Version 3.5
Authors: Mike Connolly, Daniel Gschwend, Andy Good, Connie Oshiro, Irwin Kuntz
Version 3.5 added several features: score optimization, degeneracy checking, chemical matching and critical clustering.
Version 4.0
Authors: Todd Ewing, Irwin Kuntz
Version 4.0 was a major rewrite and update of DOCK (Ewing et al. 2001 ). A new matching engine was developed which is more robust, efficient, and easier to use (Ewing and Kuntz. J. Comput. Chem. 1997). Orientational sampling can now be controlled directly by specifying the number of desired orientations. Additional features include chemical scoring, chemical screening, and ligand flexibility.
Version 5.0-5.4
Authors: Demetri Moustakas, P. Therese Lang, Scott Pegg, Scott Brozell, Irwin Kuntz
Version 5 was rewritten in C++ in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools (Moustakas et al., 2006). Additional new features include MPI parallelization, exhaustive orientation searching, improved conformation searching, GB/SA solvation scoring, and post-screening pose clustering. (Zou et al. J. Am. Chem. Soc., 1999)
Version 6.0-6.2
DOCK 6 is an extension of the DOCK 5 code base. It includes the implementation of Hawkins-Cramer-Truhlar GB/SA solvation scoring with salt screening and PB/SA solvation scoring through OpenEye's Zap Library. Additional flexibility has been added to scoring options during minimization. The new code also incorporates DOCK version 3.5.54 scoring features like Delphi electrostatics, ligand desolvation, and receptor desolvation. Finally, DOCK 6 introduces new code that allows access to the NAB library of functions such as receptor flexibility, the full AMBER molecular mechanics scoring function with implicit solvent, conjugate gradient minimization, and molecular dynamics simulation capabilities.
DOCK must be run from the command line in a standard unix shell. It reads an input parameter file containing field/value pairs:
USAGE: dock6 -i dock.in [-o dock.out] [-v]
DESCRIPTION:
DOCK may be executed in either interactive or batch mode, depending on whether the output is written to a file. In interactive mode, the user is requested only for parameters relevant to the particular run and default values are provided. This mode is recommended for the initial construction of the input file and for short calculations. In batch mode, input parameters are read in from the input file and all output is written to the output file. This mode is recommended for long calculations once an input file has been generated interactively.OPTIONS
-i dock.in #input file containing user-defined parameters
-help #emit the usage statement.
-v #verbosity flag that prints additional information and warnings for scoring functions
-o dock.out #output file containing the parameters used in the calculation, summary information for each molecule docked, and all warning messagesInteractive mode
USAGE: dock6 -i dock.in
DESCRIPTION:
When launched this way, DOCK will extract all relevant parameters from dock.in (or any file supplied by the user). If additional parameters are needed (or if the dock.infile is non-existent or empty), DOCK will request them one at a time from the user. Reasonable default values are presented. Any parameters supplied by the user will be automatically appended to the dock.in file. If the user would like to change any previously entered values, the user can edit the dock.in file using a text editor.Batch mode
USAGE: dock6 -i dock.in -o dock.out
DESCRIPTION:
When launched in this way, DOCK will run in batch mode, extracting all relevant parameters from dock.in (or any file supplied by the user) and will write out all output to dock.out (or any file supplied by the user). If any parameters are missing or incorrect, then execution will halt and an appropriate error message will be reported in dock.out.Parallel DOCK
USAGE: mpirun [-machinefile machfile] [-np #_of_proc] dock6.mpi -i dock.in -o dock.out [-v]
DESCRIPTION:
If DOCK has been built for parallel processing (see Installation) then DOCK can be run in parallel. Parallelization employs a single master processor with the remaining processors acting as slaves. If np = 1, the code defaults to non-MPI behavior. There is a minimal difference in performance between 1 and 2 processors. Improved performance is only evident with more than 2 processors.ADDITIONAL OPTIONS:
-machinefile #simple text file containing the names of the computers (nodes) to be used
-np # specifies the number of processors which typically is the same as the number of lines in the machinefileMPICH2 QUICK START:
MPICH2, unlike MPICH, requires command line initialization. If your system administrators have not initialized MPICH2 then follow these steps:
Create a .mpd.conf file in your home directory containing secretword=mysecretword.
Create a .mpd.hosts file in your home directory containing a list of machine names, one per line.
Start the MPICH2 daemons by executing mpdboot.
Verify the start up of the MPICH2 daemons by executing mpdtrace -l.
Finally execute DOCK using mpiexec or mpirun. For further information see the MPICH2 README.PB/SA DOCK
USAGE: dock6.pbsa -i dock.in [-o dock.out] [-v]
DESCRIPTION:
If you have compiled DOCK for use with the ZAP library (see Installation), DOCK can be run using the ZAP PB/SA scoring function.
In Interactive Mode, dock will dynamically ask the user to enter the appropriate user parameters. The generic format for the questions is:
parameter_name [default value] (legal values):
The parameter parser requires that the values entered for a parameter exactly match one of the legal values. For example:
Example A: program_location [Hello_World!] ():
Example B: #_red_balloons [99] ():
Example C: glass_status [half_full] (half_full half_empty):
In Example A, the parameter "program_location" can be assigned any string value, and in Example B, the parameter "#_red_balloons" can be assigned any integer value. However, in Example C, the parameter value "glass_status" can only be assigned the strings "half_full" or "half_empty". If no parameter are assigned by the user, the default value--in brackets--will be used.
In Batch Mode, all parameters in the dock.in file, must be:
parameter_name valueNote that the parameter_name and corresponding value need to be separated by white space or a tab.
Before you can dock a ligand, you will need atom types and charges for every atom in the ligand. Currently, DOCK only reads the Tripos MOL2 format. For a single ligand (or several ligands), you can use Chimera in combination with antechamber to prepare a MOL2 file for the ligand (see Structure Preparation Tutorial) or various other visualization packages. During the docking procedure, ligands are read in from a single MOL2 or multi-MOL2 file. Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Molecule Library Parameters
- ligand_atom_file [database.mol2] (string):
# The ligand input filename- limit_max_ligands [no] (yes, no):
#Limit the number of ligands to be read in from a library
- max_ligands[1000]:
#maximum number of ligands that will be read in from a library- skip_molecule [no] (yes, no):
#Skip some number of molecules at the beginning of a library
- initial_skip [0] (int):
# The number of molecules to skip over at the beginning of a library- read_mol_solvation [no] (yes, no):
#Flag to read atomic desolvation information from ligand file- calculate_rmsd [no] (yes, no):
#Flag to calculate the heavy atom RMSD between the final ligand pose
#and its initial structure.
- use_rmsd_reference_mol [no] (yes, no):
#Specify alternative geometric location for reference structure for RMSD
#calculation- rmsd_reference_filename [ligand_rmsd.mol2] (string):
#File containing alternative geometric location for reference structure for
#RMSD calculation
The rigid body orienting code is written as a direct implementation of the isomorphous subgraph matching method of Crippen and Kuhl (Kuhl et al. J. Comput. Chem. 1984). All receptor sphere pairs and atom center pairs are considered for inclusion in a matching clique. This is more computationally demanding than the clique matching algorithm implemented in previous versions that used a distance binning algorithm to restrict the clique search, in which pairs of spheres and atom centers were binned by distance. Only sphere pairs and center pairs that were within the same distance bin were considered as potential matches (Ewing and Kuntz. J. Comput. Chem. 1997).
The clique matching implementation avoids bin boundaries that prevent some receptor sphere and ligand atom pairs from matching, and, as a result, it can find good matches missed by previous versions of DOCK. The rigid body rotation code has also been corrected to avoid a singularity that occurred if the spheres in the match lay within the same plane.
There are two types of ligand orientation currently available:
(1) Automated Matching —Specify the number of orientations, and DOCK will generate matches until enough orientations passing the bump filter have been formed. Matches are formed best first, with respect to the difference in the ligand and site point internal distances.
(2) Manual Matching —Specify the distance and node parameters, and DOCK will generate all the matches which satisfy them. The number of orientations scored is equal to the total matches minus the orientations discarded by the user applied filters.Multiple orientations may be written out for each molecule using the write_orientations parameter (see Ligand File Output), otherwise only the best orientation is recorded. Note that this feature is available only in serial DOCK; in parallel DOCK only the best orientation is emitted.
The critical_points feature is used to focus the orientation search into a subsite of the receptor active site (DesJarlais et al. J. Comput-Aided Molec. Design. 1994 and Miller et al. J. Comput. Aided Mol. Design. 1994). For example, identifying molecules that interact with catalytic residues might be of chief interest. Any number of points may be identified as critical (see Critical Points in the sphgen documentation for information on labeling spheres), and any number of groupings of these points may be identified. An alternative to using critical points is to discard all site points that are some distance away from the subsite of interest, while retaining enough site points to define unique ligand orientations. This feature can be highly effective at reducing matching by five-fold or more. It is particularly useful to also assign chemical labels to the critical points to further focus sampling.
The chemical_matching feature is used to incorporate information about the chemical complementarity of a ligand orientation into the matching process. In this feature, chemical labels are assigned to site points (see Chemical Matching in the sphgen documentation for information on labeling spheres) and ligand atoms (see Ligand File Input) (Kuhl et al. J. Comput. Chem. 1984). The site point labels are based on the local receptor environment. The ligand atom labels are based on user-adjustable chemical functionality rules. These labeling rules are identified with the chemical_defn_file parameter and reside in an editable file (see chem.defn). A node in a match will produce an unfavorable interaction if the atom and site point components have labels which violate a chemical match rule. The chemical matching rules are identified with the chemical_match_file parameter and reside in an editable file (see chem_match.tbl). If a match will produce unfavorable interactions, then the match is discarded. The speed-up from this technique depends how extensively site points have been labeled and the stringency of the match rules, but an improvement of two-fold or more can be expected.
Although DOCK is typically used to process small ligand molecules, it can be used to study the interactions of macromolecular eigands. The chief difference in protocol is that to use the match_receptor_sites procedure for the orientation search, special ligand centers must be used to represent the ligand. This is signaled by setting the ligand_centers parameter. The ligand centers may be constructed by sphgen and must reside in a file identified with the ligand_center_file parameter. See Shoichet et al. J. Mol. Biol. 1991 for examples and discussion of macromolecular docking.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Orient Ligand Parameters
- orient_ligand [yes] (yes, no):
#Flag to orient ligand to spheres- automated_matching [yes] (yes, no):
# Flag to perform automated matching instead of manual matching
- (If automated_matching = no, then manual_matching is used)
distance_tolerence [0.25] (float):
#The tolerance in angstroms within which a pair of spheres is considered equivalent to
#a pair of centers- distance_minimum [2.0] (float):
#The shortest distance allowed between 2 spheres (any sphere pair with a shorter
#distance is disregarded)- nodes_minimum [3] (int):
#The minimum number of nodes in a clique- nodes_maximum [10] (int):
#The maximum number of nodes in a clique- receptor_site_file [receptor.sph] (string):
# The file containing the receptor spheres- max_orientations [500] (int):
# The maximum number of orientations that will be cycled through- critical_points [no] (yes, no):
#Flag to use critical point sphere labeling to target orientations to particular spheres- chemical_matching [no] (yes, no):
#Flag to use chemical coloring of spheres to match chemical labels on ligand atoms
- chem_match_tbl [chem_match.tbl] (string):
#File defining the legal chemical type matches/pairings- use_ligand_spheres [no] (yes/no):
#Flag to enable a sphere file representing ligand heavy atoms to be used to orient the ligand
#Typically used for macromolecular docking
- ligand_sphere_file [ligand.sph] (string):
#The file containing the receptor spheres
The internal degrees of freedom of the ligand can be sampled with the anchor-and-grow incremental construction approach. This conformational search algorithm has been validated for binding mode prediction on sets of ligands that have no more than seven rotatable bonds (Moustakas et al., 2006).
The process of docking a molecule using the anchor-first strategy is shown in the Workflow for Anchor-and-Grow Algorithm. First, the largest rigid substructure of the ligand (anchor) is identified (see Identification of Rigid Segments) and rigidly oriented in the active site (orientation) by matching its heavy atoms centers to the receptor sphere centers (see Orienting the Ligand). The anchor orientations are evaluated and optimized using the scoring function (see Scoring) and the energy minimizer (see Minimization). The orientations are then ranked according to their score, spatially clustered by heavy atom root mean squared deviation (RMSD), and pruned (see Pruning the Conformation Search Tree). Next, the remaining flexible portion of the ligand (see Identification of Flexible Layers) is built onto the best anchor orientations within the context of the receptor (grow). It is assumed that the shape of the binding site will help restrict the sampling of ligand conformations to those that are most relevant for the receptor geometry.
Workflow for Anchor-and-Grow Algorithm
The conformation of a flexible molecule may be searched or relaxed using the flexible_ligand option. Only the torsion angles are modified, not the bond lengths or angles. Therefore, the input geometry of the molecule needs to be of good quality. A structure generated by ZINC is sufficient.
The torsion angle positions reside in an editable file (see flex_drive.tbl on page 111) which is identified with the flex_drive_file parameter. Internal clashes are detected during the torsion drive search based on the clash_overlap or internal_energy parameters, which are independent of scoring function.
2.7.2. Identification of Rigid Segments
A flexible molecule is treated as a collection of rigid segments. Each segment contains the largest set of adjacent atoms separated by non-rotatable bonds. Segments are separated by rotatable bonds.
The first step in segmentation is ring identification. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomenon as sugar puckering and chair-boat hexane conformations, the user will need to supply each ring conformation as a separate input molecule. Additional bonds may be specified as rigid by the user (see Manual Specification of Non-rotatable Bonds).
Identification of Rigid Anchor and Flexible Bonds
The second step is flexible bond identification. Each flexible bond is associated with a label defined in an editable file (see flex.defn). The parameter file is identified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types (and chemical environment) of the bonded atoms. Each label is also flagged as minimizable. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.
2.7.3. Manual Specification of Non-rotatable Bonds
Currently this functionality is not available!
The user can specify additional bonds to be non-rotatable, to supplement the ring bonds automatically identified by DOCK. Such a technique could be used to preserve the conformation of part of a molecule and isolate it from the conformation search. Non-rotatable bonds are identified in the Tripos MOL2 format file containing the molecule. The bonds are designated as members of a STATIC BOND SET named RIGID (see Tripos MOL2 Format).
Creation of the RIGID set can be done within Chimera. With the molecule of interest loaded into Chimera, select the portion of the ligand you would like to remain rigid. Then select on File > Save MOL2. Make sure the "Write current selection to @ SETS section of file" is checked and save the file.
Alternatively, the RIGID set can be entered into the MOL2 file by hand. To do this, go to the end of the MOL2 file. If no sets currently exist, then add a SET identifier on a new line. It should contain the text "@<TRIPOS>SET". On a new line add the text "RIGID STATIC BONDS <user> **** Comment". On the next line enter the number of bonds that will be included in the set, followed by the numerical identifier of each bond in the set.
2.7.4. Identification of Flexible Layers
An anchor segment is selected from the rigid segments in an automatic fashion (see Manual Specification of Non-rotatable Bonds to override this behavior). The molecule is divided into segments that overlap at each rotatable bond. The segment with the largest number of heavy atoms is selected as the anchor. All segments with more heavy atoms then min_anchor_size are tried separately as anchors.
Identification of Overlapping Segments
When an anchor has been selected, then the molecule is redivided into non-overlapping segments, which are then arranged concentrically about the anchor segment. Segments are reattached to the anchor according to the innermost layer first and within a layer to the largest segment first.
Layered Non-Overlapping Segments
The anchor is processed separately (either oriented, scored, and/or minimized). The remaining segments are subsequently re-attached during the conformation search. The interaction energy between the receptor and the ligand can be optimized with a simplex minimizer (see Minimization).
2.7.5. Pruning the Conformation Search Tree
There are now two available methods for pruning available. In the first, the pruning attempts to retain the best, most diverse configurations using a top-first pruning method, which proceeds as follows. The configurations are ranked according to score. The top-ranked configuration is set aside and used as a reference configuration for the first round of pruning. All remaining configurations are considered candidates for removal. A root-mean-squared distance (RMSD) between each candidate and the reference configuration is computed. Each candidate is then evaluated for removal based on its rank and RMSD using the inequality shown in Equation 2. If the factor is greater than number_confs_for_next_growth, as appropriate, the candidate is removed.
![]()
Based on this factor, a configuration with rank 2 and 0.2 Angstroms RMSD is comparable to a configuration with rank 20 and 2.0 Angstroms RMSD. The next best scoring configuration which survives the first pass of removal is then set aside and used as a reference configuration for the second round of pruning, and so on. The pruning method biases its search time towards molecules which sample a more diverse set of binding modes. As the value of num_anchors_orients_for_growth and number_confs_for_next_growth is increased, the anchor-first method approaches an exhaustive search.
In the second method, the goal is to bias the sampling toward conformations that are close to the correct binding mode (as optimized using a test set of experimentally solved structures). Much as the method above, the algorithm ranks the generated poses and conformations. Then, all poses that violate a user-defined score cutoff are removed. To facilitate the speed of the calculation, the remaining list is additionally pared back to a user-defined length. In this technique, the sampling is driven towards molecules that sample closer to the experimentally determined binding site, but results in a significantly less diverse set of final poses.
2.7.6. Internal Energy Calculation
During the growth phase of the calculation, an internal energy scoring function can be used. This function computes the Lennard-Jones and Coulombic energy between all ligand atom pairs, excluding all 1-2, 1-3, and 1-4 pairs. It reduces the occurrence of internal clashes during the torsional optimization. This energy is not included in the final reported score.
The time demand grows linearly with the num_anchors_orients_for_growth, the number_confs_for_next_growth, the number of flexible bonds and the number of torsion positions per bond, as well as the number of anchor segments explored for a given molecule. Using the notation in the Workflow for Anchor-and-Grow Algorithm, the time demand can be expressed as
where the additional terms are:
NA is the number of anchor segments tried per molecule.
NB is the number of rotatable bonds per molecule.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Flexible Ligand Parameters
- flexible_ligand [yes] (yes, no):
#Flag to perform ligand conformational searching
- min_anchor_size [40] (int):
#The minimum number of heavy atoms for an anchor segment- pruning_use_clustering [yes] (yes, no):
#Flag to enable clustering during pruning (duplicates previous behavior)(if pruning_use_clustering = yes)
- pruning_max_orients [100] (int):
#The pruning value cutoff for anchor orientations promoted to the
#conformational search (previously num_anchor_orients_for_growth)- pruning_clustering_cutoff [100] (int):
#The maximum number of conformations carried forward in the anchor & grow
#search (previous num_confs_for_next_growth)(if pruning_use_clustering = no)
- pruning_max_orients [100] (int):
#Maximum number of anchor orientations promoted to the conformational
#search- pruning_orient_score_cutoff [25.0] (float):
#Maximum score for anchor after minimization- pruning_max_conformers [75] (int):
#Maximum number of anchor orientations promoted to the next layer of growth- pruning_conformer_score_cutoff [25.0] (float):
#Maximum score for conformation after minimization- use_internal_energy [yes] (yes,no):
#Flag to add an internal energy term to the score during the conformational search
- internal_energy_att_exp [6] (int):
#VDW attractive exponent- internal_energy_rep_exp [12] (int):
#VDW repulsive exponent- internal_energy_dielectric [4.0] (float):
#Dielectric used for electrostatic calculation- use_clash_overlap [no] (yes, no):
#Flag to check for overlapping atomic volumes during anchor and grow
- clash_overlap [0.5] (float):
#The relative threshold for overlapping atomic volumes;
#a clash exists if the distance between a pair of atoms is less than
#the clash_overlap times the sum of their atom type radii;
#thus, a clash_overlap of 0.75 allows 25% (1 - 0.75) of relative overlap.
DOCK uses several types of scoring functions to discriminate among orientations and molecules. Scoring is requested using the score_molecules parameter. The scoring functions are implemented with a hierarchical strategy. A master score class manages all scoring functions that DOCK uses. Any of the DOCK scoring functions can be selected as the primary and/or the secondary scoring function. The primary scoring function is used during rigid orienting, anchor-and-grow steps, and minimization, which typically make many calls to the scoring function. The secondary scoring function is used in the final minimization, scoring, and ranking of the molecules.
Orientations and grow steps may be filtered prior to scoring to discard those in which the molecule significantly overlaps receptor atoms. This feature is enabled with the bump_filter flag. At the time of construction of the bump filter, the amount of atom VDW overlap is defined with the bump_overlap parameter (see Grid). At the time of bump evaluation the number of allowed bumps is defined with the max_bump_anchor and max_bump_growth parameter.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Bump Filter Parameters
- bump_filter [yes] (yes, no):
#Flag to perform bump filtering- bump_grid_prefix [grid] (string):
#The prefix to the grid file containing the desired bump grid- max_bumps_anchor [12] (int):
#The maximum allowed number of bumps for a molecule to pass #the filter- max_bumps_growth [12] (int):
#The maximum allowed number of bumps for a molecule to pass #the filter
The contact score is a simple summation of the number of heavy atom contacts between the ligand and receptor. At the time of construction of the contact scoring grid, the distance threshold defining a contact is set with the contact_cutoff_distance (see Grid). Atom VDW overlaps are penalized by checking the bump filter grid, or with the contact_clash_overlap parameter for the intramolecular score. The amount of penalty is specified with the contact_clash_penalty parameter.
The contact score provides a simple assessment of shape complementarity. It can be useful for evaluating primarily non-polar interactions.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Contact Score Parameters
- contact_score_primary [no] (yes, no):
#Flag to perform contact scoring as the primary scoring function- contact_score_secondary [no] (yes, no):
#Flag to perform contact scoring as the secondary scoring function
- contact_score_cutoff_distance [4.5] (float):
#The distance threshold defining a contact
- contact_score_clash_overlap [0.75] (float):
#Contact definition for use with intramolecular scoring- contact_score_clash_penalty [50] (float):
#The penalty for each contact overlap made- contact_score_grid_prefix [grid] (string):
#The prefix to the grid files containing the desired contact
#gridThe grid-based score is based on the non-bonded terms of the molecular mechanic force field (see Grid for more background).
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Grid Score Parameters
- grid_score_primary [yes] (yes, no):
# Flag to perform grid-based energy scoring as the primary scoring
#function- grid_score_secondary [yes] (yes, no):
# Flag to perform grid-based energy scoring as the secondary #scoring function
- grid_score_rep_rad_scale [1] (float):
#Scalar multiplier of the radii for the repulsive portion of the
#VDW energy component ONLY
- grid_score_vdw_scale [1] (float):
# Scalar multiplier of the VDW energy component
- (if grid_score_vdw_scale = 0)
grid_score_turn_off_vdw [yes] (yes, no):
#Flag to turn of vdw portion of scoring function- grid_score_es_scale [1] (float):
# The prefix to the grid files containing the desired energy
#grid
- (if grid_score_es_scale = 0)
grid_score_turn_off_es [yes] (yes, no):
#Flag to turn of es portion of scoring function- grid_score_grid_prefix [grid] (string):
#The prefix to the grid files containing the desired nrg grid
DOCK3.5 score is a variant of Grid-based scoring (see Grid). DOCK3.5 score function calculates ligand desolvation in addition to steric and electrostatic interactions between the ligand and receptor. The electrostatic interactions between the ligand and the protein is calculated from a electrostatic potential (ESP) map. The ESP map should be calculated using finite difference Poisson-Boltzmann equation (PBE) as implemented in the program DelPhi. We provide scripts for the calculation, however DelPhi is not distributed by us.
In the DOCK scoring hierarchy DOCK3.5 Score follows GridScore and can be used both as the primary and the secondary scoring function.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
DOCK3.5 Score Parameters
- dock3.5_score_primary [no] (yes, no):
#Flag to perform dock3.5 scoring as the primary scoring function- dock3.5_score_secondary [no] #Flag to perform dock3.5 scoring as the secondary scoring
#function
- dock3.5_vdw_score [yes] (yes, no):
#Calculate steric interaction from dock3.5 score- dock3.5_grd_prefix [chem52] (string):
#Default prefix for files containing dock3.5 grids- dock3.5_electrostatic_score [yes] (yes, no):
#Calculate electrostatic interaction from ESP grid calculated using DelPhi- dock3.5_ligand_internal_energy [no] (yes, no):
#Flag to add ligand internal energy to the scoring function- dock3.5_ligand_desolvation_score [volume] (volume, total, no):
#Calculate total or volume based ligand desolvation from solvation
#grids
- dock3.5_solvent_occlusion_file [solvmap] (string):
#Occluded solvent grid of the receptor- dock3.5_redistribute_positive_desolvation [no] (yes, no):
#positive partial atomic desolvation penalties are distributed- dock3.5_write_atomic_energy_contrib[no] (yes, no):
#write contribution from each atom to total score- dock3.5_score_vdw_scale [1.0] (float):
#Scalar multiplier of VDW energy component- dock3.5_score_es_scale [1.0] (float):
#Scalar multiplier of electrostatic energy component
Continuous scoring may be used to evaluate a ligand:receptor complex without the investment of a grid calculation, or to perform a more detailed calculation without the numerical approximation of the grid. See Meng et al. J. Comp. Chem. 1992. When continuous scoring is requested, then score parameters normally supplied to Grid, must also be supplied to DOCK. It is left to the user to make sure consistent values are supplied to both programs.
The distance dependence of the Lennard-Jones function is set with the cont_score_att_exp and cont_score_rep_exp parameters. Typically a 6-12 potential is used, but it can be softened up by using a 6-8 or 6-9 potential. The source of the radii and well-depths is the same as for other atom typing applications; see vdw.defn. The distance dependence of the Coulombic function is set with the cont_score_use_dist_dep_dielectric parameter. The dielectric constant is adjusted with the cont_score_dielectric parameter. Note the correspondence of these parameters to those for Grid Energy Scoring.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Continuous Score Parameters
- continuous_score_primary [no] (yes, no):
#Flag to perform continuous non-grid based scoring as the primary
#scoring function- continuous_score_secondary [no] (yes, no):
#Flag to perform continuous non-grid based scoring as the
#secondary scoring function
- cont_score_rec_filename [receptor.mol2] (string):
#File that contains receptor coordinates- cont_score_att_exp [6] (int):
#VDW Lennard-Jones potential attractive exponent- cont_score_rep_exp [12] (int):
#VDW Lennard-Jones potential repulsive exponent- cont_score_rep_rad_scale [1] (int):
#Scalar multiplier of the radii for the repulsive portion of the
#VDW energy component ONLY- cont_score_use_dist_dep_dielectric [yes] (yes, no):
#Distance dependent dielectric switch- cont_score_dielectric [] (float):
#Dielectric constant for electrostatic term
- (if cont_score_use_dist_dep_dielectric = yes)
cont_score_dielectric [4.0] (float):
- (if cont_score_use_dist_dep_dielectric = no)
cont_score_dielectric [1.0] (float):
- cont_score_vdw_scale [1] (float):
#Scalar multiplier of vdw energy component
- (if cont_score_vdw_scale = 0)
cont_score_turn_off_vdw [yes] (yes, no):
#Flag to turn of vdw portion of scoring function- cont_score_es_scale [1] (float):
#Scalar multiplier of electrostatic energy component
- (if cont_score_es_scale = 0)
cont_score_turn_off_es [yes] (yes, no):
#Flag to turn of es portion of scoring function
The Zou GB/SA scoring function is a fast algorithm, the pairwise free energy model, for ligand binding affinity calculations. Specifically, a pairwise descreening approximation (Hawkins et al. Chem. Phys. Lett. 1995) is used in calculations of the electrostatic energy contribution. The algorithm also includes a procedure to account for the low dielectric region that might form between the ligand and the receptor during docking processes. It has been tested to obtain similar results compared with the grid-based free energy model but with much less computation efforts.
For more information on the scoring function and specifics of the implementation, see Liu et al. J. Phys. Chem. B. 2004.
Zou GB/SA Score Parameters
- gbsa_zou_score_primary [no] (yes, no):
#Flag to perform Zou GB/SA scoring as the primary scoring
#function- gbsa_zou_score_secondary [no] (yes, no):
#Flag to perform Zou GB/SA scoring as the secondary scoring
#function
- gbsa_zou_gb_grid_prefix [gb_grid] (string):
# The path to the pairwise GB grids- gbsa_zou_sa_grid_prefix [sa_grid] (string):
#The path to the SA grids- gbsa_zou_vdw_grid_prefix [grid] (string):
#The path to the nrg grids, used for the vdw portion of the GB/SA calculation- gbsa_zou_screen_file [screen.in] (string):
# GB parameter file for electrostatic screening. Its located in the parameters dir by default- gbsa_zou_solvent_dielectric [78.300003] (float):
# The value for the solvent dielectric
The Hawkins GB/SA score is an implementation of the Molecular Mechanics Generalized Born Surface Area (MM-GBSA) method originally described by (Srinivasan et al. J. Am. Chem. Soc. 1998) and reviewed by (Kollman et al. Acc. Chem. Res. 2000). This particular implementation of MM-GBSA uses the pairwise GB solvation model reported by Hawkins, Cramer and Truhlar (Hawkins et al. Chem. Phys. Lett. 1995 , Hawkins et al. J. Phys. Chem. 1996) with parameters described by Tsui and Case (Tsui, et al. Biopolymers 2001). Solvent Accessible Surface Areas (SA) are computed using a C++ implementation of the Amber8 "icosahedra" algorithm.
The total interaction between the ligand and receptor are represented by unscaled Coulombic and Lennard jones energy terms (MM) plus the change (delta) in solvation (GBSA) where deltaGBSA = GBSA complex - (GBSA receptor + GBSA ligand). For any given species GBSA = Gpolar ( GB energy) + Gnonpolar (SA*0.00542 + 0.92). Note that if inner and outer dielectric constants are set to 1 (gas-phase) and 80 (water-phase), then GBSA terms are formally equivalent to free energies of hydration that can be directly compared with experimental data and useful for evaluating the accuracy of different partial charge models (Rizzo et al. J. Chem. Theory. Comput. 2006).
The Hawkins GB/SA score was intended to be used for "single-point" calculations and the current implementation is quite slow if energy minimization is performed at the same time. However, an energy minimization is highly recommended prior to GBSA single-point calculations given that scores can be very sensitive to receptor-ligand geometry. Use of the -v flag will yield separate GBSA terms for each species (complex, receptor, and ligand) and also include raw icosahedra SA values in units of angstroms squared. This scoring method requires that the vdw_AMBER_parm99.defn file be specified which contains GB radii and scale parameters for each atom type.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Hawkins GB/SA Score Parameters
- gbsa_hawkins_score_primary [no] (yes, no):
#Flag to perform Hawkins GB/SA scoring as the primary scoring
#function- gbsa_hawkins_score_secondary [no] (yes, no):
#Flag to perform Hawkins GB/SA scoring as the secondary scoring
#function
- gbsa_hawkins_score_rec_filename [receptor.mol2] (string):
#File that contains receptor coordinates- gbsa_hawkins_score_solvent_dielectric [78.5] (float):
#Dielectric constant for solvent- gbsa_hawkins_score_salt_conc(M) [0.0] (float):
#Salt concentration for solvent at Molar concentration- gbsa_hawkins_score_gb_offset [0.09] (float):
#GB radius offset- gbsa_hawkins_score_cont_vdw_and_es [yes] (yes, no):
(if gbsa_hawkins_score_cont_vdw_and_es = yes)
#Flag to determine whether vdw and es values will be
#calculated continuously or from a grid
- gbsa_hawkins_score_vdw_att_exp [6] (int):
#VDW Lennard-Jones potential attractive exponent- gbsa_hawkins_score_vdw_att_exp [12] (int):
#VDW Lennard-Jones potential repulsive exponent(if gbsa_hawkins_score_cont_vdw_and_es = no)
- gbsa_hawkins_score_vdw_att_exp [6] (int):
#VDW Lennard-Jones potential attractive exponent- gbsa_hawkins_score_vdw_att_exp [12] (int):
#VDW Lennard-Jones potential repulsive exponentThe PB/SA scoring function is an implementation of the ZAP tool kit from OpenEye. In order to access the PB/SA function, you must have the ZAP tool kit installed and a valid ZAP license. For more information on obtaining the tool kit, licensing, and other OpenEye products, go to the OpenEye web page at www.eyesopen.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From the OpenEye Documentation:"A Smooth Permittivity Function for Poisson-Boltzmann Solvation Methods", J. Andrew Grant, Barry T. Pickup and Anthony Nicholls, J. Comp. Chem, Vol 22, No.6, pgs 608-640, April 2001.
ZAP is, at its heart, a Poisson-Boltzmann (PB) solver. The Poisson equation describes how electrostatic fields change in a medium of varying dielectric, such as an organic molecule in water. The Boltzmann bit adds in the effect of mobile charge, e.g. salt. PB is an effective way to simulate the effects of water in biological systems. It relies on a charge description of a molecule, the designation of low (molecular) and high (solvent) dielectric regions and a description of an ion-accessible volume and produces a grid of electrostatic potentials. From this, transfer energies between different solvents, binding energies, pka shifts, pI's, solvent forces, electrostatic descriptors, solvent dipole moments, surface potentials and dielectric focusing can all be calculated. As electrostatics is one of the two principal components of molecular interaction (the other, of course, being shape), ZAP is OpenEye's attempt to get it right.
ZAP is written in ANSI C and follows a style of object-oriented programming popularized in the chemical information world by Daylight. It encapsulates structures and methods by the use of opaque pointers, or handles. These are integers converted to real pointers in the interior of the object system, hidden from the user.
ZAP is available to most academic and government institutions free of charge. It comes in the form of a linkable library and a set of prepackaged binaries compiled for SGI, Linux and Cygwin (NT). Commercial use requires a license, available at a very reasonable cost from the nice folks at OpenEye. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
PB/SA Score Parameters
- pbsa_score_primary [yes] (yes, no):
# Flag to perform pbsa energy scoring as the primary scoring
#function- pbsa_score_secondary [yes] (yes, no):
# Flag to perform pbsa energy scoring as the secondary
#scoring function
- pbsa_receptor_filename [receptor.mol2] (string):
# Name of receptor file.- pbsa_interior_dielectric [2.0] (float):
# Value for the dielectric inside the protein- pbsa_exterior_dielectric [78.5] (float):
# Value for the dielectric of the solvent (outside the protein)- pbsa_vdw_grid_prefix [grid] (string):
# The prefix to the grid files containing the VDW values
The AMBER score provides a suite of functionality including: AMBER molecular mechanics, implicit solvation, and molecular dynamics simulation, receptor flexibility, and conjugate gradient minimization. The main disadvantages are more complicated input preparation and increased computational expense.
AMBER score implements molecular mechanics implicit solvent simulations with the traditional all-atom AMBER force field (Pearlman et al. Comp. Phys. Commun. 1995) for protein atoms and the general AMBER force field (GAFF, Wang et al. J. Comp. Chem. 2004) for ligand atoms. The interaction between the ligand and the receptor is represented by electrostatic and van der Waals energy terms, and the solvation energy is calculated using a Generalized Born (GB) solvation model. The user has the option to choose one of the following GB models: (i) Hawkins, Cramer and Truhlar pairwise GB model with parameters described by Tsui and Case (gb=1) ( Tsui et al. Biopolymers 2001), (ii) Onufriev, Bashford and Case model, GB(OBC) (gb=2) ( Onufriev et al. Proteins 2004), and (iii) a modified GB(OBC) (gb=5) ( Onufriev et al. Proteins 2004). The surface area term is derived using a fast LCPO algorithm ( Weiser et al. J. Comp Chem 1999).
The AMBER score is calculated as:
E(Complex) - [ E(Receptor) + E(Ligand) ],
where E(Complex), E(Receptor), and E(Ligand) are, respectively, the internal energies of the complex, receptor, and ligand (all solvated) as approximated by the AMBER force field with a GB/SA solvation term as referenced above.The calculation of each of these three energies uses the same protocol: minimization with a conjugate gradient method is followed by MD simulation (Langevin molecular dynamics at constant temperature), another minimization, and a final energy evaluation. The user can specify the number of pre-MD-minimization cycles, the number of MD simulation steps, and the number of post-MD-minimization cycles in the dock input file. During the final energy evaluation, a surface area term is included. The receptor energy is determined once. The AMBER score energy protocol is performed for every ligand and its corresponding complex.
AMBER score enables all or a part of the receptor to be flexible, in order to reproduce the so-called "induced-fit". The Cartesian coordinates of atoms that are flexible can be altered during the AMBER score energy protocol. The Cartesian coordinates of atoms that are not flexible cannot be altered during the AMBER score energy protocol. The flexible parts of the receptor-ligand complex are specified with a movable region dock input file parameter. The movable region options are ligand, everything, nothing, distance, and NAB atom expression. For the ligand option, only the ligand is allowed to move during minimization and MD simulation. For the everything option, all the atoms in the receptor and the ligand are allowed to move. No minimization or MD simulation occurs for the nothing option. This is the only movable region option for which the ligand is not flexible during the AMBER score energy protocol. Consequently, close contacts can produce very large energies because the AMBER force field can be sensitive to the receptor-ligand geometry. However, the nothing option is the fastest type of AMBER scoring.
The distance movable region option selects residues that are allowed to move by receptor-ligand distance. If any atom in a receptor residue is within the dock input file parameter distance cutoff Angstroms of the ligand then the whole residue is selected. The ligand is represented by the active site sphere list, and thus the movable receptor residues are well defined and independent of any particular ligand molecule. The ligand is always movable. The selected residues are emitted with the -v verbose flag as a NAB atom expression.
The NAB atom expression movable region option is based on the program Nucleic Acid Builder (NAB). Every atom in a NAB molecule has a unique name. This name is composed of the strand name, the residue number and the atom name. A NAB atom expression is a character string that contains one or more patterns that match a set of atom names in a molecule. The dock input file parameter associated with this option specifies the set of atoms in the receptor that are movable. All atoms of the ligand are movable. In DOCK a strand name is in fact a strand sequence number, and in a receptor-ligand complex, the ligand is always last in the sequence. Thus, a receptor molecule with two strands has strand names of "1" and "2"; in a corresponding receptor-ligand complex the ligand (which is almost certainly single stranded) has strand name "3".
NAB atom expressions contain three subexpressions separated by colons. They represent the strand, residue and atom parts of the atom expression. Not all three parts are required. An empty part selects all strands, residues or atoms depending on which parts are empty. Each subexpression consists of a comma separated list of patterns, or for the residue part, patterns and/or number ranges. Several atom expressions may be placed in a single character string by separating them with the vertical bar. Patterns in atom expressions are similar to Unix shell expressions. Each pattern is a sequence of one or more single character patterns and/or stars. The star matches zero or more occurrences of any single character. Each part of an atom expression is composed of a comma separated list of limited regular expressions, or in the case of the residue part, limited regular expressions and/or ranges. A range is a number or a pair of numbers separated by a dash. The NAB manual contains more information on atom expressions.
Here are some examples of NAB atom expressions:
- :SER:
#Select all atoms in any residue named SER. All three parts are present but both the strand and atom parts are empty. The atom expression :SER selects the same set of atoms.- ::C,CA,N,O
#Select all atoms with names C, CA, N or O in all residues in all strands (typically the peptide backbone).- 1:1-10,13:CA,C,N
#Select all atoms named CA,C,N in residues 1-10 and 13 in strand 1.- ::C*[^1]
#The [^1] is an example of a negated character class. It matches any character in the last position except 1. In this case, it will match all the atoms starting with C, such as CA, CB, CG2, but not those ending with 1, such as CD1, CE1.- 2::|1:50,100:O*,N*
#Select all atoms in strand 2. Select all atoms whose name starts with O and N in residue 50, 100 in strand 1. Note that the vertical bar separates the two strands- 4::|2::|1::
#Select strand 4, 2 and 1.- :: or :
#Select all atoms in the molecule.AMBER score requires many additional input files beyond those of other DOCK scoring functions. All the input files, such as the prmtop, frcmod, amber.pdb, etc. should be generated prior to running the DOCK AMBER score. Two perl scripts, prepare_amber.pl and prepare_rna_amber.pl, have been provided for this purpose. The script prepare_rna_amber.pl differs from prepare_amber.pl only in that the former treats all nucleic acids as RNA whereas the latter treats them as DNA. The usage of these is
prepare_amber.pl ligand_mol2_file receptor_PDB_file
For example, if lig.mol2 is the name of the file that contains the ligand in mol2 format and rec.pdb is the name of the file that contains the receptor in PDB format then enter: prepare_amber.pl lig.mol2 rec.pdb
These scripts can process a mol2 file containing multiple ligands (usually the output from a previous DOCK run). See the tutorials for more information on how to use these scripts to generate the input files.The prepare amber perl scripts call other scripts and programs, such as antechamber to calculate the AM1-BCC charges for the ligands, and tleap to assign the parm94 parameter set for protein atoms and the GAFF parameter set for ligand atoms. A metal ions library (parameters/leap/cmd/leaprc.dock.ions), a cofactor library (parameters/leap/cmd/leaprc.dock.cofactors), and a hook for a user library (parameters/leap/cmd/leaprc.dock.user), are automatically processed by tleap. The latter library provides an override mechanism for the global handling of common problems such as missing force field parameters. Detection of parmchk "ATTN, needs revision" messages for missing force field parameters and a local recovery mechanism for the frcmod files exists. The recovery process is described in the warning message and involves creating a file with the extension ".frcmod.revised file". With knowledge of and experience using AMBER one could create the input files manually; see the amberize scipts for the gory details. Note that the ligand_atom_file, which has the extension ".amber_score.mol2", must have a TRIPOS AMBER_SCORE_ID section for every ligand.
The AMBER score output in the DOCK output contains energy terms for each species (complex, receptor, and ligand) such that the sum of the terms is equal to the score; in particular, note that the receptor and ligand values have been negated. Use of the DOCK -v verbose flag will produce detailed energy breakdowns; the formats and definitions of this information are discussed in the the NAB manual: Nucleic Acid Builder (NAB). The verbose output is best captured via UNIX shell file redirection as opposed to the DOCK -o flag.
The AMBER score output in the ligand_outfile_prefix_scored.mol2 file contains the ligand coordinates extracted from the final structure of the complex (i.e., the structure after the AMBER score energy protocol is performed). The ligand charges are from the ligand prmtop file. The ligand atom types are the normal ones from DOCK and are not the GAFF atom types used by AMBER score.
The AMBER score uses the following method for emitting all the final structures in their entirety. The AMBER score will create a file with the extension ".final_pose.amber.pdb" for every species (receptor, ligands, and complexes) that contains any movable region (regardless of whether that region actually exists or actually does move). Thus, for the distance, everything, and NAB atom expression options of the movable region, a file for the receptor (amber_score_receptor_file_prefix.final_pose.amber.pdb) and two for each ligand,complex pair (AMBER_SCORE_ID.final_pose.amber.pdb, amber_score_receptor_file_prefix.AMBER_SCORE_ID.final_pose.amber.pdb) will be created; for the nothing option of the movable region, no files will be created. In addition, if the -v verbose flag is present then the AMBER score will create an AMBER restart file with the extension ".final_pose.amber.restart" for every species (receptor, ligands, and complexes) that contains any movable region (regardless of whether that region actually exists or actually does move).
Here is some general advice on the AMBER score. Since it is slower and more complicated than the other scoring functions, one should proceed carefully when using the AMBER score as the sole primary score. The most obvious use is to rescore, with the default amber_score input parameters, the ligands that have been already scored using one of the faster scoring functions. In general, the input preparation for AMBER score is more involved than for the other scores. Effectively, one needs to generate input files for AMBER. In particular, examine the amberize*.out and *.log files for warnings and errors. Experience using AMBER will help in understanding and judging the impact of the messages in these files. Correcting problems may involve substantial effort. See the DOCK Fans mailing list for specific examples.
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
AMBER Score Parameters
- amber_score_primary [no] (yes, no):
#Flag to perform amber scoring as the primary scoring function.- amber_score_secondary [no] (yes, no):
#Flag to perform amber scoring as the secondary scoring function. This is temporarily deprecated, and using input parameter amber_score_secondary causes program termination. The recommended protocol is to perform two DOCK runs with the second run specifying amber_score as the primary_score.
- amber_score_receptor_file_prefix[rec] (string):
#Prefix of the Receptor. Use the prefix that was used in the prepare_amber.pl input file preparation step.- amber_score_movable_region [ligand] (distance, everything, ligand, nab_atom_expression, nothing):
#The region that will be flexible during the scoring protocol.
- amber_score_movable_distance_cutoff [3.0] (float):
#All receptor residues within this cutoff in Angstroms of the ligand will be movable. This is active only for amber_score_movable_region=distance.- receptor_site_file [receptor.sph] (string):
#The file containing the receptor spheres that define the active site. This is not specific to amber_score. This is active for amber_score_movable_region=distance.- amber_score_movable_atom_expression [::] (string):
#NAB atom expression defining the movable receptor region. This is active only for amber_score_movable_region=nab_atom_expression.- amber_score_minimization_rmsgrad [0.01] (float):
#Minimization convergence criterion for the root-mean-square of the components of the gradient.- amber_score_before_md_minimization_cycles [100] (int):
#Number of conjugate gradient minimization cycles to be performed before MD.- amber_score_md_steps [3000] (int):
#Number of Molecular Dynamics (MD) steps to be performed.- amber_score_after_md_minimization_cycles [100] (int):
#Number of conjugate gradient minimization cycles to be performed after MD.- amber_score_gb_model [5] (int):
#GB model to be used.- amber_score_nonbonded_cutoff [18.0] (float):
#Non-bonded cutoff in Angstroms for the energy calculation.- amber_score_temperature [300.0] (float):
#Temperature at which MD should be performed.- amber_score_abort_on_unprepped_ligand [yes] (yes, no):
#Control over the behavior for an unprepped ligand.
Score optimization allows the orientation and conformation of a molecule to be adjusted to improve the score. Although the calculations can be expensive, the orientational and conformational searches may become more efficient because less sampling is necessary. Optimization is activated with the
minimize_ligand parameter. The optimizer currently uses the downhill simplex algorithm, which does not require evaluation of derivatives (Nelder et al. Computer Journal, 1964). However, it does depend on a random number generator which can be sensitive to the initial seed provided with the random_seed parameter. The amount of variance should be small, though. For detailed calculations, it is recommended that the optimization be repeated with different random number seeds to check convergence.The initial step size of the minimizer is specified with the initial_translation, initial_rotation, and
initial_torsion parameters. Users can choose to minimize the rigid anchors, minimize during flexible growth, and minimize the final conformation. The anchor minimization is always done rigidly; also, if no flexible growth is being done, this step will minimize the entire molecule. The minimization during the flexible growth is a complete (torsions + rigid) minimization.The _max_iterations input parameter specifies the convergence criterion in number of iterations inside one call of the simplex minimizer; the _score_converge input parameter specifies the other convergence criterion in terms of the score inside one call of the simplex minimizer. Thus, when the simplex shrinks enough so that the highest and lowest points are within the scoring energy tolerance or when the maximum number of iterations is reached, one call to the simplex minimizer terminates. Because of convergence issues with the downhill simplex method, additional calls of the simplex minimizer inside one step of minimization are possible: these convergence criteria are _max_cycles for the maximum number of calls, i.e., cycles, of the simplex minimizer and _cycle_converge for the normalized vector distance that the simplex has moved. For further details see page 416 of Ewing et al. 2001 and note that the citation numbers are off by one: i.e., [24] should be [23].
NOTE: The following parameter definitions will use the format below:
parameter_name [default] (value):
#descriptionIn some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.
Score Optimization Parameters
- minimize_ligand [yes] (yes, no):
#Flag to perform score optimization via ligand minimization(if flexible_ligand = no)
simplex_max_iterations [50] (int):
#Maximum number of iterations per cycle of simplex minimization(if flexible_ligand = yes)
- minimize_anchor [yes] (yes, no):
# Flag to perform rigid optimization of the anchor
- simplex_anchor_max_iterations [500] (int):
#Maximum number of iterations per cycle per anchor- minimize_flexible_growth [yes] (yes, no):
# Flag to perform complete optimization during conformational search
- simplex_grow_max_iterations [500] (int):
#Maximum number of iterations per cycle per growth step- use_adva