Irwin D. Kuntz
Demetri T. Moustakas
P. Therese Lang
© University of California 2006
Last updated March 2006
General
Overview
Currently, only MOL2 file I/O is supported. Ligands are read in from a single MOL2 or multi-MOL2 file. Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files (vdw.defn, flex.defn, flex_table.defn). More information about all of these files can be found in the Appendix.
There are several ligand output options, which write molecules to files whose names are formed using the ligand_outfile_prefix parameter (shown below as outputprefix). By default, DOCK writes out a scored molecules output file, which contains the best scoring pose for each molecule in the database. This file is called outputprefix_scored.mol2. Beyond this option, there are several other levels of sampling output:
1) Users can choose to write out orientations. This will create a file called outputprefix_orients.mol2, which contains the molecules after they have been rigidly oriented and optimized. If anchor & grow is being used, this option will write out only the anchor fragment. All orientations generated will be written out, so be careful that the output does not get too large.
2) Users can also write out conformers prior to final optimization. This will create a file called outputprefix_confs.mol2. Again, be aware that the number of molecules in the output file will be equal to the database size * the # of anchors per molecule * the number of orientations per anchor * the number of conformers per cycle. This file can grow quite large, so only use it on single poses or small databases.
3) Finally, users can write molecules ranked by score. This will create a file called outputprefix_ranked.mol2, which contains the top N molecules from the database. This option disables the scored molecule output file by default, though users can override this and write out the best pose for each molecule as well.
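The file naming and the size warning above can be sketched in a few lines of Python. This is an illustrative sketch only, not DOCK code: the function names are hypothetical, the keyword arguments simply mirror the options described in this section, and the product is the upper bound quoted above for the conformer file.

```python
def output_files(prefix, write_orientations=False, write_conformations=False,
                 rank_ligands=False, scored_override=False):
    """Return the MOL2 output file names implied by the selected options."""
    files = []
    # Assumption: the scored file is produced unless ranking is enabled
    # without the override, as described in the text above.
    if not rank_ligands or scored_override:
        files.append(f"{prefix}_scored.mol2")
    if write_orientations:
        files.append(f"{prefix}_orients.mol2")
    if write_conformations:
        files.append(f"{prefix}_confs.mol2")
    if rank_ligands:
        files.append(f"{prefix}_ranked.mol2")
    return files


def estimated_conformer_records(n_molecules, anchors_per_molecule,
                                orientations_per_anchor, conformers_per_cycle):
    """Upper bound on the number of molecules written to the conformer file."""
    return (n_molecules * anchors_per_molecule
            * orientations_per_anchor * conformers_per_cycle)


if __name__ == "__main__":
    print(output_files("output", write_conformations=True))
    # A 1000-molecule database can already imply a very large conformer file.
    print(estimated_conformer_records(1000, 2, 500, 100))
```

Running the estimate before enabling conformer output is a quick way to judge whether the resulting file will be manageable.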
DOCK 5 uses receptor spheres and ligand heavy atom centers to rigidly orient ligands in the receptor. The spheres are generated using the accessory program SPHGEN. Cliques of receptor spheres and ligand centers are identified using the maximum subgraph clique detection algorithm from DOCK 4. All cliques that satisfy the matching parameters are generated in the matching step, and can be sorted prior to the loop where the program cycles through the orientations. Both automated and manual matching are available in DOCK 5. The sphere/center matches are determined by 2 parameters:
1) The distance tolerance is the tolerance in angstroms within which a pair of spheres is considered equivalent to a pair of centers.
2) The distance minimum is the shortest distance allowed between 2 spheres (any sphere pair with a shorter distance is disregarded).
Manual matching will create as many matches as possible given the specified parameters, and sort the matches according to the RMS error between the spheres and centers in the match. The matches are provided as orientations until either max_orientations orientations have been generated or the end of the match list is reached.
Automated matching will start with the default values for the distance tolerance and distance minimum. A list of matches will be generated, and if the # of matches is less than max_orientations, then the distance tolerance is increased and the matching is repeated until there are at least max_orientations matches in the match list. Then the list is sorted, and orientations are generated.
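The automated matching loop described above can be illustrated with a simplified Python sketch. It is not the DOCK implementation: real matching builds cliques of several nodes, whereas this sketch only pairs one sphere pair with one center pair. The function names, the tolerance increment (step) and the safety cap (tolerance_cap) are assumptions introduced for illustration, and the absolute distance error stands in for the RMS error used for sorting.

```python
import math
from itertools import combinations, permutations


def pair_matches(spheres, centers, tolerance, distance_minimum):
    """Match sphere pairs to center pairs whose distances agree within tolerance."""
    matches = []
    for i, j in combinations(range(len(spheres)), 2):
        sphere_dist = math.dist(spheres[i], spheres[j])
        # Sphere pairs closer than the distance minimum are disregarded.
        if sphere_dist < distance_minimum:
            continue
        for a, b in permutations(range(len(centers)), 2):
            error = abs(sphere_dist - math.dist(centers[a], centers[b]))
            if error <= tolerance:
                matches.append((error, (i, a), (j, b)))
    return matches


def automated_matching(spheres, centers, max_orientations,
                       tolerance=0.25, distance_minimum=2.0,
                       step=0.05, tolerance_cap=5.0):
    """Widen the tolerance until enough matches exist, then sort by error."""
    while True:
        matches = pair_matches(spheres, centers, tolerance, distance_minimum)
        # tolerance_cap is a hypothetical guard so the loop always ends.
        if len(matches) >= max_orientations or tolerance >= tolerance_cap:
            break
        tolerance += step
    matches.sort(key=lambda match: match[0])
    return matches[:max_orientations], tolerance


if __name__ == "__main__":
    spheres = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    centers = [(0.0, 0.0, 0.0), (3.1, 0.0, 0.0)]
    found, final_tolerance = automated_matching(spheres, centers, 4)
    print(len(found), round(final_tolerance, 2))
```

Manual matching corresponds to a single call to pair_matches with user-specified values, followed by the same sort and truncation.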
Ligand flexibility in DOCK 5 uses the anchor-and-grow algorithm, which was introduced in DOCK 4. Rotatable bonds (not contained in rings) are used to partition the molecule into rigid segments, from which all anchors that meet the criteria are selected, beginning with the largest anchor segment. All anchor orientations (or the starting orientation only, if no orienting is selected) are used as starting configurations onto which the first flexible layer is appended and conformationally expanded. The total population of conformers is then reduced to the number specified by number_confs_for_next_growth, and the process is repeated until the last layer is reached.
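The expand-then-reduce cycle described above can be sketched as follows. This is a minimal illustration rather than the DOCK conformer generator: the expand and score callables are hypothetical stand-ins supplied by the caller, and the sketch assumes the population is reduced by keeping the best (lowest) scores only.

```python
def anchor_and_grow(anchor_poses, layers, expand, score, confs_per_cycle):
    """Grow each flexible layer in turn, pruning the population every cycle."""
    population = list(anchor_poses)
    for layer in layers:
        # Conformational expansion: every surviving pose spawns new conformers.
        grown = [conf for pose in population for conf in expand(pose, layer)]
        # Reduction: keep only the best-scoring conformers (assumed criterion).
        grown.sort(key=score)
        population = grown[:confs_per_cycle]
    return population


if __name__ == "__main__":
    # Toy model: a pose is a tuple of torsion angles, one per grown layer.
    def expand(pose, layer):
        return [pose + (angle,) for angle in (0, 120, 240)]

    def score(pose):
        return sum(pose)  # placeholder score, lower is better

    print(anchor_and_grow([()], layers=[0, 1], expand=expand,
                          score=score, confs_per_cycle=2))
```

Because the population is capped after every layer, the cost of the search grows with the number of layers rather than exponentially with the number of rotatable bonds.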
The conformer generator class now integrates score optimization in the anchor & grow algorithm. The anchors can be rigidly optimized, the final conformations can be rigidly, torsionally, or completely optimized, and the partially grown conformers can be completely optimized. The anchor & grow steps use whichever scoring function the user selects as the primary scoring function. The final minimization step uses the secondary scoring function.
This release of DOCK 5 implements a hierarchical scoring function strategy. A master score class manages all scoring functions that DOCK uses. Any of the DOCK scoring functions can be selected as the primary and/or the secondary scoring function. The primary scoring function is used during the rigid minimization and anchor & grow steps, which typically make many calls to the scoring function. The secondary scoring function is used in the final minimization, scoring, and ranking of the molecules. If no secondary scoring function is selected, the primary scoring function is used as the secondary.
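The division of labor between the two scoring functions, including the fallback when no secondary function is chosen, can be summarized in a short sketch. The stage names and the function name are hypothetical labels chosen for this illustration; they are not DOCK parameters.

```python
# Stages that use each scoring function, as described in the text above.
PRIMARY_STAGES = ("rigid_minimization", "anchor_and_grow")
SECONDARY_STAGES = ("final_minimization", "final_scoring", "ranking")


def assign_scoring_functions(primary, secondary=None):
    """Map each docking stage to the scoring function it would use."""
    if secondary is None:
        # No secondary selected: the primary function is used throughout.
        secondary = primary
    plan = {stage: primary for stage in PRIMARY_STAGES}
    plan.update({stage: secondary for stage in SECONDARY_STAGES})
    return plan


if __name__ == "__main__":
    # "grid" and "gbsa" are illustrative labels, not parameter values.
    for stage, function in assign_scoring_functions("grid", "gbsa").items():
        print(f"{stage}: {function}")
```

A typical use of this strategy is to pair a fast function for the many calls made during sampling with a more expensive function for the final evaluation.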
This release contains continuous molecular mechanics based scoring (VDW + Coulombic terms only), grid-based molecular mechanics scoring, contact scoring and bump filtering as implemented in DOCK 4. Scoring grids are created using the GRID program. DOCK also contains GB/SA scoring, as implemented in SDOCK. Scoring grids for the GB/SA code are calculated using the accessories nchemgrid_GB and nchemgrid_SA.
This release also includes an internal energy scoring function, which is used during the anchor & grow flexible search. This function computes the Lennard-Jones and Coulombic energy between all ligand atom pairs, excluding all 1-2, 1-3, and 1-4 pairs. This energy is not included in the final reported score.
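The pair exclusion rule can be made concrete with a short sketch. It is an illustration under stated assumptions, not the DOCK source: the per-atom coefficients (rep_coeff, att_coeff) and their product combination are hypothetical, a constant dielectric is assumed, and 332.0 is the usual approximate conversion factor to kcal/mol for charges in electron units and distances in angstroms. Pairs separated by one, two, or three bonds (1-2, 1-3, and 1-4) are skipped.

```python
import math
from collections import deque
from itertools import combinations


def bond_separations(n_atoms, bonds):
    """Shortest bond-path length between every connected atom pair (BFS)."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    separation = {}
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen[nxt] = seen[atom] + 1
                    queue.append(nxt)
        for atom, depth in seen.items():
            separation[(start, atom)] = depth
    return separation


def included_pairs(n_atoms, bonds):
    """Atom pairs kept after dropping 1-2, 1-3 and 1-4 pairs."""
    separation = bond_separations(n_atoms, bonds)
    # 1-2, 1-3 and 1-4 pairs are 1, 2 and 3 bonds apart respectively.
    return [(i, j) for i, j in combinations(range(n_atoms), 2)
            if separation.get((i, j), math.inf) > 3]


def internal_energy(coords, charges, rep_coeff, att_coeff, bonds,
                    att_exp=6, rep_exp=12, dielectric=4.0):
    """Sum Lennard-Jones and Coulombic terms over the included pairs."""
    energy = 0.0
    for i, j in included_pairs(len(coords), bonds):
        r = math.dist(coords[i], coords[j])
        lennard_jones = (rep_coeff[i] * rep_coeff[j] / r ** rep_exp
                         - att_coeff[i] * att_coeff[j] / r ** att_exp)
        coulomb = 332.0 * charges[i] * charges[j] / (dielectric * r)
        energy += lennard_jones + coulomb
    return energy


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    print(included_pairs(6, chain))
```

For a linear chain of six atoms, only the pairs that are four or more bonds apart contribute to the internal energy.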
Score optimization is implemented using a simplex minimizer based on the DOCK 4 minimizer. Users can choose to minimize the rigid anchors, minimize during flexible growth, and minimize the final conformation. The anchor minimization is always done rigidly; also, if no flexible growth is being done, this step will minimize the entire molecule. The minimization during the flexible growth is a complete (torsions + rigid) minimization. The final minimization can be rigid, torsions only, or complete. The minimizer terminates when the simplex “shrinks” enough that the highest and lowest points are within the scoring tolerance, or when the requested number of minimizer steps is reached.
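The termination test for the simplex minimizer can be written as a single predicate. This is a sketch of the criterion described above, not the DOCK minimizer itself; the function and argument names are hypothetical, and treating a spread exactly equal to the tolerance as converged is an assumption.

```python
def simplex_converged(vertex_scores, score_tolerance, steps_taken, max_steps):
    """Return True when the simplex minimizer should stop."""
    # Spread between the highest and lowest scoring vertices of the simplex.
    spread = max(vertex_scores) - min(vertex_scores)
    return spread <= score_tolerance or steps_taken >= max_steps


if __name__ == "__main__":
    print(simplex_converged([-42.10, -42.05, -42.02], 0.1, 12, 100))
    print(simplex_converged([-42.10, -40.00, -39.50], 0.1, 12, 100))
```

The step limit guarantees termination even when the score surface is too rough for the simplex to contract below the tolerance.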
Using DOCK
1. Save the file for the appropriate operating system to your hard drive.
2. Uncompress the archive into a folder called dock5/ in a directory of your choice.
   a. For Windows systems, a Zip file is provided.
   b. For *nix systems, a gzipped archive is provided.
3. All DOCK 5 binaries are installed in dock5/bin/.
The dock5 directory contains the following subdirectories:
bin/
demo/
installation/
parameters/
src/
utilities/
accessories/
grid/
GBSA_Grids/
Compiling DOCK (if required)
DOCK comes with platform-specific compiled binaries. You should not need to compile the code or accessories unless you have made changes to the source code, or are planning to run DOCK on a platform for which we do not distribute binaries.
Building DOCK: (all platforms)
From the dock5 directory:
cd config/
./configure gnu
make
DOCK with MPI functionality is built upon an MPI library. The MPICH library is provided freely by Argonne National Labs (http://www-unix.mcs.anl.gov/mpi/mpich/). The MPI library needs to be installed and running on the system if the MPI features are to be used. Once MPI is installed, you need to define MPICH_HOME as an environment variable.
Building MPI-DOCK: (all platforms)
From the dock5 directory:
cd config/
./configure gnu.parallel
make
NOTE: MPI-DOCK 5.4 has been compiled with MPICH-1.2.7 on all supported platforms (MPICH-1.2.5 for WinXP).
For Windows Users:
DOCK and its accessories must be run in a Linux-like environment such as Cygwin (http://www.cygwin.com/). When you install your emulator, make sure to also install compilers and Unix shells (“Devel” for Cygwin).
DOCK must be run from the command line in a standard Unix shell. It reads a parameter file containing field/value pairs using the following command:
dock5/bin/dock5 -i parameter.in [-v1] [-v2] [-o outputfile.txt]
If the parameter file does not exist, DOCK will generate one using your responses to the parameter questions. If the parameter file exists, any parameter values found will be read.
DOCK 5 outputs the job parameters to the screen at the start of the job, and prints summary information for each molecule processed. Additional summary information will be included in future releases. The -v1 flag prints a histogram of sphere matching information. The -v2 flag prints details about the breakdown of the GB/SA terms.
Running DOCK in Parallel
If you have installed the MPI library, DOCK can be run in parallel using the following command, where # is the number of processors:
mpirun -np # dock5.mpi -i parameter.in -o outputfile.txt
Note that parallelization is set up to have a single Master node, with the remaining nodes acting as slaves. The Master node performs file processing and input/output, whereas the slaves perform the actual calculations. If -np = 1, the code defaults to non-MPI behavior. As a result, there will be minimal difference in performance between 1 and 2 processors. Improved performance will only become evident with more than 2 nodes.
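The reason 1 and 2 processors perform alike follows directly from this layout, as the small sketch below shows. The function name is hypothetical and the sketch only counts processes; it does not model communication overhead.

```python
def compute_processes(np):
    """Number of processes doing docking calculations for a given -np value."""
    if np < 1:
        raise ValueError("-np must be at least 1")
    # With one process DOCK runs serially; otherwise one process is the Master
    # and handles file processing and I/O instead of calculations.
    return 1 if np == 1 else np - 1


if __name__ == "__main__":
    for np in (1, 2, 3, 4, 8):
        print(f"-np {np}: {compute_processes(np)} process(es) calculating")
```

With -np 1 and -np 2 exactly one process performs calculations, so a speedup can only appear from -np 3 upward.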
Running the Demo
DOCK 5.4 includes two demonstration files that are designed to test your installation. These demos must also be run from the command line.
For DOCK: (all platforms)
From the dock5 directory:
cd demo
./script_clean
./script_demo
For MPI-DOCK: (all platforms)
From the dock5 directory:
cd demo
./script_clean
./script_mpi_demo
NOTE: MPI-DOCK will be run on 4 processors for the demo.
The parameters for several common calculations have been optimized using test sets. General recommendations for these parameters can be found in dock5/recommended_input. Below, all available options for DOCK are described in detail.
The DOCK 5 parameter parser requires that the values entered for a parameter exactly match one of the legal values if any legal values are specified. For example:
param_a [5] ():
param_b [5] (0 5 10):
param_a can be assigned any value; however, param_b can only be assigned 0, 5, or 10. If no value is entered, both will default to a value of 5.
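The validation rule in this example can be mimicked with a few lines of Python. This is a sketch of the behavior described above and not the DOCK parser: the function name is hypothetical, and it assumes the user's answer appears on the same line after the colon, as it would at an interactive prompt.

```python
import re

# name [default] (legal values): entered value
PROMPT = re.compile(r"^(\w+)\s*\[(.*?)\]\s*\((.*?)\):\s*(.*)$")


def resolve_parameter(prompt_line):
    """Return (name, value), applying the default and legal-value check."""
    match = PROMPT.match(prompt_line.strip())
    if match is None:
        raise ValueError(f"unrecognized prompt: {prompt_line!r}")
    name, default, legal, entered = match.groups()
    legal_values = legal.split()
    # An empty answer falls back to the default shown in square brackets.
    value = entered.strip() or default
    # An empty legal list means any value is accepted.
    if legal_values and value not in legal_values:
        raise ValueError(f"{name}: {value!r} is not one of {legal_values}")
    return name, value


if __name__ == "__main__":
    print(resolve_parameter("param_a [5] ():"))
    print(resolve_parameter("param_b [5] (0 5 10): 10"))
```

Because the comparison is an exact string match, an answer such as y for a yes/no question would be rejected under this rule.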
Below are listed all DOCK 5 parameters, their default values, legal values, and a brief description of each. The parameters are listed in order of function. Also, for questions requiring a yes/no answer, please use the full word (yes or no) as opposed to y or n.
| Parameter Name | Default | Values | Description |
| ligand_atom_file | database.mol2 | string | The ligand input filename |
| ligand_outfile_prefix | output | string | The prefix that all output files will use |
| limit_max_ligands | no | bool (yes, no) | Flag to limit the maximum # of ligands that will be read in from a library |
| write_orientations | no | bool (yes, no) | Flag to write orientations |
| write_conformations | no | yes, no | Flag to write conformations |
| initial_skip | 0 | int | The # of molecules to skip over at the beginning of a library |
| calculate_rmsd | no | yes, no | Flag to perform an RMSD calculation between the final molecule pose and its initial structure |
| use_rmsd_reference_mol | no | yes, no | Specify reference structure for RMSD calculation (default is starting structure) |
| rmsd_reference_filename | ligand_rmsd.mol2 | string | File containing RMSD reference structure |
| rank_ligands | no | yes, no | Flag to enable a ligand top-score list. These ligands will be written to outfile_ranked.mol2, and outfile_scored.mol2 will be empty by default |
| max_ranked_ligands | 500 | int | The # of ligands to be stored in the top score list |
| scored_conformer_output_override | no | yes, no | This flag causes all ligands to be written to outfile_scored.mol2, even when rank_ligands is true |
| num_scored_conformers_written | 1 | int | The # of scored poses for each ligand printed to output_scored.mol2 |
| cluster_conformations | yes | yes, no | Flag to enable clustering of fully minimized conformations |
| cluster_rmsd_threshold | 2.0 | float | The cutoff to determine whether conformations should be clustered |

| Parameter Name | Default | Values | Description |
| orient_ligand | yes | bool (yes, no) | Flag to orient ligand to spheres |
| automated_matching | yes | bool (yes, no) | Flag to perform automated matching instead of manual matching |
| distance_tolerence | 0.25 | float | The distance tolerance applied to each edge in a clique |
| distance_minimum | 2.0 | float | The minimum size for an edge in a clique |
| nodes_minimum | 3 | int | The minimum # of nodes in a clique |
| nodes_maximum | 10 | int | The maximum # of nodes in a clique |
| receptor_site_file | receptor.sph | string | The file containing the receptor spheres |
| max_orientations | 500 | int | The maximum # of orientations that will be cycled through |
| critical_points | no | bool (yes, no) | Flag to use critical point sphere labeling to target orientations to particular spheres |
| chemical_matching | no | bool (yes, no) | Flag to use chemical “coloring” of spheres to match chemical labels on ligand atoms |
| chem_match_tbl | chem_match.tbl | string | File defining the legal chemical type matches/pairings |
| use_ligand_spheres | no | bool (yes, no) | Flag to enable a sphere file representing ligand heavy atoms to be used to orient the ligand. Typically used for macromolecular docking |
| ligand_sphere_file | ligand.sph | string | Ligand spheres |

| Parameter Name | Default | Values | Description |
| flexible_ligand | yes | bool (yes, no) | Flag to perform ligand conformational searching |
| ag_conf_search | yes | bool (yes, no) | Flag to use the anchor & grow algorithm to search ligand conformations |
| min_anchor_size | 40 | int | The minimum # of heavy atoms for an anchor segment |
| num_anchor_orients_for_growth | 100 | int | The maximum number of anchor orientations promoted to the conformational search |
| number_confs_for_next_growth | 100 | int | The maximum number of conformations carried forward in the anchor & grow search |
| use_internal_energy | yes | bool (yes, no) | Flag to add an internal energy term to the score during the conformational search |
| internal_energy_att_exp | 6 | int | VDW attractive exponent |
| internal_energy_rep_exp | 12 | int | VDW repulsive exponent |
| internal_energy_dielectric | 4.0 | float | Dielectric used for electrostatic calculation |
| use_clash_overlap | no | bool (yes, no) | Flag to check for overlapping atom volumes during anchor and grow |
| clash_overlap | 0.5 | float | Percent of overlap allowed before a clash is declared |

| Parameter Name | Default | Values | Description |
| bump_filter | yes | bool (yes, no) | Flag to perform bump filtering |
|
bump_grid_prefix |
grid |