DOCK 5.4 User Manual

 

 

 

Irwin D. Kuntz

Demetri T. Moustakas

P. Therese Lang

  

© University of California 2006

Last updated March 2006
 

General Overview

 

Ligand File I/O

 

            Currently, only MOL2 file I/O is supported.  Ligands are read in from a single MOL2 or multi-MOL2 file.  Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files (vdw.defn, flex.defn, flex_table.defn).  More information about all of these files can be found in the Appendix.  There are several ligand output options, which write molecules to files whose names are formed using the ligand_outfile_prefix parameter:

 

By default, DOCK writes out a scored molecules output file (unless ranking is enabled; see option 3 below), which contains the best-scoring pose for each molecule in the database.  This will create a file called outputprefix_scored.mol2.  Beyond this option, there are several other levels of sampling output:

1)  Users can choose to write out orientations.  This will create a file called outputprefix_orients.mol2, containing the molecules after they have been rigidly oriented and optimized.  If anchor & grow is being used, only the anchor fragment is written out.  All orientations generated will be written out, so be careful that the output file does not become too large.

2)  Users can also write out conformers prior to final optimization.  This will create a file called outputprefix_confs.mol2.  Again, be aware that the number of molecules in the output file will be equal to the database size * the # of anchors per molecule * the # of orientations per anchor * the # of conformers per cycle.  This file can grow quite large, so only use this option on single poses or small databases.

3) Finally, users can write molecules ranked by score.  This will create a file called outputprefix_ranked.mol2, which contains the top N molecules from the database (N is set by max_ranked_ligands).  This option disables the scored molecule output file by default, though users can override this (scored_conformer_output_override) and write out the best pose for each molecule as well.
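The size estimate in option 2 is a simple product of four factors.  The sketch below (the function name and the example numbers are hypothetical, chosen only for illustration) shows how quickly it grows even for a modest database:

```python
def confs_file_size(n_molecules, anchors_per_molecule,
                    orients_per_anchor, confs_per_cycle):
    """Estimate the number of poses written to outputprefix_confs.mol2.

    Follows the product given in option 2: database size x anchors per
    molecule x orientations per anchor x conformers per cycle.
    (Hypothetical helper, for illustration only.)
    """
    return (n_molecules * anchors_per_molecule
            * orients_per_anchor * confs_per_cycle)


# Hypothetical example: 1,000 molecules, 2 anchors each,
# 100 orientations per anchor, 100 conformers per cycle.
print(confs_file_size(1000, 2, 100, 100))  # -> 20000000
```

Twenty million poses from a thousand-molecule database is why this option is best kept to single poses or very small test sets.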

 

Rigid Orienting

 

            DOCK 5 uses receptor spheres and ligand heavy atom centers to rigidly orient ligands in the receptor.  The spheres are generated using the accessory program SPHGEN.  Cliques of receptor spheres and ligand centers are identified using the maximum subgraph clique detection algorithm from DOCK 4.  All cliques that satisfy the matching parameters are generated in the matching step, and can be sorted before the program cycles through the orientations.

 

            Both automated and manual matching are available in DOCK 5.  The sphere/center matches are determined by 2 parameters:

 

1)     The distance tolerance is the tolerance (in angstroms) within which the distance between a pair of spheres is considered equivalent to the distance between a pair of centers.

2)     The distance minimum is the shortest distance allowed between 2 spheres (any sphere pair with a shorter distance is disregarded).
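The two matching parameters can be illustrated with a check on a single sphere pair and ligand-center pair.  This is a minimal sketch assuming plain 3-D coordinate tuples; the function name is hypothetical and this is not DOCK's clique-matching code.  The default values are taken from the distance_tolerence and distance_minimum entries in the parameter tables.

```python
import math


def pair_is_matchable(sphere_a, sphere_b, center_a, center_b,
                      distance_tolerance=0.25, distance_minimum=2.0):
    """Return True if a sphere pair can be matched to a ligand-center pair.

    Hypothetical helper illustrating the two matching parameters:
    - the sphere pair is disregarded if the spheres are closer than
      distance_minimum (angstroms);
    - otherwise the pairs are equivalent when their internal distances
      differ by no more than distance_tolerance (angstroms).
    """
    d_spheres = math.dist(sphere_a, sphere_b)
    d_centers = math.dist(center_a, center_b)
    if d_spheres < distance_minimum:
        return False
    return abs(d_spheres - d_centers) <= distance_tolerance


# Spheres 3.0 A apart, centers 3.1 A apart: within the 0.25 A tolerance.
print(pair_is_matchable((0, 0, 0), (3, 0, 0), (0, 0, 0), (0, 3.1, 0)))  # True
# Spheres only 1.5 A apart: disregarded by the distance minimum.
print(pair_is_matchable((0, 0, 0), (1.5, 0, 0), (0, 0, 0), (1.5, 0, 0)))  # False
```

A full match requires every sphere pair and center pair in the clique to pass this kind of test, which is what the clique detection step searches for.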

 

Manual matching will create as many matches as possible given the specified parameters, and sort the matches according to the RMS error between the spheres and centers in each match.  The matches are provided as orientations until either max_orientations orientations have been generated or the end of the match list is reached.
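The ordering step of manual matching amounts to a sort followed by a cutoff.  A minimal sketch, assuming each match already carries its RMS error (how DOCK computes that error is not shown); the Match type and the function name are hypothetical:

```python
from typing import NamedTuple


class Match(NamedTuple):
    # Hypothetical record: the sphere/center pairings in the match and the
    # RMS error between spheres and centers (assumed precomputed).
    pairs: tuple
    rms_error: float


def manual_orientations(matches, max_orientations=500):
    """Order matches by RMS error and keep at most max_orientations."""
    ranked = sorted(matches, key=lambda m: m.rms_error)
    # Slicing also handles the case where the list runs out first.
    return ranked[:max_orientations]


matches = [Match(("a",), 0.3), Match(("b",), 0.1), Match(("c",), 0.2)]
print([m.rms_error for m in manual_orientations(matches, max_orientations=2)])
# [0.1, 0.2]
```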

 

            Automated matching will start with the default values for the distance tolerance and distance minimum.  A list of matches is generated, and if the # of matches is less than max_orientations, the distance tolerance is increased and the matching is repeated until there are at least max_orientations matches in the list.  Then the list is sorted, and orientations are generated.
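The automated loop can be sketched as follows.  The text does not say by how much the tolerance is increased or whether there is an upper limit, so tolerance_step and max_tolerance below are assumptions, and generate_matches stands in for the (not shown) clique search; all of these names are hypothetical.

```python
def automated_matching(generate_matches, max_orientations=500,
                       distance_tolerance=0.25, distance_minimum=2.0,
                       tolerance_step=0.05, max_tolerance=2.0):
    """Loosen the distance tolerance until enough matches exist, then sort.

    generate_matches(tolerance, minimum) is a hypothetical callable that
    returns a list of (rms_error, match) tuples. tolerance_step and
    max_tolerance are illustrative assumptions, not DOCK parameters.
    """
    tolerance = distance_tolerance
    matches = generate_matches(tolerance, distance_minimum)
    # Repeat the matching with a looser tolerance while there are too few
    # matches; max_tolerance stops the loop if a site can never supply enough.
    while len(matches) < max_orientations and tolerance < max_tolerance:
        tolerance += tolerance_step
        matches = generate_matches(tolerance, distance_minimum)
    # Sort by RMS error; orientations are then generated in this order.
    return sorted(matches, key=lambda m: m[0]), tolerance
```

The guard on max_tolerance is a design choice of this sketch: without some cap, a receptor site with too few spheres would loop forever.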

 

Ligand Flexibility

 

            Ligand flexibility in DOCK 5 uses the anchor-and-grow algorithm, which was introduced in DOCK 4.  Rotatable bonds (not contained in rings) are used to partition the molecule into rigid segments, and every segment that meets the anchor criteria (e.g., min_anchor_size) is selected as an anchor, beginning with the largest segment.  All anchor orientations (or the starting orientation only, if no orienting is selected) are used as starting configurations onto which the first flexible layer is appended and conformationally expanded.  The total population of conformers is then reduced to the number specified by num_confs_per_cycle, and the process is repeated until the last layer is reached.
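The grow-and-prune cycle described above can be sketched as a loop over layers.  This is a simplified illustration, not DOCK's implementation: expand and score are hypothetical callables (sampling torsions and scoring are not shown), and lower scores are assumed to be better.

```python
def anchor_and_grow(anchor_orientations, layers, expand, score,
                    num_confs_per_cycle=100):
    """Sketch of the anchor-and-grow loop.

    anchor_orientations: starting configurations (anchor poses).
    layers: the flexible layers, in growth order.
    expand(conformer, layer): hypothetical callable returning the new
        conformers obtained by appending `layer` and sampling its torsions.
    score(conformer): hypothetical scoring callable (lower is better).
    """
    population = list(anchor_orientations)
    for layer in layers:
        # Append the next layer to every surviving conformer.
        grown = [new for conf in population for new in expand(conf, layer)]
        # Reduce the population to the best num_confs_per_cycle conformers.
        grown.sort(key=score)
        population = grown[:num_confs_per_cycle]
    return population


# Toy usage: a conformer is a tuple of torsion choices, scored by their sum.
def expand(conf, layer):
    return [conf + (t,) for t in (0, 1, 2)]

print(anchor_and_grow([()], [1, 2, 3], expand, score=sum, num_confs_per_cycle=2))
# [(0, 0, 0), (0, 0, 1)]
```

Pruning after every layer keeps the population bounded by num_confs_per_cycle instead of growing geometrically with the number of rotatable bonds.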

 

            The conformer generator class now integrates score optimization in the anchor & grow algorithm.  The anchors can be rigidly optimized, the final conformations can be rigidly, torsionally, or completely optimized, and the partially grown conformers can be completely optimized.  The anchor & grow steps use whichever scoring function the user selects as the primary scoring function.  The final minimization step uses the secondary scoring function.

 

Scoring Functions

 

            This release of DOCK 5 implements a hierarchical scoring function strategy.  A master score class manages all scoring functions that DOCK uses.  Any of the DOCK scoring functions can be selected as the primary and/or the secondary scoring function.  The primary scoring function is used during the rigid minimization and anchor & grow steps, which typically make many calls to the scoring function.  The secondary scoring function is used in the final minimization, scoring, and ranking of the molecules.  If no secondary scoring function is selected, the primary scoring function is used as the secondary.
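The primary/secondary hierarchy, including the fallback when no secondary function is selected, can be sketched as a small wrapper class.  The class and method names are hypothetical and do not reflect DOCK's actual master score class:

```python
from typing import Callable, Optional

ScoreFn = Callable[[object], float]


class MasterScore:
    """Hypothetical sketch of the primary/secondary scoring hierarchy."""

    def __init__(self, primary: ScoreFn, secondary: Optional[ScoreFn] = None):
        self.primary = primary
        # If no secondary function is selected, the primary is reused.
        self.secondary = secondary if secondary is not None else primary

    def search_score(self, pose) -> float:
        # Used for rigid minimization and anchor & grow (many calls).
        return self.primary(pose)

    def final_score(self, pose) -> float:
        # Used for final minimization, scoring and ranking.
        return self.secondary(pose)


# Usage: a cheap function for the search, a costlier one for final ranking.
scores = MasterScore(primary=lambda pose: 1.0, secondary=lambda pose: 2.0)
print(scores.search_score(None), scores.final_score(None))  # 1.0 2.0
```

This split lets an inexpensive function (for example a grid-based score) drive the many search-stage evaluations while a more expensive one is applied only to the final poses.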

 

            This release contains continuous molecular mechanics based scoring (VDW + Coulombic terms only), grid-based molecular mechanics scoring, and contact scoring and bump filtering as implemented in DOCK 4.  Scoring grids are created using the GRID program.  DOCK also contains GB/SA scoring, as implemented in SDOCK.  Scoring grids for the GB/SA code are calculated using the accessories nchemgrid_GB and nchemgrid_SA.

 

            This release also includes an internal energy scoring function, which is used during the anchor & grow flexible search.  This function computes the Lennard-Jones and Coulombic energy between all ligand atom pairs, excluding all 1-2, 1-3, and 1-4 pairs.  This energy is not included in the final reported score.
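To make the 1-2/1-3/1-4 exclusion concrete, here is a minimal sketch of an internal energy that skips any atom pair separated by three or fewer bonds.  The functional form (A/r^rep - B/r^att plus a Coulomb term with a constant dielectric), the geometric-mean combining rule for the per-atom coefficients, and the approximate Coulomb constant are all assumptions for illustration, not DOCK's exact implementation; the exponent and dielectric defaults mirror the internal_energy_* parameters.

```python
import math
from collections import deque
from itertools import combinations

COULOMB_CONST = 332.0  # approximate kcal/mol conversion (charges in e, r in angstroms)


def atoms_within_bonds(n_atoms, bonds, max_bonds=3):
    """For each atom, the set of atoms reachable in at most max_bonds bonds."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    reach = []
    for start in range(n_atoms):
        depth = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search over the bond graph
            atom = queue.popleft()
            if depth[atom] == max_bonds:
                continue
            for nb in neighbors[atom]:
                if nb not in depth:
                    depth[nb] = depth[atom] + 1
                    queue.append(nb)
        reach.append(set(depth))
    return reach


def internal_energy(coords, charges, bonds, a_coef, b_coef,
                    att_exp=6, rep_exp=12, dielectric=4.0):
    """Pairwise LJ-style + Coulomb energy, skipping 1-2, 1-3 and 1-4 pairs."""
    excluded = atoms_within_bonds(len(coords), bonds)
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j in excluded[i]:  # j is 1, 2 or 3 bonds away from i
            continue
        r = math.dist(coords[i], coords[j])
        # Combine per-atom coefficients by geometric mean (an assumption).
        a = math.sqrt(a_coef[i] * a_coef[j])
        b = math.sqrt(b_coef[i] * b_coef[j])
        energy += a / r ** rep_exp - b / r ** att_exp
        energy += COULOMB_CONST * charges[i] * charges[j] / (dielectric * r)
    return energy
```

In a five-atom chain only the two terminal atoms form a counted (1-5) pair; in a four-atom chain every pair is excluded and the energy is zero.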

 

Score Optimization

 

            Score optimization is implemented using a simplex minimizer based on the DOCK 4 minimizer.  Users can choose to minimize the rigid anchors, minimize during flexible growth, and minimize the final conformation.  The anchor minimization is always done rigidly; if no flexible growth is being done, this step minimizes the entire molecule.  The minimization during flexible growth is a complete (torsions + rigid) minimization.  The final minimization can be rigid, torsions-only, or complete.  The minimizer terminates when the simplex “shrinks” enough that the scores of its highest and lowest points are within the scoring tolerance, or when the requested number of minimizer steps is reached.
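DOCK's minimizer is based on the DOCK 4 code, which is not reproduced here.  As a generic illustration of a simplex (Nelder-Mead) minimizer and of the two stopping rules described above, the sketch below uses the textbook reflection/expansion/contraction/shrink coefficients (1, 2, 0.5, 0.5); the function and parameter names (score_tolerance, max_steps) are hypothetical.

```python
def simplex_minimize(f, x0, step=1.0, score_tolerance=1e-6, max_steps=500):
    """Minimal Nelder-Mead simplex minimizer (generic sketch, not DOCK's code).

    Stops when the best and worst vertex scores differ by less than
    score_tolerance, or after max_steps iterations.
    """
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    scores = [f(v) for v in simplex]

    for _ in range(max_steps):
        order = sorted(range(n + 1), key=lambda k: scores[k])
        simplex = [simplex[k] for k in order]
        scores = [scores[k] for k in order]
        if scores[-1] - scores[0] < score_tolerance:
            break  # the simplex has "shrunk" enough
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line from the worst vertex through the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        fr = f(reflected)
        if fr < scores[0]:
            expanded = along(2.0)
            fe = f(expanded)
            simplex[-1], scores[-1] = (expanded, fe) if fe < fr else (reflected, fr)
        elif fr < scores[-2]:
            simplex[-1], scores[-1] = reflected, fr
        else:
            contracted = along(-0.5)  # midpoint of centroid and worst vertex
            fc = f(contracted)
            if fc < scores[-1]:
                simplex[-1], scores[-1] = contracted, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                scores = [scores[0]] + [f(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda k: scores[k])
    return simplex[best_index], scores[best_index]


f = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
x, fx = simplex_minimize(f, [0.0, 0.0], score_tolerance=1e-12, max_steps=2000)
print([round(c, 3) for c in x])  # approximately [1.0, -2.0]
```

In a docking setting the coordinates being optimized would be the rigid-body and/or torsional degrees of freedom, depending on whether a rigid, torsions-only, or complete minimization is requested.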

 


Using DOCK

 

Installing DOCK

 

1.      Save the file for the appropriate operating system to your hard drive.

2.      Uncompress the archive into a folder called dock5/ in a directory of your choice.

a.      For Windows systems, a zip file is provided.

b.      For *nix systems, a gzipped archive is provided.

3.      All DOCK 5 binaries are installed in dock5/bin/

 

The dock5 directory contains the following subdirectories:

 

           

bin/

demo/

installation/

parameters/

src/

utilities/

            accessories/

            grid/

                        GBSA_Grids/

 

 

Compiling DOCK (if required)

 

DOCK comes with platform-specific compiled binaries.  You should not need to compile the code or accessories unless you have made changes to the source code or plan to run DOCK on a platform for which we do not distribute binaries.

 

Building DOCK: (all platforms)

            From the dock5 directory:

            cd config/

            ./configure gnu

            make

 

The parallel version of DOCK (MPI-DOCK) is built on an MPI library.  The MPICH library is provided freely by Argonne National Laboratory (http://www-unix.mcs.anl.gov/mpi/mpich/).  The MPI library must be installed and running on the system if the MPI features are to be used.  Once MPI is installed, you need to define MPICH_HOME as an environment variable.

 

Building MPI-DOCK (all platforms):

            From the dock5 directory:

            cd config/

            ./configure gnu.parallel

            make

 

NOTE:  MPI-DOCK 5.4 has been compiled with MPICH-1.2.7 on all supported platforms (MPICH-1.2.5 for WinXP).

 

  

Running DOCK

 

For Windows Users:

DOCK and its accessories must be run in a Unix-like environment such as Cygwin (http://www.cygwin.com/).  When you install Cygwin (or a similar environment), make sure to also install the compilers and Unix shells (the “Devel” category in Cygwin).

 

DOCK must be run from the command line in a standard Unix shell.  It reads a parameter file containing field/value pairs, using the following command:

 

            dock5/bin/dock5  -i   parameter.in  [-v1]   [-v2] [-o outputfile.txt]

 

If the parameter file does not exist, DOCK will generate one using your responses to the parameter questions.  If the parameter file exists, any parameter values found will be read. 

 

DOCK 5 outputs the job parameters to the screen at the start of the job, and prints summary information for each molecule processed.  Additional summary information will be included in future releases.  The -v1 flag prints a histogram of sphere matching information.  The -v2 flag prints details about the breakdown of the GB/SA terms.

 

Running DOCK in Parallel

 

If you have installed the MPI library, DOCK can be run in parallel using the following command:

 

            mpirun -np # dock5.mpi -i parameter.in -o outputfile.txt

 

Note that parallelization is set up with a single master node, with the remaining nodes acting as slaves.  The master node performs file processing and input/output, whereas the slaves perform the actual calculations.  If -np is 1, the code defaults to non-MPI behavior.  With 2 processors, only one slave performs calculations, so there will be minimal difference in performance between 1 and 2 processors.  Improved performance will only become evident with more than 2 nodes.
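The master/slave arithmetic can be made explicit.  In the sketch below (the function name is hypothetical), the "ideal" speedup ignores communication and I/O overhead, which is an assumption; real runs will fall short of it.

```python
def computing_processes(np_):
    """Number of processes doing docking calculations for `mpirun -np np_`.

    With -np 1 the code runs serially, so the single process does the work.
    Otherwise one rank is the master (file processing and I/O only) and the
    remaining np_ - 1 ranks are slaves that perform the calculations.
    """
    if np_ < 1:
        raise ValueError("np must be at least 1")
    return 1 if np_ == 1 else np_ - 1


for n in (1, 2, 4, 8):
    # Ideal speedup relative to a serial run, ignoring overhead (an assumption).
    print(n, computing_processes(n), computing_processes(n) / computing_processes(1))
# 1 1 1.0
# 2 1 1.0
# 4 3 3.0
# 8 7 7.0
```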

 

Running the Demo

 

DOCK 5.4 includes two demonstration files that are designed to test your installation.  These demos must also be run from the command line.

                       

For DOCK: (all platforms)

            From the dock5 directory:

            cd demo

            ./script_clean

            ./script_demo

 

For MPI-DOCK:  (all platforms)

            From the dock5 directory:

            cd demo

            ./script_clean

            ./script_mpi_demo

NOTE:  MPI-DOCK will be run on 4 processors for the demo.

 

DOCK 5 Parameters

 

The parameters for several common calculations have been optimized using test sets.  General recommendations for these parameters can be found in dock5/recommended_input.  Below, all available options for DOCK are described in detail. 

 

The DOCK 5 parameter parser requires that the values entered for a parameter exactly match one of the legal values if any legal values are specified.  For example:

 

            param_a                 [5] ():

            param_b                 [5] (0 5 10):

 

param_a can be assigned any value; however, param_b can only be assigned 0, 5, or 10.  If no value is entered, both will default to a value of 5.  Below are listed all DOCK 5 parameters, their default values, legal values, and a brief description of each.  The parameters are listed in order of function.  Also, for questions requiring a yes/no answer, please use the full word (yes or no) rather than y or n.
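The validation rule can be sketched as follows.  The prompt layout "name [default] (legal values):" is inferred from the two examples above; the regular expression and helper names are hypothetical and this is not DOCK's actual parser.

```python
import re

# Matches a prompt line such as "param_b  [5] (0 5 10):" (inferred format).
PROMPT = re.compile(r"^\s*(\w+)\s*\[([^\]]*)\]\s*\(([^)]*)\):\s*$")


def parse_prompt(line):
    """Return (name, default, legal_values) for one parameter prompt line."""
    m = PROMPT.match(line)
    if m is None:
        raise ValueError(f"not a parameter prompt: {line!r}")
    name, default, legal = m.groups()
    return name, default, legal.split()  # empty list: any value allowed


def resolve_value(line, entered):
    """Use the default if nothing is entered; else require an exact legal match."""
    name, default, legal = parse_prompt(line)
    value = entered.strip() or default
    if legal and value not in legal:
        raise ValueError(f"{name}: {value!r} is not one of {legal}")
    return value


print(resolve_value("param_a                 [5] ():", ""))        # 5
print(resolve_value("param_b                 [5] (0 5 10):", "10"))  # 10
```

Because matching is exact, an abbreviated answer such as "y" would be rejected for a prompt whose legal values are yes and no, which is why the full words are required.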

 

Ligand I/O Parameters

Parameter Name | Default | Values | Description
ligand_atom_file | database.mol2 | string | The ligand input filename
ligand_outfile_prefix | output | string | The prefix that all output files will use
limit_max_ligands | no | bool (yes, no) | Flag to limit the # of ligands that will be read in from a library
write_orientations | no | bool (yes, no) | Flag to write orientations
write_conformations | no | bool (yes, no) | Flag to write conformations
initial_skip | 0 | int | The # of molecules to skip over at the beginning of a library
calculate_rmsd | no | bool (yes, no) | Flag to perform an RMSD calculation between the final molecule pose and its initial structure
use_rmsd_reference_mol | no | bool (yes, no) | Flag to specify a reference structure for the RMSD calculation (default is the starting structure)
rmsd_reference_filename | ligand_rmsd.mol2 | string | File containing the RMSD reference structure
rank_ligands | no | bool (yes, no) | Flag to enable a ligand top-score list.  These ligands will be written to outputprefix_ranked.mol2, and outputprefix_scored.mol2 will be empty by default
max_ranked_ligands | 500 | int | The # of ligands to be stored in the top-score list
scored_conformer_output_override | no | bool (yes, no) | Flag that causes all ligands to be written to outputprefix_scored.mol2, even when rank_ligands is yes
num_scored_conformers_written | 1 | int | The # of scored poses for each ligand written to outputprefix_scored.mol2
cluster_conformations | yes | bool (yes, no) | Flag to enable clustering of fully minimized conformations (NOTE: only available if num_scored_conformers_written > 1)
cluster_rmsd_threshold | 2.0 | float | The RMSD cutoff used to determine whether conformations should be clustered

 

Orient Ligand Parameters

Parameter Name | Default | Values | Description
orient_ligand | yes | bool (yes, no) | Flag to orient the ligand to the receptor spheres
automated_matching | yes | bool (yes, no) | Flag to perform automated matching instead of manual matching
distance_tolerence | 0.25 | float | The distance tolerance (in angstroms) applied to each edge in a clique
distance_minimum | 2.0 | float | The minimum size (in angstroms) for an edge in a clique
nodes_minimum | 3 | int | The minimum # of nodes in a clique
nodes_maximum | 10 | int | The maximum # of nodes in a clique
receptor_site_file | receptor.sph | string | The file containing the receptor spheres
max_orientations | 500 | int | The maximum # of orientations that will be cycled through
critical_points | no | bool (yes, no) | Flag to use critical point sphere labeling to target orientations to particular spheres
chemical_matching | no | bool (yes, no) | Flag to use chemical “coloring” of spheres to match chemical labels on ligand atoms
chem_match_tbl | chem_match.tbl | string | File defining the legal chemical type matches/pairings
use_ligand_spheres | no | bool (yes, no) | Flag to enable a sphere file representing ligand heavy atoms to be used to orient the ligand.  Typically used for macromolecular docking
ligand_sphere_file | ligand.sph | string | The file containing the ligand spheres

 

 

Flexible Ligand Parameters

Parameter Name | Default | Values | Description
flexible_ligand | yes | bool (yes, no) | Flag to perform ligand conformational searching
ag_conf_search | yes | bool (yes, no) | Flag to use the anchor & grow algorithm to search ligand conformations
min_anchor_size | 40 | int | The minimum # of heavy atoms for an anchor segment
num_anchor_orients_for_growth | 100 | int | The maximum # of anchor orientations promoted to the conformational search
number_confs_for_next_growth | 100 | int | The maximum # of conformations carried forward in the anchor & grow search
use_internal_energy | yes | bool (yes, no) | Flag to add an internal energy term to the score during the conformational search
internal_energy_att_exp | 6 | int | VDW attractive exponent
internal_energy_rep_exp | 12 | int | VDW repulsive exponent
internal_energy_dielectric | 4.0 | float | Dielectric used for the electrostatic calculation
use_clash_overlap | no | bool (yes, no) | Flag to check for overlapping atom volumes during anchor & grow
clash_overlap | 0.5 | float | Percent of overlap allowed before a clash is declared

 

Ligand Scoring Parameters

Parameter Name

Default

Values

Description

bump_filter

yes

bool (yes, no)

Flag to perform bump filtering

bump_grid_prefix

grid