DOCK 6.6 Users Manual

 

Authors:

P. Therese Lang (UCB)
Demetri Moustakas (Harvard)

Scott Brozell (Rutgers University)
Noel Carrascal (SUNY-Stony Brook)
Sudipto Mukherjee (SUNY-Stony Brook)
Trent Balius (SUNY-Stony Brook)
William Joseph Allen (SUNY-Stony Brook)
Patrick Holden (SUNY-Stony Brook)
Scott Pegg (UCSF)
Kaushik Raha (UCSF)
Devleena Shivakumar (previously at The Scripps Research Institute)

Robert Rizzo (SUNY-Stony Brook)
David Case (Rutgers University)
Brian Shoichet (UCSF)
Irwin Kuntz (UCSF)

 

 

Copyright © 2006-2014
Regents of the University of California
All Rights Reserved

Last updated July 11, 2014


Table of Contents

1. Introduction

1.1. General Overview

1.2. What Can DOCK Do for You

1.3. Installation

1.4. What's New in DOCK 6

1.5. Overview of the DOCK Suite of Programs

2. DOCK

2.1. Overview

2.2. History

2.3. Command-line Arguments

2.4. The Parameter Parser

2.5. Ligand File Input

2.5.1. Ligand RMSD

2.6. Database Filter

2.7. Orienting the Ligand

2.7.1. Sphere Matching

2.7.2. Critical Points

2.7.3. Chemical Matching

2.7.4. Macromolecular Docking

2.8. Internal Energy Calculation

2.9. Ligand Flexibility

2.9.1. Anchor-and-Grow

2.9.2. Identification of Rigid Segments

2.9.3. Manual Specification of Non-rotatable Bonds

2.9.4. Identification of Flexible Layers

2.9.5. Pruning the Conformation Search Tree

2.9.6 Time Requirements

2.9.7 Growth Tree and Statistics

2.10. Scoring

2.10.1. Bump Filter

2.10.2. Contact Score

2.10.3. Grid-Based Score

2.10.4. DOCK 3.5 Score

2.10.5. Continuous Score

2.10.6. Zou GB/SA Score

2.10.7. Hawkins GB/SA Score

2.10.8. PB/SA Score

2.10.9. AMBER Score

2.10.9.1. AMBER Score Binding Energy

2.10.9.2. AMBER Score Receptor Flexibility

2.10.9.3. AMBER Score Inputs

2.10.9.4. AMBER Score Outputs

2.10.9.5. AMBER Score in Practice

2.10.9.6. AMBER Score Parameters

2.10.10. Descriptor Score

2.10.11. MultiGrid FPS Score

2.10.12. SASA Score

2.11. Minimization

2.12. Parameter Files

2.12.1. Atom Definition Rules

2.12.2. vdw.defn

2.12.3. chem.defn

2.12.4. chem_match.tbl

2.12.5. flex.defn

2.12.7. flex_drive.tbl

2.13. Ligand File Output

2.14. Parallel Processing

3. Accessories

3.1. Grid

3.1.1. Overview

3.1.2. Bump Checking

3.1.3. Contact Scoring

3.1.4. Energy Scoring

3.2. Docktools

3.2.1. Chemgrid

3.2.2. Ligand Desolvation

3.2.3. Occupancy Desolvation

3.2.4. Grid Conversion

3.3. Nchemgrids

3.4. Sphgen

3.4.1. Overview

3.4.2. Critical Points

3.4.3. Chemical Matching

3.4.4. Output

3.5. Showbox

3.6. Showsphere

3.7. Sphere Selector

3.8. Antechamber

3.9. tLEaP

3.10. Amber Score Preparation Scripts

4. Molecular File Formats

4.1. Tripos MOL2 Format

4.2. PDB Format

5. References

6. Acknowledgments


Introduction

RETURN TO TABLE OF CONTENTS

1.1. General Overview

DOCK addresses the problem of "docking" molecules to each other. In general, "docking" is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known. A compound that interacts strongly with, or binds, a receptor associated with a disease may inhibit its function and thus act as a drug. Solving the docking problem computationally requires an accurate representation of the molecular energetics as well as an efficient algorithm to search the potential binding modes.

Historically, the DOCK algorithm addressed rigid body docking using a geometric matching algorithm to superimpose the ligand onto a negative image of the binding pocket. Important features that improved the algorithm's ability to find the lowest-energy binding mode, including force-field based scoring, on-the-fly optimization, an improved matching algorithm for rigid body docking and an algorithm for flexible ligand docking, have been added over the years. For more information on past versions of DOCK, click here.

With the release of DOCK 6, we continue to improve the algorithm's ability to predict binding poses by adding new features like force-field scoring enhanced by solvation and receptor flexibility. For more information about the current release of DOCK, click here.

RETURN TO TABLE OF CONTENTS

1.2. What Can DOCK Do for You

We and others have used DOCK for the following applications:

  • predict binding modes of small molecule-protein complexes
  • search databases of ligands for compounds that inhibit enzyme activity
  • search databases of ligands for compounds that bind a particular protein
  • search databases of ligands for compounds that bind nucleic acid targets
  • examine possible binding orientations of protein-protein and protein-DNA complexes
  • help guide synthetic efforts by examining small molecules that are computationally derivatized
  • many more...

RETURN TO TABLE OF CONTENTS

1.3. Installation

DOCK is Unix based scientific software and follows a common installation recipe: download, unpack, configure, build, and test. The simple configuration scheme of DOCK is based on plain text files. Building and testing employ the make command. DOCK installation is so simple and transparent that users have a reasonable chance of correcting problems themselves.

Start with a plain serial installation. Follow the detailed steps (1. through 5.) enumerated below. The appropriate configuration option is likely gnu; see step 3. Subsequently, additional executables can be installed for parallel, pbsa, etc; see step 6. (Here is the quick start for an example gnu parallel installation: cd install; ./configure gnu; make install; make dockclean; ./configure gnu parallel; setenv MPICH_HOME /bla; make dock; make test;).

If problems occur then read the diagnostics carefully and apply the scientific method. To observe what's under the hood, view the configuration file (install/config.h) that is created by configure, and execute make -n for a dry run. Consult the FAQ. Search the DOCK-Fans mailing list archive.

NOTE FOR WINDOWS USERS: DOCK and its accessories must be run using a Unix-like environment such as Cygwin ( http://www.cygwin.com/ ). We recommend a full Unix installation. In particular, when you install your emulator, make sure to also install compilers, Unix shells, and perl ( Devel for Cygwin ). All steps below should be performed using Cygwin or another Unix emulator for Windows. See also the DOCK wiki entry for Cygwin.

(1) Unpack the distribution using the following command:

[user@dock ~] tar -zxvf dock.6.3.tar.gz

(2) Enter the installation directory:

[user@dock ~] cd dock6/install

(3) Configure the Makefile for the appropriate operating system:

[user@dock ~] ./configure [configuration file]

AUTHOR: Scott Brozell

USAGE: configure [-help] [configuration file]

OPTIONS
-help #emit the usage statement
configuration file #input file containing operating system appropriate variables

Configuration Files Target
gnu gnu compilers
gnu recent gnu compilers and ACML
gnu.parallel gnu compilers with parallel processing capability
gnu.pbsa gnu compilers with PB/SA (ZAP library) capability
gnu.parallel.pbsa gnu compilers with parallel processing and PB/SA (ZAP library) capabilities
ibmaix IBM AIX and native compilers
intel Intel compilers
intel.mkl Intel compilers and MKL
sgi sgi compilers
sgi.parallel sgi compilers with parallel processing capability
sgi.pbsa sgi compilers with PB/SA (ZAP library) capability

DESCRIPTION:
Create the DOCK configuration file, config.h, by copying an existing configuration file that is selected using the arguments. When invoked without arguments, print this usage statement and if the configuration file exists then print its creation stamp. Some configuration files require that environment variables be defined; these requirements are listed in the files and emitted by configure. Note that as of version 6.6 gfortran is the default Fortran compiler in the gnu config files (replacing g77). But other Fortran compilers may be used; simply hand edit install/config.h to use alternatives.

(4) Build the desired DOCK executable(s) via one of the following commands:

[user@dock ~] make all # builds all the DOCK programs
[user@dock ~] make dock # builds only the dock program
[user@dock ~] make utils # builds only the accessory programs

(5) Test the built executable(s) via these commands:

[user@dock ~] cd test; make test; make check

The test directory contains the DOCK quality control (QC) suite. It produces pass/fail results via fast regression tests. The suite should complete in less than ten minutes; five minutes is typical. Un-passed tests should be examined to determine their significance. The make check command executed from the test directory emits all the differences uncovered during testing. The make clean command executed from the test directory removes all files produced during testing; this command is automatically executed by the main make test command above; however, to run tests from a subdirectory of the test directory, one should explicitly execute make clean.

NOTE: Some failures are not significant. For example, differences in the tails of floating point numbers may not be significant. The sources of such differences are frequently platform dependencies from computer hardware, operating systems, and compilers that impact arithmetic precision and random number generators. In addition, the reference outputs as of version 6.4 are from a 64 bit platform and as of version 6.6 use gfortran gcc version 4.4.5, and this can cause false positives on 32 bit platforms or with other compilers; in particular, differing numbers of Orientations or Conformations and different Contact or Grid scores. We are working on increasing the QC suite's resilience to these issues. For now, apply common sense and good judgment to determine the significance of a possible failure. Note that some number of failures is rarely an indication of real problems, but if almost every test fails then something is amiss.

Some features of DOCK (DOCK3.5 Score aka ChemGrid Score) require an electrostatic potential map which is usually generated by DelPhi. Testing of these features requires that the environment variable DELPHI_PATH be defined to the full path of DelPhi. DelPhi is not distributed with DOCK.

(6) OPTIONAL: Alternate executables.

(i) DOCK with parallel processing functionality requires a Message Passing Interface (MPI) library. Because of the vagaries of MPI libraries, building parallel DOCK has more pitfalls than installing the serial version. The MPI library must be installed and running on the system if the parallel features of DOCK are to be used.
Currently, the DOCK installation mechanism only directly supports the MPICH2 and MPICH implementations. (But other MPI implementations can be accommodated probably with the only extra effort of editing the config.h file.) The MPICH2 library is freely available from Argonne National Labs ( http://www-unix.mcs.anl.gov/mpi/mpich/ ). Once MPI is installed, define the environment variable MPICH_HOME to the top level MPICH2 directory. MPICH_HOME will be referenced by all stages of the build procedure - from configuration through testing. See the Parallel DOCK section for execution information.

WARNING: The parallel configuration files have been tailored to a typical MPICH2 build. Linking problems, such as undefined references and cannot find libbla_bla, can occur due to idiosyncrasies in the MPI installation. One corrective approach is to use manual linking; add to the LIBS definition in config.h the link flags (-L and -l) from the command: $MPICH_HOME/mpicc -show; in general, the LIBS should contain those link flags in the same order.

(ii) DOCK with PB/SA scoring requires OpenEye Toolkits. In particular, DOCK uses the APIs as in OEChem TK 1.7.0 and Zap TK 2.1.1. These can be obtained from OpenEye (http://www.eyesopen.com/). Once they are installed, define the environment variable ZAP_HOME to the directory that contains the ZAP library. ZAP_HOME is used during installation and testing. For execution the environment variable OE_LICENSE must also be defined to the full path of the license file.

RETURN TO TABLE OF CONTENTS

1.4. What's New in DOCK 6

Version 6.0

The new features of DOCK 6 include: additional scoring options during minimization; DOCK 3.5 scoring-including Delphi electrostatics, ligand conformational entropy corrections, ligand desolvation, receptor desolvation; Hawkins-Cramer-Truhlar GB/SA solvation scoring with optional salt screening; PB/SA solvation scoring; AMBER scoring-including receptor flexibility; the full AMBER molecular mechanics scoring function with implicit solvent; conjugate gradient minimization and molecular dynamics simulation capabilities.

Version 6.1

The newly added features for this incremental release of DOCK 6 include a new pruning algorithm during the anchor-and-grow algorithm, a distance-based movable region and a mildly performance optimized nothing movable region for AMBER score, cleaner output and more complete output files for AMBER score, the ability to perform ranking and/or clustering on ligands between primary and secondary scoring, and more dynamic output when secondary scoring is employed.

Version 6.2

The newly added features for this incremental release of DOCK 6 include greater control over the output of conformations, improved memory efficiency for grid reading, a distance dependent dielectric control for continuous score, and for AMBER score better error reporting and robustness of the preparation scripts, a metal ions library, a cofactor library, a hook for a user library, support for RNA receptors, a minimization convergence criterion control, and the ability to skip inadequately prepped ligands.

Version 6.3

The newly added features for this incremental release of DOCK 6 include more robust input file processing, support for OpenEye Toolkits version 1.7.0 for PB/SA score, and for AMBER score improved support for RNA receptors, the option to use the existing ligand charges during preparation, and better error reporting and robustness of the preparation scripts. In particular, for AMBER scoring of RNA receptors, the distance movable region can be applied with explicit waters and the preparation can neutralize to a total charge of zero and can solvate with water.

Version 6.4

The newly added features for this incremental release of DOCK 6 include: resolving ligand internal clashes of flexible ligands (more than seven rotatable bonds) by inclusion of an internal energy function at all stages of growth; an ability to output growth trees as multi mol2 files; printing of growth statistics in the dock output file; restrained minimization with an RMSD tether, a torsion pre-minimizer.

Version 6.5

The newly added features for this incremental release of DOCK 6 include:

Now an anchor can be chosen by specifying an atom in that fragment. In addition, the number of anchors used can be limited during multi-anchor docking.

The new scoring function called descriptor score has been introduced, which includes a hydrogen bond term and footprint similarity scoring. See Balius et al.

PB/SA score has undergone some generalizations and efficiency improvements that make docking, as opposed to rescoring, more tractable for nontrivial systems.

For AMBER score the cofactors library, leaprc.dock.cofactors, and the ions library, leaprc.dock.ions, have grown substantially.

Version 6.6

The newly added features for this incremental release of DOCK 6 include: A new grid-based footprint scoring function, calculation RMSD using the Hungarian Algorithm, inclusion of orienting statistics.

RETURN TO TABLE OF CONTENTS

1.5. Overview of the DOCK Suite of Programs

1.5.1. Programs

The relationship between the main programs in the dock suite is depicted in Figure 1. These routines will be described below.

Main programs in DOCK suite

The program sphgen identifies the active site, and other sites of interest, and generates the sphere centers that fill the site. It has been described in the original paper (Kuntz et al. J. Mol. Biol. 1982). The program grid generates the scoring grids (Shoichet et al. J. Comp. Chem. 1992 and Meng et al. J. Comp. Chem. 1992). Within the DOCK suite of programs, the program DOCK matches spheres (generated by sphgen) with ligand atoms and uses scoring grids (from grid) to evaluate ligand orientations (Kuntz et al. J. Mol. Biol. 1982 and Shoichet et al. J. Comp. Chem. 1992). Program DOCK also minimizes energy based scores (Meng et al. Proteins. 1993).

1.5.2. General Concepts

The DOCK suite of programs is designed to find favorable orientations of a ligand in a “receptor.” It can be subdivided into

(i) those programs related directly to docking of ligands and
(ii) accessory programs.

We limit the discussion in this section to only those programs and methods related to docking a ligand in a receptor. A typical receptor might be an enzyme with a well-defined active site, though any macromolecule may be used (e.g. a structural protein, a nucleic acid strand, a “true” receptor). We’ll use an enzyme as an example in the rest of this discussion.

The starting point of all docking calculations is generally the crystal or NMR structure of an enzyme from an enzyme-ligand complex. The ligand structure may be taken from the crystal structure of the enzyme-ligand complex or from a database of compounds, such as the ZINC database (Irwin, et. al. J. Chem. Inf. Model. 2005). The primary consideration in the design of our docking programs has been to develop methods which are both rapid and reasonably accurate. These programs can be separated functionally into roughly two parts, each somewhat independent of the other:

(i) Routines which determine the orientation of a ligand relative to the receptor and
(ii) Routines which evaluate (score) a ligand orientation.

There is a lot of flexibility. You can generate orientations outside of DOCK and score them with the DOCK evaluation functions. Alternatively, you can develop your own scoring routines to replace the functions supplied with DOCK.

The ligand orientation in a receptor site is broken down into a series of steps, in different programs. First, a potential site of interest on the receptor is identified. (Often, the active site is the site of interest and is known a priori.) Within this site, points are identified where ligand atoms may be located. A routine from the DOCK suite of programs identifies these points, called sphere centers, by generating a set of overlapping spheres which fill the site. Rather than using DOCK to generate these sphere centers, important positions within the active site may be identified by some other mechanism and used by DOCK as sphere centers. For example, the positions of atoms from the bound ligand may be used as these sphere centers. Or, a grid may be generated within the site and each grid point may be considered as a sphere center. Our sphere centers, however, attempt to capture shape characteristics of the active site (or site of interest) with a minimum number of points and without the bias of previously known ligand binding modes.

To orient a ligand within the active site, some of the sphere centers are “matched” with ligand atoms. That is, a sphere center is “paired” with an ligand atom. Many sets of these atom-sphere pairs are generated, each set containing only a small number of sphere-atom pairs. In order to limit the number of possible sets of atom-sphere pairs, a longest distance heuristic is used; (long) inter-sphere distances are roughly equal to the corresponding (long) inter-atomic ligand distances. A set of atom-sphere pairs is used to calculate an orientation of the ligand within the site of interest. The set of sphere-atom pairs which are used to generate an orientation is often referred to as a match. The translation vector and rotation matrix which minimizes the rmsd of (transformed) ligand atoms and matching sphere centers of the sphere-atom set are calculated and used to orient the entire ligand within the active site.

The orientation of the ligand is evaluated with a shape scoring function and/or a function approximating the ligand-enzyme binding energy. Most evaluations are done on (scoring) grids in order to minimize the overall computational time. At each grid point, the enzyme contributions to the score are stored. That is, receptor contributions to the score, potentially repetitive and time consuming, are calculated only once; the appropriate terms are then simply fetched from memory.

The ligand-enzyme binding energy is taken to be approximately the sum of the van der Waal attractive, van der Waal dispersive, and Coulombic electrostatic energies. Approximations are made to the usual molecular mechanics attractive and dispersive terms for use on a grid. To generate the energy score, the ligand atom terms are combined with the receptor terms from the nearest grid point, or combined with receptor terms from a “virtual” grid point with interpolated receptor values. The score is the sum of over all ligand atoms for these combined terms. In this case, the energy score is determined by both ligand atom types and ligand atom positions on the energy grids.

As a final step, in the energy scoring scheme, the orientation of the ligand may be varied slightly to minimize the energy score. That is, after the initial orientation and evaluation (scoring) of the ligand, a simplex minimization is used to locate the nearest local energy minimum. The sphere centers themselves are simply approximations to possible atom locations; the orientations generated by the sphere-atom pairing, although reasonable, may not be minimal in energy.

1.5.3. Specific Concepts

(A) Sphere Centers

Spheres are generated to fill the target site. The sphere centers are putative ligand atom positions. Their use is an attempt to limit the enormous number of possible orientations within the active site. Like ligand atoms, these spheres touch the surface of the molecule and do not intersect the molecule. The spheres are allowed to intersect other spheres; i.e., they have volumes which overlap. Each sphere is represented by the coordinates of its center and its radius. Only the coordinates of the sphere centers are used to orient ligands within the active site (see above). Sphere radii are used in clustering.

The number of orientations of the ligand in free space is vast. The number of orientations possible from all sets of sphere-atom pairings is smaller but still large and cannot be generated and evaluated (scored) in a reasonable length of time. Consequently, various filters are used to eliminate from consideration, before evaluation, sets of sphere-atoms pairs, which will generate poorly scoring orientations. That is, only a small subset of the number of possible ligand orientations are actually generated and scored. The distance tolerance is one filter. Sphere “coloring” and identification of “critical” spheres are other filters.

Sphere-sphere distances are compared to atom-atom distances. Sets of sphere-atom pairs are generated in the following manner: sphere i is paired with atom I if and only if for every sphere j in the set and for every atom J in the set,

where dij is the distance between sphere i and sphere j, dIJ is the distance between atom I and atom J, and epsilon is a somewhat small user-defined value.

(B) Chemical Matching

DOCK spheres are generated without regard to the chemical properties of the nearby receptor atoms. Sphere “chemical matching” or “coloring” associates a chemical property to spheres and a sphere of one “color” can only be matched with a ligand atom of complementary color. These chemical properties may be things such as “hydrogen-bond donor,” “hydrogen-bond acceptor,” “hydrophobe,” “electro-positive,” “electro-negative,” “neutral,” etc. Neither the colors themselves, nor the complementarity of the colors, are determined by the DOCK suite of programs; DOCK simply uses these labels. With the inclusion of coloring, only ligand atoms with the appropriate chemical properties are matched to the complementary colored spheres. It is probably more likely, then, that the orientation generated will produce a favorable score. Conversely, by excluding colored spheres from pairing with certain ligand atoms, the number of (probably) unfavorable orientations which are generated and evaluated can be reduced. Note that requiring complementarity in matching does not mean that all ligand atoms will lie in chemically complementary regions of the enzyme. Rather, only those ligand atoms, when paired with a colored sphere which is part of the sphere-atom match, will be guaranteed to be in the chemically complementary region of the enzyme (provided chirality of the spheres is the same as that of the matching ligand atoms).

(C) Critical Points

The "critical point" filter requires that certain spheres be part of the set of sphere-atom pairs used to orient the ligand (DesJarlais et al. J. Comput-Aided Molec. Design. 1994). Designating spheres as critical points forces the ligand to have at least one atom in that area of the enzyme, where that sphere is located. This filter may be useful, for example, when it is known that a ligand must occupy a particular area of an active site. This filter removes from consideration any orientation that does not guarantee at least one ligand atom in critical areas of the enzyme (provided chirality of the spheres is the same as that of the matching ligand atom).

(D) Bump Filter

After a ligand is oriented within the active site, the orientation is evaluated. In an attempt to reduce the total computational time, after the ligand is oriented in the site, it is possible to first check whether or not ligand atoms occupy space already occupied by the receptor. If too many of such “bumps” are found, then the ligand is likely to intersect the receptor even after minimization; consequently, the ligand orientation is discarded before evaluation.

1.5.4. Units

The units of the DOCK suite of programs are lengths in angstroms, masses in atomic mass units, charges in electron charges units, and energies in kcal/mol. For Amber score internally and on input of charges from a prmtop file the charges are scaled by 18.2223.

RETURN TO TABLE OF CONTENTS


DOCK

RETURN TO TABLE OF CONTENTS

2.1. Overview

This section is intended as a reference manual for the features of the DOCK Suite of Programs. It is intended to give an overview of the ideas which form the basis of the DOCK suite of programs and to detail the available user parameters. It is not intended to be a substitute for all the papers written on DOCK.

In general, this document is geared towards the experienced user and introduces new features and concepts in version 6. If you are new to DOCK, we strongly recommend you look at the tutorials on the DOCK web site at http://dock.compbio.ucsf.edu/DOCK_6/index.htm, which go into much greater practical detail.

RETURN TO TABLE OF CONTENTS

2.2. History

Version 1.0/1.1

Authors: Robert Sheridan, Renee DesJarlais, Irwin Kuntz

The program DOCK is an automatic procedure for docking a molecule into a receptor site. The receptor site is characterized by centers, which may come from sphgen or any other source. The molecule being docked is characterized by ligand centers, which may be its non-hydrogen atoms or volume-filling spheres calculated in sphgen. The ligand centers and receptor centers are matched based on comparison of ligand-center/ligand-center and receptor-center/receptor-center distances. Sets of ligand centers match sets of receptor centers if all the internal distances match, within a value of distance_tolerance. Ligand-receptor pairs are added to the set until at least nodes_minimum pairs have been found. At least three pairs must be found to uniquely determine a rotation/translation matrix that will orient the ligand in the receptor site. A least-squares fitting procedure is used (Ferro et al. Act. Cryst. A. 1977). Once an orientation has been found, it is evaluated by any of several scoring functions. DOCK may be used to explore the binding modes of an individual molecule, or be used to screen a database of molecules to identify potential ligands.

Version 2.0

Authors: Brian Shoichet, Dale Bodian, Irwin Kuntz

DOCK version 2.0 was written to give the user greater control over the thoroughness of the matching procedure, and thus over the number of orientations found and the CPU time required (Shoichet et al. J. Comp. Chem. 1992). In addition, certain algorithmic shortcomings of earlier versions were overcome. Versions 2.0 and higher are particularly useful for macromolecular docking (Shoichet et al. J. Mol. Biol. 1991) and applications which demand detailed exploration of ligand binding modes. In these cases, users are encouraged to run CLUSTER in conjunction with sphgen and DOCK.

To allow for greater control over searches of orientation space, the ligand and receptor centers are pre-organized according to their internal distances. Starting with any given center, all the other centers are presorted into “bins” based on their distance to the first center. All centers are tried in turn as “first” positions, and all the points in a bin which has been chosen for matching are tried sequentially. Ligand and receptor bins are chosen for matching when they have the same distance limits from their respective “first” points. The number of centers in each bin determines how many sets of points in the receptor and the ligand will ultimately be compared. In general, the wider the bins, the greater the number of orientations generated. Thus, the thoroughness of the search is under user control.

Version 3.0

Authors: Elaine Meng, Brian Shoichet, Irwin Kuntz

Version 3.0 retained the matching features of version 2.0, and introduced options for scoring (Meng et al. J. Comp. Chem., 1992). Besides the simple contact scores mentioned above, one can also obtain molecular mechanics interaction energies using grid files calculated by CHEMGRID (which is now superseded by GRID in version 4.0). More information about the ligand and receptor molecules is required to perform these higher-level kinds of scoring. Point charges on the receptor and ligand atoms are needed for electrostatic scoring, and atom-type information is needed for the van der Waals portion of the force field score. Input formats (some of them new in version 3.5) are discussed in various parts of the documentation; one example of a “complete format” (including point charges and atom type information) is SYBYL MOL2 format. Parameterization of the receptor is discussed in the documentation for CHEMGRID. In DOCK, ligand parameters are read in along with the coordinates; input formats are described below. Currently, the options are: contact scoring only, contact scoring plus Delphi electrostatic scoring, and contact scoring plus force field scoring. Atom-type information and point charges are not required for contact scoring only.

Version 3.5

Authors: Mike Connolly, Daniel Gschwend, Andy Good, Connie Oshiro, Irwin Kuntz

Version 3.5 added several features: score optimization, degeneracy checking, chemical matching and critical clustering.

Version 4.0

Authors: Todd Ewing, Irwin Kuntz

Version 4.0 was a major rewrite and update of DOCK (Ewing et al. 2001 ). A new matching engine was developed which is more robust, efficient, and easier to use (Ewing and Kuntz. J. Comput. Chem. 1997). Orientational sampling can now be controlled directly by specifying the number of desired orientations. Additional features include chemical scoring, chemical screening, and ligand flexibility.

Version 5.0-5.4

Authors: Demetri Moustakas, P. Therese Lang, Scott Pegg, Scott Brozell, Irwin Kuntz

Version 5 was rewritten in C++ in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools (Moustakas et al., 2006). Additional new features include MPI parallelization, exhaustive orientation searching, improved conformation searching, GB/SA solvation scoring, and post-screening pose clustering. (Zou et al. J. Am. Chem. Soc., 1999)

Version 6.0-6.4

DOCK 6 is an extension of the DOCK 5 code base. It includes the implementation of Hawkins-Cramer-Truhlar GB/SA solvation scoring with salt screening and PB/SA solvation scoring through OpenEye's Zap Library. Additional flexibility has been added to scoring options during minimization. The new code also incorporates DOCK version 3.5.54 scoring features like Delphi electrostatics, ligand desolvation, and receptor desolvation. Finally, DOCK 6 introduces new code that allows access to the NAB library of functions such as receptor flexibility, the full AMBER molecular mechanics scoring function with implicit solvent, conjugate gradient minimization, and molecular dynamics simulation capabilities. See Lang et al. RNA, 2009 and Shivakumar et al., 2008.

RETURN TO TABLE OF CONTENTS

2.3. Command-line Arguments

DOCK must be run from the command line in a standard unix shell. It reads an input parameter file containing field/value pairs:

USAGE: dock6 -i dock.in [-o dock.out] [-v]

DESCRIPTION:
DOCK may be executed in either interactive or batch mode, depending on whether the output is written to a file. In interactive mode, the user is requested only for parameters relevant to the particular run and default values are provided. This mode is recommended for the initial construction of the input file and for short calculations. In batch mode, input parameters are read in from the input file and all output is written to the output file. This mode is recommended for long calculations once an input file has been generated interactively.

OPTIONS
-i dock.in #input file containing user-defined parameters
-help
#emit the usage statement.
-v #verbosity flag that prints additional information and warnings for scoring functions
-o dock.out #output file containing the parameters used in the calculation, summary information for each molecule docked, and all warning messages

Interactive mode

USAGE: dock6 -i dock.in

DESCRIPTION:
When launched this way, DOCK will extract all relevant parameters from dock.in (or any file supplied by the user). If additional parameters are needed (or if the dock.infile is non-existent or empty), DOCK will request them one at a time from the user. Reasonable default values are presented. Any parameters supplied by the user will be automatically appended to the dock.in file. If the user would like to change any previously entered values, the user can edit the dock.in file using a text editor.

Batch mode

USAGE: dock6 -i dock.in -o dock.out

DESCRIPTION:
When launched in this way, DOCK will run in batch mode, extracting all relevant parameters from dock.in (or any file supplied by the user) and will write out all output to dock.out (or any file supplied by the user). If any parameters are missing or incorrect, then execution will halt and an appropriate error message will be reported in dock.out.

Parallel DOCK

USAGE: mpirun [-machinefile machfile] [-np #_of_proc] dock6.mpi -i dock.in -o dock.out [-v]

DESCRIPTION:
If DOCK has been built for parallel processing (see Installation) then DOCK can be run in parallel. Parallelization employs a single master processor with the remaining processors acting as slaves. If np = 1, the code defaults to non-MPI behavior. There is a minimal difference in performance between 1 and 2 processors. Improved performance is only evident with more than 2 processors.

ADDITIONAL OPTIONS:
-machinefile #simple text file containing the names of the computers (nodes) to be used
-np #
specifies the number of processors which typically is the same as the number of lines in the machinefile

MPICH2 QUICK START:
MPICH2, unlike MPICH, requires command line initialization. If your system administrators have not initialized MPICH2 then follow these steps:
Create a .mpd.conf file in your home directory containing secretword=mysecretword.
Create a .mpd.hosts file in your home directory containing a list of machine names, one per line.
Start the MPICH2 daemons by executing mpdboot.
Verify the start up of the MPICH2 daemons by executing mpdtrace -l.
Finally execute DOCK using mpiexec or mpirun. For further information see the MPICH2 README.

PB/SA DOCK

USAGE: dock6.pbsa -i dock.in [-o dock.out] [-v]

DESCRIPTION:
If you have compiled DOCK for use with the ZAP library (see Installation), DOCK can be run using the ZAP PB/SA scoring function. The environment variable OE_LICENSE must be defined to the full path of the license file.

RETURN TO TABLE OF CONTENTS

2.4. The Parameter Parser

In Interactive Mode, dock will dynamically ask the user to enter the appropriate user parameters. The generic format for the questions is:

parameter_name [default value] (legal values):

The parameter parser requires that the values entered for a parameter exactly match one of the legal values. For example:

Example A: program_location [Hello_World!] ():

Example B: #_red_balloons [99] ():

Example C: glass_status [half_full] (half_full half_empty):

In Example A, the parameter "program_location" can be assigned any string value, and in Example B, the parameter "#_red_balloons" can be assigned any integer value. However, in Example C, the parameter value "glass_status" can only be assigned the strings "half_full" or "half_empty". If no parameter are assigned by the user, the default value--in brackets--will be used.

In Batch Mode, all parameters in the dock.in file, must be:

parameter_name value

Note that the parameter_name and corresponding value must be separated by white space, namely, blanks or tabs.

RETURN TO TABLE OF CONTENTS

2.5. Ligand File Input

Before you can dock a ligand, you will need atom types and charges for every atom in the ligand. Currently, DOCK only reads the Tripos MOL2 format. For a single ligand (or several ligands), you can use Chimera in combination with antechamber to prepare a MOL2 file for the ligand (see Structure Preparation Tutorial) or various other visualization packages. During the docking procedure, ligands are read in from a single MOL2 or multi-MOL2 file. Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Molecule Library Parameters

2.5.1. Ligand RMSD

Three types of root mean square distance (RMSD) values are reported when "calculate_rmsd = yes". These values can be found in the header of the output MOL2 file.

(1) Standard heavy-atom RMSD (HA_RMSDs): This is the standard pair-wise RMSD calculation between the non-hydrogen atoms of a reference conformation a and a pose conformation b for a ligand with N total heavy atoms of index i:


If the HA_RMSDs is "-1000.0", then there is an inconsistency in the number of heavy atoms between the reference and the docked conformer.

(2) Minimum-distance heavy-atom RMSD (HA_RMSDm): This measure is based on the RMSD implementation used in Autodock Vina (Trott and Olson, J. Comput. Chem. 2010), which does not explicitly enforce one-to-one mapping. Rather, atom pairings between reference conformation a and pose conformation b are determined by the minimum distance to any atom of the same element type, and it may be an under-prediction of the true RMSD.


(3) Hungarian (symmetry-corrected) heavy-atom RMSD (HA_RMSDh): The final RMSD implementation is based on an O(N^4) implementation of the Hungarian algorithm (Kuhn, Nav. Res. Logist. Q. 1955; Munkres, J. Soc. Indust. Appl. Math. 1957). The algorithm solves the optimal assignment between a set of reference ligand atoms a and a set of pose ligand atoms b of the same size. For all groups of atoms of the same Sybyl atom type, a cost matrix M is populated where each matrix element mij is equal to the distance-squared between reference atom ai and pose atom bj. The Hungarian algorithm is used to determine one-to-one assignments between reference and pose ligand atoms such that the total distance between atoms is minimized. The new assignments c(i) are fed into the standard RMSD function in order to compute a symmetry-corrected RMSD. If the HA_RMSDh is "-1000.0", then there is an inconsistency in the number of atoms of at least one atom type between the reference and the docked conformer.


RETURN TO TABLE OF CONTENTS

2.6. Database Filter

The Database Filter is designed for on-the-fly filtering of small molecules from the database during docking. Filtering small molecules by heavy atoms, rotatable bonds, molecular weight and formal charge is currently supported. This routine is designed to be modular so that other descriptors can be easily added. The default values are deliberately set to allow most small molecules to pass through. One use of this routine would be to partition a database into subsets such as "0-7 rotbonds" or "300-500 molwt" or "neutral charge". Another use would be to exclude ligands that are too small (<200 amu) or too large (>500 amu) for a particular target. This routine can also be used to filter a database without performing any docking.

Database Filter Parameters

RETURN TO TABLE OF CONTENTS

2.7. Orienting the Ligand

2.7.1. Sphere Matching

The rigid body orienting code is written as a direct implementation of the isomorphous subgraph matching method of Crippen and Kuhl (Kuhl et al. J. Comput. Chem. 1984). All receptor sphere pairs and atom center pairs are considered for inclusion in a matching clique. This is more computationally demanding than the clique matching algorithm implemented in previous versions that used a distance binning algorithm to restrict the clique search, in which pairs of spheres and atom centers were binned by distance. Only sphere pairs and center pairs that were within the same distance bin were considered as potential matches (Ewing and Kuntz. J. Comput. Chem. 1997).

The clique matching implementation avoids bin boundaries that prevent some receptor sphere and ligand atom pairs from matching, and, as a result, it can find good matches missed by previous versions of DOCK. The rigid body rotation code has also been corrected to avoid a singularity that occurred if the spheres in the match lay within the same plane.

There are two types of ligand orientation currently available:

(1) Automated Matching —Specify the number of orientations, and DOCK will generate matches until enough orientations passing the bump filter have been formed. Matches are formed best first, with respect to the difference in the ligand and site point internal distances.
(2) Manual Matching —Specify the distance and node parameters, and DOCK will generate all the matches which satisfy them. The number of orientations scored is equal to the total matches minus the orientations discarded by the user applied filters.

Multiple orientations may be written out for each molecule using the write_orientations parameter (see Ligand File Output), otherwise only the best orientation is recorded. Note that this feature is available only in serial DOCK; in parallel DOCK only the best orientation is emitted. In addition, now "VERBOSE ORIENTING STATS" are now printed when the verbose flag is used. This prints orienting parameters, residual statistics, statistics on the nodes used in the match. See the Growth Tree And Statistics section below for an example of the output.

RETURN TO TABLE OF CONTENTS

2.7.2. Critical Points

The critical_points feature is used to focus the orientation search into a subsite of the receptor active site (DesJarlais et al. J. Comput-Aided Molec. Design. 1994 and Miller et al. J. Comput. Aided Mol. Design. 1994). For example, identifying molecules that interact with catalytic residues might be of chief interest. Any number of points may be identified as critical (see Critical Points in the sphgen documentation for information on labeling spheres), and any number of groupings of these points may be identified. Cliques are checked for critical points by comparing spheres; the criterion is that every grouping must have a coincident sphere and the first coincident sphere found in a grouping terminates further searching of that grouping. An alternative to using critical points is to discard all site points that are some distance away from the subsite of interest, while retaining enough site points to define unique ligand orientations. This feature can be highly effective at reducing matching by five-fold or more. It is particularly useful to also assign chemical labels to the critical points to further focus sampling. In this case cliques are checked first for satisfaction of the critical points criterion and then for satisfaction of the chemical matching criteria.

RETURN TO TABLE OF CONTENTS

2.7.3. Chemical Matching

The chemical_matching feature is used to incorporate information about the chemical complementarity of a ligand orientation into the matching process. In this feature, chemical labels are assigned to site points (see Chemical Matching in the sphgen documentation for information on labeling spheres) and ligand atoms (see Ligand File Input) (Kuhl et al. J. Comput. Chem. 1984). The site point labels are based on the local receptor environment. The ligand atom labels are based on user-adjustable chemical functionality rules. These labeling rules are identified with the chemical_defn_file parameter and reside in an editable file (see chem.defn). A node in a match will produce an unfavorable interaction if the atom and site point components have labels which violate a chemical match rule. The chemical matching rules are identified with the chemical_match_file parameter and reside in an editable file (see chem_match.tbl). If a match will produce unfavorable interactions, then the match is discarded. The speed-up from this technique depends how extensively site points have been labeled and the stringency of the match rules, but an improvement of two-fold or more can be expected.

RETURN TO TABLE OF CONTENTS

2.7.4. Macromolecular Docking

Who has used this functionality recently!

Although DOCK is typically applied to small ligand molecules, it can be used to study macromolecular ligands, for example protein-protein and protein-DNA complexes. The chief difference in protocol is that to use the match_receptor_sites procedure for the orientation search, special ligand centers must be used to represent the ligand. This is signaled by setting the ligand_centers parameter. The ligand centers may be constructed by sphgen and must reside in a file identified with the ligand_center_file parameter. See Shoichet et al. J. Mol. Biol. 1991 for examples and discussion of macromolecular docking.

RETURN TO TABLE OF CONTENTS

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Orient Ligand Parameters

RETURN TO TABLE OF CONTENTS

2.8. Internal Energy Calculation

During growth and minimization, an internal energy scoring function can be used. The goal of the internal energy function is to reduce the occurrence of internal clashes during the torsional optimization. This function computes the repulsive Lennard-Jones term between all ligand atom pairs, excluding all 1-2, 1-3, and 1-4 pairs. (Currently, attractive Lennard-Jones and Coulombic terms are neglected; he aim is to eliminate internal clashes not to optimize the internal geometry. in addition, since there is no dihedral term in the force field, if the attractive terms are included the molecule might appear less physical.) The internal energy can be cut on or off; if cut on, it is all ways used and is reported in the final energy. If it is cut of it is never used. The pruning during growth is done by considering both the internal and interaction energies. We recommend the use of the internal energy function for all calculations.

Internal Energy Parameters

RETURN TO TABLE OF CONTENTS

2.9. Ligand Flexibility

The internal degrees of freedom of the ligand can be sampled with the anchor-and-grow incremental construction approach. This conformational search algorithm has been validated for binding mode prediction on sets of ligands that have no more than seven rotatable bonds (Moustakas et al., 2006).

2.9.1. Anchor-and-Grow

The process of docking a molecule using the anchor-first strategy is shown in the Workflow for Anchor-and-Grow Algorithm Ewing et al. 2001 . First, the largest rigid substructure of the ligand (anchor) is identified (see Identification of Rigid Segments) and rigidly oriented in the active site (orientation) by matching its heavy atoms centers to the receptor sphere centers (see Orienting the Ligand). The anchor orientations are evaluated and optimized using the scoring function (see Scoring) and the energy minimizer (see Minimization). In general, the orientations are then ranked according to their score, spatially clustered by heavy atom root mean squared deviation (RMSD), and pruned (see Pruning the Conformation Search Tree). Next, the remaining flexible portion of the ligand (see Identification of Flexible Layers) is built onto the best anchor orientations within the context of the receptor (grow). It is assumed that the shape of the binding site will help restrict the sampling of ligand conformations to those that are most relevant for the receptor geometry.

 

Workflow for Anchor-and-Grow Algorithm

The conformation of a flexible molecule may be searched or relaxed using the flexible_ligand option. Only the torsion angles are modified, not the bond lengths or angles. Therefore, the input geometry of the molecule needs to be of good quality. A structure generated by ZINC is sufficient.

The torsion angle positions reside in an editable file (see flex_drive.tbl on page 111) which is identified with the flex_drive_file parameter. Internal clashes are detected during the torsion drive search based on the clash_overlap or internal_energy parameters, which are independent of scoring function.

RETURN TO TABLE OF CONTENTS

2.9.2. Identification of Rigid Segments

A flexible molecule is treated as a collection of rigid segments. Each segment contains the largest set of adjacent atoms separated by non-rotatable bonds. Segments are separated by rotatable bonds.

The first step in segmentation is ring identification. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomenon as sugar puckering and chair-boat hexane conformations, the user will need to supply each ring conformation as a separate input molecule. Additional bonds may be specified as rigid by the user (see Manual Specification of Non-rotatable Bonds).

Identification of Rigid Anchor and Flexible Bonds

The second step is flexible bond identification. Each flexible bond is associated with a label defined in an editable file (see flex.defn). The parameter file is identified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types (and chemical environment) of the bonded atoms. Each label is also flagged as minimizable. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.

RETURN TO TABLE OF CONTENTS

2.9.3. Manual Specification of Non-rotatable Bonds

Currently this functionality is not available!

The user can specify additional bonds to be non-rotatable, to supplement the ring bonds automatically identified by DOCK. Such a technique could be used to preserve the conformation of part of a molecule and isolate it from the conformation search. Non-rotatable bonds are identified in the Tripos MOL2 format file containing the molecule. The bonds are designated as members of a STATIC BOND SET named RIGID (see Tripos MOL2 Format).

Creation of the RIGID set can be done within Chimera. With the molecule of interest loaded into Chimera, select the portion of the ligand you would like to remain rigid. Then select on File > Save MOL2. Make sure the "Write current selection to @ SETS section of file" is checked and save the file.

Alternatively, the RIGID set can be entered into the MOL2 file by hand. To do this, go to the end of the MOL2 file. If no sets currently exist, then add a SET identifier on a new line. It should contain the text "@<TRIPOS>SET". On a new line add the text "RIGID STATIC BONDS <user> **** Comment". On the next line enter the number of bonds that will be included in the set, followed by the numerical identifier of each bond in the set.

RETURN TO TABLE OF CONTENTS

2.9.4. Identification of Flexible Layers

Anchor Selection

An anchor segment is normally selected from the rigid segments in an automatic fashion (see Manual Specification of Non-rotatable Bonds to override this behavior). The molecule is divided into segments that overlap at each rotatable bond. The segment with the largest number of heavy atoms is selected as the first anchor, number of attachment points are also considered. All segments with more heavy atoms then min_anchor_size are tried separately as anchors. The number of anchors can be limited by setting the limit_max_anchors flag to "yes"; max_anchor_num is used to specify the maximum number of anchors to be used (anchors are ordered by heavy atoms and attachment points):

min_anchor_size 5
limit_max_anchors yes
max_anchor_num 3


At most 3 anchors are used and all anchors have at least 5 heavy atoms.

To use a single specific anchor (e.g scaffold with known bonding pose), specify an atom name and its corresponding atom number in the chosen fragment (e.g. if atom number 10 is C16):

user_specified_anchor yes
atom_in_anchor C16,10


Identification of Overlapping Segments

When an anchor has been selected, then the molecule is redivided into non-overlapping segments, which are then arranged concentrically about the anchor segment. Segments are reattached to the anchor according to the innermost layer first and within a layer to the largest segment first.

Layered Non-Overlapping Segments

The anchor is processed separately (either oriented, scored, and/or minimized). The remaining segments are subsequently re-attached during the conformation search. The interaction energy between the receptor and the ligand can be optimized with a simplex minimizer (see Minimization).

RETURN TO TABLE OF CONTENTS

2.9.5. Pruning the Conformation Search Tree

Starting with version 6.1, there are two methods for pruning. The first method is the one that existed in earlier versions; it is the default and corresponds to input parameter pruning_use_clustering = yes. In this method pruning attempts to retain the best, most diverse configurations using a top-first pruning algorithm, which proceeds as follows. The configurations are ranked according to score. The top-ranked configuration is set aside and used as a reference configuration for the first round of pruning. All remaining configurations are considered candidates for removal. A root-mean-squared distance (RMSD) between each candidate and the reference configuration is computed. Each candidate is then evaluated for removal based on its rank and RMSD using the inequality:

If the factor is greater than number_confs_for_next_growth, as appropriate, the candidate is removed. Based on this factor, a configuration with rank 2 and 0.2 angstroms RMSD is comparable to a configuration with rank 20 and 2.0 angstroms RMSD. The next best scoring configuration which survives the first pass of removal is then set aside and used as a reference configuration for the second round of pruning, and so on. This pruning method biases its search time towards molecules that sample a more diverse set of binding modes. As the values of num_anchors_orients_for_growth and number_confs_for_next_growth are increased, the anchor-first method approaches an exhaustive search.

In the second method, the goal is to bias the sampling towards conformations that are close to the correct binding mode (as optimized using a test set of experimentally solved structures). Much as the method above, the algorithm ranks the generated poses and conformations. Then, all poses that violate a user-defined score cutoff are removed. To facilitate the speed of the calculation, the remaining list is additionally pared back to a user-defined length. In this method, the sampling is driven towards molecules that sample closer to the experimentally determined binding site, and the result is a significantly less diverse set of final poses.

RETURN TO TABLE OF CONTENTS

2.9.6 Time Requirements

The time demand grows linearly with the num_anchors_orients_for_growth, the number_confs_for_next_growth, the number of flexible bonds and the number of torsion positions per bond, as well as the number of anchor segments explored for a given molecule. Using the notation in the Workflow for Anchor-and-Grow Algorithm, the time demand can be expressed as

where the additional terms are:
NA is the number of anchor segments tried per molecule.
NB is the number of rotatable bonds per molecule.

RETURN TO TABLE OF CONTENTS

2.9.7 Growth Tree and Statistics

Dock uses Breadth First Search to sample the conformational space of the ligand. The tree is pruned at every stage of growth to remove unsuitable conformations. In order to be as space efficient as possible, DOCK only saves one level of growth at a time unless "write_growth_tree" is turned on. In order to construct the growth tree it was necessary to do the following: (1) Retain all levels of growth (before and after minimization) in memory. (2) Link every conformer to its parent conformer during growth. (3) While writing out the tree, the traversal starts from a fully grown ligand (leaf), moving up the branch (parent conformer) until the ligand anchor (root) is reached. Finally, the growth tree branch is printed as a multi-mol2 file starting from the anchor to the fully grown ligand, including minimizations. This newly implemented feature allows visualization of all stages of growth and optimize behavior of current DOCK routines. Note that the growth trees can easily be visualized using the Viewdock module in the UCSF chimera program. Extra information regarding conformer number, anchor number, parent conformer etc. can also be accessed directly using this tool.

Format for branch files name is as follows:

${Ligand name}_anchor${anchor number}_branch${conformer number of fully grown mol.}.mol2

e.g. LIG1_anchor1_branch4.mol2

The ligand name is that specified in the mol2 file. The anchor number indicates what fragment or portion of the molecule was used as the anchor. The every conformer (both partially and fully grown) is assigned a unique number.

we recommend that users cat files together and compress them.

cat *_branch*.mol2 > growth_tree.mol2; gzip growth_tree.mol2

In addition, growth statistics are printed to the output files if the verbose flag is used.

-----------------------------------
VERBOSE MOLECULE STATS
Number of heavy atoms = 30
Number of rotatable bonds = 7
Formal Charge = 1.00
Molecular Weight = 429.56
Heavy Atoms = 30
-----------------------------------
VERBOSE ORIENTING STATS :
Orienting 10 anchor heavy atom centers
Sphere Center Matching Parameters:
tolerance: 0.25; dist_min: 2; min_nodes: 3; max_nodes: 10
Num of cliques generated: 2298
Residual Info:
min residual: 0.0261
median residual: 0.3932
max residual: 0.5000
mean residual: 0.3737
std residual: 0.0935
Node Sizes:
min nodes: 3
max nodes: 4
mean nodes: 3.0070
# of anchor positions: 1000
-----------------------------------
VERBOSE GROWTH STATS : ANCHOR #1
32/1000 anchor orients retained (max 1000) t=9.06s
Lyr 1-1 Segs|Lyr 2-1 Segs|Lyr 3-2 Segs|Lyr 4-2 Segs|Lyr 5-1 Segs|
Lyr:1 Seg:0 Bond:8 : Sampling 6 dihedrals C6(C.ar) C4(C.ar) C3(C.3) C1(C.3)
Lyr:1 Seg:0 24/192 retained, Pruning: 6-score 162-clustered t=10.68s
Lyr:2 Seg:0 Bond:5 : Sampling 3 dihedrals C4(C.ar) C3(C.3) C1(C.3) N1(N.3)
Lyr:2 Seg:0 51/72 retained, Pruning: 21-clustered t=11.38s
Lyr:3 Seg:0 Bond:1 : Sampling 3 dihedrals C3(C.3) C1(C.3) N1(N.3) S1(S.o2)
Lyr:3 Seg:0 105/153 retained, Pruning: 7-score 41-clustered t=13.37s
Lyr:3 Seg:1 Bond:3 : Sampling 6 dihedrals N4(N.am) C2(C.2) C1(C.3) C3(C.3)
Lyr:3 Seg:1 86/630 retained, Pruning: 8-score 536-clustered t=23.93s
Lyr:4 Seg:0 Bond:43 : Sampling 3 dihedrals C16(C.ar) S1(S.o2) N1(N.3) C1(C.3)
Lyr:4 Seg:0 90/258 retained, Pruning: 168-clustered t=28.85s
Lyr:4 Seg:1 Bond:26 : Sampling 2 dihedrals C11(C.3) N4(N.am) C2(C.2) C1(C.3)
Lyr:4 Seg:1 147/180 retained, Pruning: 5-score 28-clustered t=35.28s
Lyr:5 Seg:0 Bond:46 : Sampling 6 dihedrals C17(C.ar) C16(C.ar) S1(S.o2) N1(N.3)
Lyr:5 Seg:0 104/882 retained, Pruning: 15-outside grid 22-score 741-clustered t=77.71s

These are the verbose growth statistics for flexible docking to 1PPH (thrombin). These are printed only when the verbose flag is enabled in the command line. This feature is useful for debugging incomplete growths and other possible issues with the growth routines. This feature is also useful to show progress when docking in larger peptide-like ligands (20+ rotatable bonds) which can take several hours. Cumulative timing in seconds (e.g. t=13.37s) is shown at the end of each line to allow quick profiling of the slowest steps during docking. A separate section is printed for each anchor sampled when using multiple anchors. For anchor #1, the orienting routine produces 1000 orients, and 37 are retained after clustering and minimization. The ligand has 7 rotatable bonds. The second line shows the assignment of layers and segments. For details on the terminology, please consult the DOCK 4 paper. subsequently, two lines of information are printed for each torsion sampled.

Lyr:1 Seg:0 indicates that this is Layer #1 and Segment #0. Layer and segment number starts from zero, and corresponds to the array indices used internally. Bond:8 refers to bond number in the mol2 file read in. "Sampling 6 dihedrals C6(C.ar)  C4(C.ar)  C3(C.3)  C1(C.3)" specifies the exact torsion being sampled. Six dihedral positions are being sampled in this case, as determined by the drive_id in flex_drive.tbl. 21/246 retained means 21 conformers were retained from the 246 conformers generated during growth (41 conformers x 6 dihedral positions = 246 new conformers). The Pruning: section demonstrates how these (246-21) or 225 conformers were pruned: 2 conformers were outside the energy grid, 5 conformers exceeded the score cut-off (see pruning_conformer_score_cutoff) and 218 conformers were clustered. Typically clustering removes the greatest number of conformers during each torsion grown as controlled by the pruning_clustering_cutoff parameter. The reader is encouraged to verify that the number of conformers retained can be calculated as above at each stage of growth. If the growth tree is turned on, the total number of conformers stored in the growth tree are also reported.

RETURN TO TABLE OF CONTENTS

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Flexible Ligand Parameters

RETURN TO TABLE OF CONTENTS

2.10. Scoring

DOCK uses several types of scoring functions to discriminate among orientations and molecules. Scoring is requested using the score_molecules parameter. The scoring functions are implemented with a hierarchical strategy. A master score class manages all scoring functions that DOCK uses. Any of the DOCK scoring functions can be selected as the primary and/or the secondary scoring function. The primary scoring function is used during rigid orienting, anchor-and-grow steps, and minimization, which typically make many calls to the scoring function. The secondary scoring function is used in the final minimization, scoring, and ranking of the molecules.

RETURN TO TABLE OF CONTENTS

2.10.1. Bump Filter

Orientations and grow steps may be filtered prior to scoring to discard those in which the molecule significantly overlaps receptor atoms. This feature is enabled with the bump_filter flag. At the time of construction of the bump filter, the amount of atom VDW overlap is defined with the bump_overlap parameter (see Grid). At the time of bump evaluation the number of allowed bumps is defined with the max_bump_anchor and max_bump_growth parameter.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Bump Filter Parameters

RETURN TO TABLE OF CONTENTS

2.10.2. Contact Score

The contact score is a simple summation of the number of heavy atom contacts between the ligand and receptor. At the time of construction of the contact scoring grid, the distance threshold defining a contact is set with the contact_cutoff_distance (see Grid). Atom VDW overlaps are penalized by checking the bump filter grid, or with the contact_clash_overlap parameter for the intramolecular score. The amount of penalty is specified with the contact_clash_penalty parameter.

The contact score provides a simple assessment of shape complementarity. It can be useful for evaluating primarily non-polar interactions.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Contact Score Parameters

2.10.3. Grid-Based Score

The grid-based score is based on the non-bonded terms of the molecular mechanic force field (see Grid for more background).

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Grid Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.4. DOCK3.5 Score

DOCK3.5 score is a variant of Grid-based scoring (see Grid). DOCK3.5 score function calculates ligand desolvation in addition to steric and electrostatic interactions between the ligand and receptor. The electrostatic interactions between the ligand and the protein is calculated from a electrostatic potential (ESP) map. The ESP map should be calculated using finite difference Poisson-Boltzmann equation (PBE) as implemented in the program DelPhi. We provide scripts for the calculation, however DelPhi is not distributed by us.

In the DOCK scoring hierarchy DOCK3.5 Score follows GridScore and can be used both as the primary and the secondary scoring function.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

DOCK3.5 Score Parameters

2.10.5. Continuous Score

Continuous scoring may be used to evaluate a ligand:receptor complex without the investment of a grid calculation, or to perform a more detailed calculation without the numerical approximation of the grid. See Meng et al. J. Comp. Chem. 1992. When continuous scoring is requested, then score parameters normally supplied to Grid, must also be supplied to DOCK. It is left to the user to make sure consistent values are supplied to both programs.

The distance dependence of the Lennard-Jones function is set with the cont_score_att_exp and cont_score_rep_exp parameters. Typically a 6-12 potential is used, but it can be softened up by using a 6-8 or 6-9 potential. The source of the radii and well-depths is the same as for other atom typing applications; see vdw.defn. The distance dependence of the Coulombic function is set with the cont_score_use_dist_dep_dielectric parameter. The dielectric constant is adjusted with the cont_score_dielectric parameter. Note the correspondence of these parameters to those for Grid Energy Scoring.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Continuous Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.6. Zou GB/SA Score

The Zou GB/SA scoring function is a fast algorithm, the pairwise free energy model, for ligand binding affinity calculations. Specifically, a pairwise descreening approximation (Hawkins et al. Chem. Phys. Lett. 1995) is used in calculations of the electrostatic energy contribution. The algorithm also includes a procedure to account for the low dielectric region that might form between the ligand and the receptor during docking processes. It has been tested to obtain similar results compared with the grid-based free energy model but with much less computation efforts.

For more information on the scoring function and specifics of the implementation, see Liu et al. J. Phys. Chem. B. 2004.

Zou GB/SA Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.7. Hawkins GB/SA Score

The Hawkins GB/SA score is an implementation of the Molecular Mechanics Generalized Born Surface Area (MM-GBSA) method originally described by (Srinivasan et al. J. Am. Chem. Soc. 1998) and reviewed by (Kollman et al. Acc. Chem. Res. 2000). This particular implementation of MM-GBSA uses the pairwise GB solvation model reported by Hawkins, Cramer and Truhlar (Hawkins et al. Chem. Phys. Lett. 1995 , Hawkins et al. J. Phys. Chem. 1996) with parameters described by Tsui and Case (Tsui, et al. Biopolymers 2001). Solvent Accessible Surface Areas (SA) are computed using a C++ implementation of the Amber8 "icosahedra" algorithm.

The total interaction between the ligand and receptor are represented by unscaled Coulombic and Lennard jones energy terms (MM) plus the change (delta) in solvation (GBSA) where deltaGBSA = GBSA complex - (GBSA receptor + GBSA ligand). For any given species GBSA = Gpolar ( GB energy) + Gnonpolar (SA*0.00542 + 0.92). Note that if inner and outer dielectric constants are set to 1 (gas-phase) and 80 (water-phase), then GBSA terms are formally equivalent to free energies of hydration that can be directly compared with experimental data and useful for evaluating the accuracy of different partial charge models (Rizzo et al. J. Chem. Theory. Comput. 2006).

The Hawkins GB/SA score was intended to be used for "single-point" calculations and the current implementation is quite slow if energy minimization is performed at the same time. However, an energy minimization is highly recommended prior to GBSA single-point calculations given that scores can be very sensitive to receptor-ligand geometry. Use of the -v flag will yield separate GBSA terms for each species (complex, receptor, and ligand) and also include raw icosahedra SA values in units of angstroms squared. This scoring method requires that the vdw_AMBER_parm99.defn file be specified which contains GB radii and scale parameters for each atom type.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Hawkins GB/SA Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.8. PB/SA Score

The PB/SA scoring function is an application of the ZAP electrostatics toolkit from OpenEye. In order to employ the PB/SA score, one must have the OEChem and ZAP toolkits installed with a valid license. In particular, DOCK uses the APIs as in OEChem TK 1.7.0 and Zap TK 2.1.1. The toolkits are available to most academic and government institutions free of charge. Commercial use is available at a very reasonable cost from the nice folks at OpenEye. For more information on obtaining the toolkits, licensing, and other OpenEye products, go to the OpenEye web page at www.eyesopen.com.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From the OpenEye Documentation:

"A Smooth Permittivity Function for Poisson-Boltzmann Solvation Methods", J. Andrew Grant, Barry T. Pickup and Anthony Nicholls, J. Comp. Chem, Vol 22, No.6, pgs 608-640, April 2001.

ZAP is, at its heart, a Poisson-Boltzmann (PB) solver. The Poisson equation describes how electrostatic fields change in a medium of varying dielectric, such as an organic molecule in water. The Boltzmann bit adds in the effect of mobile charge, e.g. salt. PB is an effective way to simulate the effects of water in biological systems. It relies on a charge description of a molecule, the designation of low (molecular) and high (solvent) dielectric regions and a description of an ion-accessible volume and produces a grid of electrostatic potentials. From this, transfer energies between different solvents, binding energies, pka shifts, pI's, solvent forces, electrostatic descriptors, solvent dipole moments, surface potentials and dielectric focusing can all be calculated. As electrostatics is one of the two principal components of molecular interaction (the other, of course, being shape), ZAP is OpenEye's attempt to get it right.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

PB/SA Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.9. AMBER Score

The AMBER score provides a suite of functionality including: AMBER molecular mechanics, implicit solvation, and molecular dynamics simulation, receptor flexibility, and conjugate gradient minimization. The main disadvantages are more complicated input preparation and increased computational expense.

2.10.9.1. AMBER Score Binding Energy

AMBER score implements molecular mechanics implicit solvent simulations with the traditional all-atom AMBER force field (Pearlman et al. Comp. Phys. Commun. 1995) for protein atoms and the general AMBER force field (GAFF, Wang et al. J. Comp. Chem. 2004) for ligand atoms. The interaction between the ligand and the receptor is represented by electrostatic and van der Waals energy terms, and the solvation energy is calculated using a Generalized Born (GB) solvation model. The user has the option to choose one of the following GB models: (i) Hawkins, Cramer and Truhlar pairwise GB model with parameters described by Tsui and Case (gb=1) ( Tsui et al. Biopolymers 2001), (ii) Onufriev, Bashford and Case model, GB(OBC) (gb=2) ( Onufriev et al. Proteins 2004), and (iii) a modified GB(OBC) (gb=5) ( Onufriev et al. Proteins 2004). The surface area term is derived using a fast LCPO algorithm ( Weiser et al. J. Comp Chem 1999).

The AMBER score is calculated as:
E(Complex) - [ E(Receptor) + E(Ligand) ],
where E(Complex), E(Receptor), and E(Ligand) are, respectively, the internal energies of the complex, receptor, and ligand (all solvated) as approximated by the AMBER force field with a GB/SA solvation term as referenced above.

The calculation of each of these three energies uses the same protocol: minimization with a conjugate gradient method is followed by MD simulation (Langevin molecular dynamics at constant temperature), another minimization, and a final energy evaluation. The user can specify the number of pre-MD-minimization cycles, the number of MD simulation steps, and the number of post-MD-minimization cycles in the dock input file. During the final energy evaluation, a surface area term is included. The receptor energy is determined once. The AMBER score energy protocol is performed for every ligand and its corresponding complex. The protocol that is performed with all default input parameters was developed in the work of Shivakumar et al., 2008.

2.10.9.2. AMBER Score Receptor Flexibility

AMBER score enables all or a part of the receptor to be flexible, in order to reproduce the so-called "induced-fit". The Cartesian coordinates of atoms that are flexible can be altered during the AMBER score energy protocol. The Cartesian coordinates of atoms that are not flexible cannot be altered during the AMBER score energy protocol. The flexible parts of the receptor-ligand complex are specified with a movable region Dock input file parameter. The movable region options are ligand, everything, nothing, distance, and NAB atom expression. For the ligand option, only the ligand is allowed to move during minimization and MD simulation. For the everything option, all the atoms in the receptor and the ligand are allowed to move. No minimization or MD simulation occurs for the nothing option. This is the only movable region option for which the ligand is not flexible during the AMBER score energy protocol. Consequently, close contacts can produce very large energies because the AMBER force field can be sensitive to the receptor-ligand geometry. However, the nothing option is the fastest type of AMBER scoring.

The distance movable region option selects residues that are allowed to move by receptor-ligand distance. If any atom in a receptor residue is within amber_score_movable_distance_cutoff angstroms of the ligand then the whole residue is selected. The ligand is represented by the active site sphere list, and thus the movable receptor residues are well defined and independent of any particular ligand molecule. The ligand is always movable. The selected residues are emitted with the -v verbose flag as a NAB atom expression.

The NAB atom expression movable region option is based on the program Nucleic Acid Builder (NAB). Every atom in a NAB molecule has a unique name. This name is composed of the strand name, the residue number and the atom name. A NAB atom expression is a character string that contains one or more patterns that match a set of atom names in a molecule. The two Dock input file parameters associated with this option specify the sets of atoms in the receptor and in the complex that are movable. To date our use of of this option has always specified that the set of complex movable atoms is the set of receptor movable atoms plus all atoms of the ligand; however, the two input parameters allow disparate usage. In DOCK a strand name is in fact a strand sequence number, and in a receptor-ligand complex, the ligand is always first in the sequence. Thus, a receptor molecule with two strands has strand names of "1" and "2"; in a corresponding receptor-ligand complex, the ligand (which is almost certainly single stranded) has strand name "1" and the receptor has strand names of "2" and "3".

NAB atom expressions contain three subexpressions separated by colons. They represent the strand, residue and atom parts of the atom expression. Not all three parts are required. An empty part selects all strands, residues or atoms depending on which parts are empty. Each subexpression consists of a comma separated list of patterns, or for the residue part, patterns and/or number ranges. Several atom expressions may be placed in a single character string by separating them with the vertical bar. Patterns in atom expressions are similar to Unix shell expressions. Each pattern is a sequence of one or more single character patterns and/or stars. The star matches zero or more occurrences of any single character. Each part of an atom expression is composed of a comma separated list of limited regular expressions, or in the case of the residue part, limited regular expressions and/or ranges. A range is a number or a pair of numbers separated by a dash. The NAB manual contains more information on atom expressions.

Here are some examples of NAB atom expressions:

  • :SER:
    #Select all atoms in any residue named SER. All three parts are present but both the strand and atom parts are empty. The atom expression :SER selects the same set of atoms.
  • ::C,CA,N,O
    #Select all atoms with names C, CA, N or O in all residues in all strands (typically the peptide backbone).
  • 1:1-10,13:CA,C,N
    #Select all atoms named CA,C,N in residues 1-10 and 13 in strand 1.
  • ::C*[^1]
    #The [^1] is an example of a negated character class. It matches any character in the last position except 1. In this case, it will match all the atoms starting with C, such as CA, CB, CG2, but not those ending with 1, such as CD1, CE1.
  • 2::|1:50,100:O*,N*
    #Select all atoms in strand 2. Select all atoms whose name starts with O and N in residue 50, 100 in strand 1. Note that the vertical bar separates the two strands
  • 4::|2::|1::
    #Select strand 4, 2 and 1.
  • :: or :
    #Select all atoms in the molecule.

2.10.9.3. AMBER Score Inputs

AMBER score requires many additional input files beyond those of other DOCK scoring functions. All the input files, such as the prmtop, frcmod, amber.pdb, etc. should be generated prior to running the DOCK AMBER score. Two perl scripts, prepare_amber.pl and prepare_rna_amber.pl, have been provided for this purpose. The script prepare_rna_amber.pl differs from prepare_amber.pl in that the former treats all nucleic acids as RNA whereas the latter treats them as DNA; in addition, prepare_rna_amber.pl supports the RNA specific features to neutralize to a total charge of zero and to solvate with water. The usage of these is, e.g.,
prepare_amber.pl ligand_mol2_file receptor_PDB_file
For example, if lig.mol2 is the name of the file that contains the ligand in mol2 format and rec.pdb is the name of the file that contains the receptor in PDB format then enter: prepare_amber.pl lig.mol2 rec.pdb
These scripts can process a mol2 file containing multiple ligands (usually the output from a previous DOCK run). See the section Amber Score Preparation Scripts and the tutorials for more information on how to use these scripts to generate the input files.

The prepare amber perl scripts call other scripts and programs, such as antechamber to calculate the AM1-BCC charges for the ligands, and tLEaP to assign the parm94 parameter set for protein atoms and the GAFF parameter set for ligand atoms. A metal ions library (parameters/leap/cmd/leaprc.dock.ions), a cofactor library (parameters/leap/cmd/leaprc.dock.cofactors), and a hook for a user library (parameters/leap/cmd/leaprc.dock.user), are automatically processed by tleap. The latter library provides an override mechanism for the global handling of common problems such as missing force field parameters. Detection of parmchk "ATTN, needs revision" messages for missing force field parameters and a local recovery mechanism for the frcmod files exists. The recovery process is described in the warning message and involves creating a file with the extension ".frcmod.revised file". With knowledge of and experience using AMBER one could create the input files manually; see the amberize scripts for the gory details. Note that the ligand_atom_file, which has the extension ".amber_score.mol2", must have a TRIPOS AMBER_SCORE_ID section for every ligand.

2.10.9.4. AMBER Score Outputs

The AMBER score output in the DOCK output contains energy terms for each species (complex, receptor, and ligand) such that the sum of the terms is equal to the score; in particular, note that the receptor and ligand values have been negated. Use of the DOCK -v verbose flag will produce detailed energy breakdowns; the formats and definitions of this information are discussed in the the NAB manual: Nucleic Acid Builder (NAB). The verbose output is best captured via UNIX shell file redirection as opposed to the DOCK -o flag.

The AMBER score output in the ligand_outfile_prefix_scored.mol2 file contains the ligand coordinates extracted from the final structure of the complex (i.e., the structure after the AMBER score energy protocol is performed). The ligand charges are from the ligand prmtop file. The ligand atom types are the normal ones from DOCK and are not the GAFF atom types used by AMBER score.

The AMBER score uses the following method for emitting all the final structures in their entirety. The AMBER score will create a file with the extension ".final_pose.amber.pdb" for every species (receptor, ligands, and complexes) that contains any movable region (regardless of whether that region actually exists or actually does move). Thus, for the distance, everything, and NAB atom expression options of the movable region, a file for the receptor (amber_score_receptor_file_prefix.final_pose.amber.pdb) and two for each ligand,complex pair (AMBER_SCORE_ID.final_pose.amber.pdb, amber_score_receptor_file_prefix.AMBER_SCORE_ID.final_pose.amber.pdb) will be created; for the nothing option of the movable region, no files will be created. In addition, if the -v verbose flag is present then the AMBER score will create an AMBER restart file with the extension ".final_pose.amber.restart" for every species (receptor, ligands, and complexes) that contains any movable region (regardless of whether that region actually exists or actually does move).

2.10.9.5. AMBER Score in Practice

Here is some general advice on the AMBER score. Since it is slower and more complicated than the other scoring functions, one should proceed carefully when using the AMBER score as the sole primary score. The most obvious use is to rescore, with the default amber_score input parameters, the ligands that have been already scored using one of the faster scoring functions. In fact, due to current software constraints it is not possible to use AMBER score as the primary score when orient_ligand=yes. In general, the input preparation for AMBER score is more involved than for the other scores. Effectively, one needs to generate input files for AMBER. In particular, examine the amberize*.out and *.log files for warnings and errors. (Note that the environment variable AMBERHOME should be undefined when using AMBER score.) Experience using AMBER will help in understanding and judging the impact of the messages in these files. Correcting problems may involve substantial effort. See the DOCK Fans mailing list for specific examples.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

AMBER Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.10. Descriptor Score

The Descriptor score is a scoring function that calculates intermolecular hydrogen bonds, and footprint comparisons, in addition to standard intermolecular energies (VDW and ES).

Intermolecular Energy

Intermolecular Energies (VDW, ES) are calculated the same way as in Continuous Score.

Hydrogen Bonds

A geometric definition of Hydrogen bonds is employed. We define 3 atoms XD, HD, and XA as the heavy atom donor, donated hydrogen, and heavy atom acceptor, respectively. There is a hydrogen bond present if the following two conditions are met:

(1) The distance between HD and XA is less than or equal to 2.5 angstroms;

(2) The angle defined by XD, HD, and XA is between 120 and 180 degrees.

Footprints

Footprints are a per-residue decomposition of interactions between the ligand and the receptor. This can be performed for all three terms VDW, ES, HB. See Balius et al. The method was validated using a testset of 780 structures. See Mukherjee et al.

Footprints Similarity

Two footprints can be compared in three ways: Standard Euclidean, Normalized Euclidean, Pearson Correlation.

Footprints are used to gauge how similar two poses or two molecules are to one-another. For applications to virtual screening applications a reference is required.

There are to choices for a reference:

(1) One can give a mol2 file containing a reference molecule, and footprints will be calculated.

(2) One can pass a text file containing VDW, ES, and H-bond footprints. The following is an example of format for the footprint reference text file:

####################################
###  Reference FP 
####################################

 resname   resid     vdw_ref      es_ref  hb_ref
    GLY1       1   -0.000045   -0.000061       0
    GLU2       2   -0.000226    0.000032       0
    ALA3       3   -0.000174   -0.000093       0
    PRO4       4   -0.000266    0.000186       0
    ASN5       5   -0.001091    0.000538       0
    GLN6       6   -0.000612    0.000165       0
    ALA7       7   -0.001325   -0.000591       0
    LEU8       8   -0.002368    0.000552       0
    LEU9       9   -0.007224    0.000388       0
   ARG10      10   -0.007143   -0.008906       0
   ILE11      11   -0.004132    0.000615       0
   LEU12      12   -0.012233   -0.000404       0
   LYS13      13   -0.002511   -0.003366       0
   GLU14      14   -0.004349    0.005031       0
   THR15      15   -0.001461    0.000304       0
 

Residue selections

There are different choices for selection of residues:

(1) All residues.

(2) Residues chosen using a threshold (union of the sets of reference and pose). The VDW, ES, and HB footprints may have different residues chosen in this case.

(3) Selected residues.

Note that for (2) and (3) the remaining residue interaction may be placed in a remainder value included in the footprint.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Descriptor Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.11. MultiGrid FPS Score

The MultiGrid FPS Score is similar to the Descriptor Score described in the previous section, except that in the case of the Multi-Grid FPS Score, the pair-wise interaction energies are computed over multiple grids rather than in Cartesian space. Generally, "important" receptor residues are identified before-hand based on the magnitude of their interaction with the reference ligand, then a unique grid is generated to represent each of those residues. Finally, a "remainder" grid is generated to represent all remaining receptor residues. The scoring function itself will then calculate intermolecular VDW and ES energies for the reference ligand and pose ligand on each of the grids (also called footprints), then it will calculate the footprint similarity using either the Standard Euclidean, Normalized Euclidean, or Pearson Correlation similarity metrics as described above.

To aid in the selection of important residues and automate the creation of grids, a wrapper python script has been included called "multigrid_fp_gen.py".

Grid Generation

Here is an example script for generating the grids:

#############################################################################
dock6 -i important_residues.in -o important_residues.out
multigrid_fp_gen.py rec.mol2 resid grid.in `grep -A 1 range_union important_residues.out | sed -e'/range_union/d' -e'/--/d' -e's/ //g' | tr ',' '\n' | sort -nu`
#############################################################################

This script can be found in the test suite (install/test/fp_multigrid_generation/getrange_mkgrids). In this script dock is run using Descriptor Score to obtain the important residues, and then the dock output file is processed with grep, sed, and sort to give the range in the format needed by the python grid generation wrapper script.

Several files are needed including a box.pdb file and a grid input file to generate the grids, for example:

#############################################################################
compute_grids yes
grid_spacing 0.4
output_molecule yes
contact_score no
chemical_score no
energy_score yes
energy_cutoff_distance 999
atom_model a
attractive_exponent 6
repulsive_exponent 9
distance_dielectric yes
dielectric_factor 4
bump_filter yes
bump_overlap 0.75
receptor_file ./temp.mol2
box_file ./box.pdb
vdw_definition_file ../../../parameters/vdw_AMBER_parm99.defn
score_grid_prefix ./temp.rec
receptor_out_file ./temp.rec.grid.mol2
#############################################################################

This grid program produces the temp grid files which are then renamed by the python script.

MultiGrid FPS Score Parameters

RETURN TO TABLE OF CONTENTS

2.10.12. SASA Score Score

2.10.12. SASA Score Score

The SASA score provides the user the ability to calculate the percent exposure of a ligand. In addition to the percent exposure, the code calculates the percentage of the hydrophobic portion of the ligand and the receptor that is buried in the pocket.

The surface area is calculated using the same implementation as the Hawkins GB/SA.

Note: This is used for post-processing and should be used as a single-point calculation only.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

SASA Score Parameters

RETURN TO TABLE OF CONTENTS

2.11. Minimization

Score optimization allows the orientation and conformation of a molecule to be adjusted to improve the score. Although the calculations can be expensive, the orientational and conformational searches may become more efficient because less sampling is necessary. The set of optimization features is controlled by the minimize_ligand parameter. Three types of minimization, which are discussed below, are available: standard minimization, pre-minimization, and restrained minimization. The optimizer uses the downhill simplex algorithm, which does not require evaluation of derivatives (Nelder et al. Computer Journal, 1964). However, it does depend on a random number generator which can be sensitive to the initial seed in the simplex_random_seed parameter. The amount of variance should be small, but for detailed calculations, it is recommended that the optimization be repeated with different random number seeds to check convergence.

The initial step size of the minimizer is specified with the initial translation XXX_trans_step, initial rotation XXX_rot_step, and initial torsion XXX_tors_step parameters, where XXX is defined below. XXX_trans_step has DOCK length units. The units of XXX_rot_step are π radians; this is a natural unit; in other words, the constant π is normalized to unity, and the default values of 0.1 for XXX_rot_step are 0.1 π radians = 18 degrees; in addition, the range of values is -1.0 to +1.0; input values outside that range are wrapped around into that range. The units of XXX_tors_step are degrees.

Users can choose to minimize the rigid anchors and minimize during flexible growth. Note that the simplex_final_min (minimize the final conformation) option has been removed starting with version 6.4. The recommendation now is to use the internal energy function; it is included as part of the minimization routine at each stage of growth. If a user wishes to do a final step of minimization, DOCK can be run an additional time (turn off orienting and flexible growth, turn on minimization). The anchor minimization is always done rigidly; also, if no flexible growth is being done, this step will minimize the entire molecule. The minimization during the flexible growth is a complete (torsions + rigid) minimization.

The XXX_max_iterations parameters specify the convergence criteria in number of iterations inside one call of the simplex minimizer; the XXX_score_converge parameters specify the other convergence criteria in terms of the score inside one call of the simplex minimizer. Thus, when the simplex shrinks enough so that the highest and lowest points are within the scoring energy tolerance or when the maximum number of iterations is reached, one call to the simplex minimizer terminates. Because of convergence issues with the downhill simplex method, additional calls of the simplex minimizer inside one step of minimization are possible. These convergence criteria are XXX_max_cycles for the maximum number of calls, i.e., cycles, of the simplex minimizer and XXX_cycle_converge for the normalized vector distance that the simplex has moved. For further details see page 416 of Ewing et al. 2001 and note that the citation numbers are off by one: i.e., [24] should be [23].

Here is a discussion of the number of iterations for these two types of minimization: standard minimization, which explores all (N + 6) degrees of freedom, and pre-minimization, which explores only the torsional ( N ) degrees of freedom. The additional 6 degrees are the translational (3) and rotational (3) ones. Pre-minimization is controlled, depending on the value of flexible_ligand, with one of these parameters: simplex_tors_premin_iterations or simplex_grow_tors_premin_iterations . Note that if 500 iterations of standard minimization and 20 iterations of torsion pre-minimization are specified, at most 520 steps of minimization will be performed. The motivation for using the torsion pre-minimizer is to optimize the torsions before translating. This parameter should only be used in cases where the growth tree reveals that the minimizer is translating a correctly oriented scaffold to relieve clashes instead of adjusting the torsions first. Alternatively, the restraint minimization routine could be used (see below). Although the default number of minimization steps is set to 500, this may be unnecessary. Results that are almost as good can be achieved with 20 steps of simplex_grow_max_iterations with much faster run time.

A third type of minimization is restrained minimization. Restrained minimization uses an RMSD tether: The minimizer may be performing too much sampling by moving partially grown conformers too far from the initial starting point. The minimizer should not be a coarse sampling tool. The objective of this routine is to ensure that the minimizer does NOT redundantly sample conformations which should be accessed using the anchor and grow code. The philosophy motivating the development of restrained minimization is to avoid the loss of information gained in the initial anchor placement and the other preceding growth steps. Conformations which cannot be rescued without moving "far away" should be pruned instead of being moved into the conformational space of an adjacent branch of growth.

RMSD and RMSD2 are metrics for the distances between two molecules. RMSD2 can be thought of as the mean squared distance.

where the variables ai and bi are the Cartesian coordinates of the same atom from two conformers of the same molecule. An intuitive way to restrain a (partial or fully grown) conformer is to use Hooke's law:

Erestraint= k RMSD2

where the parameter k is in kcal/(mol 2). Simplex minimization energy evaluation is the sum (Score + Internal Energy+ Erestraint). A typical value of k=10 kcal/(mol 2), for small conformational changes where say RMSD =0.5 , yields Econstraint =10*(0.5)^2=2.5 kcal/mol. This indicates that the constraint will affect the minimization very little unless the ligand is moved a lot.

RETURN TO TABLE OF CONTENTS

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Score Optimization Parameters

Generic Minimizer Parameters

  1. if flexible_ligand = no OR use_advanced_simplex_parameters = no, XXX = simplex
  2. if use_advanced_simplex_parameters = no AND minimize_anchor = yes, XXX = simplex_anchor
  3. if use_advanced_simplex_parameters = no AND minimize_flexible_growth = yes, XXX = simplex_grow
  4. if simplex_secondary_minimize_pose AND use_advanced_secondary_simplex_parameters, XXX = simplex_secondary

RETURN TO TABLE OF CONTENTS

2.12. Parameter Files

The parameter files contain atom and bond data needed during DOCK calculations. The definition (*. defn ) files contain atom and bond labeling data. The table (*. tbl ) files contain additional data for chemical interactions and flexible bond torsion positions. They may be edited by the user. All the parameter files described below can be found in the "parameter" directory in the DOCK distribution.

RETURN TO TABLE OF CONTENTS

2.12.1. Atom Definition Rules

The definition files use a consistent atom labeling convention for which any atom in virtually any chemical environment can be identified. The specification of adjacent atoms is nested using the elements listed below:

Atom Definition Elements

Element

Function

atom type

Specifies partial or complete atom type. A partial specification is more general (i.e., "C" versus "C.3"). An asterisk (*) specifies ANY atom type.

( )

Specifies atoms that must be bonded to parent atom.

[ ]

Specifies atoms that must NOT be bonded to parent atom.

integer

Specifies the number of an atom that must be bonded.

Example Definitions

Example

Explanation

C.2 ( 2 O.co2 )

A carboxylate carbon.

.3 [ 3 H ]

Any sp3 hybridized atom that is not attached to three hydrogens.

C. [ O . ] [ N . [ 2 O.2 ] [ 2 C. ] ]

Any carbon not attached to an oxygen or a nitrogen (unless the nitrogen is a nitro or tertiary nitrogen).

RETURN TO TABLE OF CONTENTS

2.12.2. vdw.defn

This file contains atom labels and definitions for van der Waals (VDW) atom typing. The following data types are associated with each atom: atom model, VDW radius, VDW well-depth, flag for heavy atom, and valence. The atom model specifies one of three possible values: all, united, or either; these indicate an all-atom model, a united-atom model, or either all- or united-atom models. The VDW radius and VDW well-depth values are used in molecular mechanics-scoring functions (see Grid). The valence is the value for the maximum number of atoms that can be attached. The definition is the Sybyl atom types that should be associated with the atom name. A label may have multiple definitions.

In the vdw_AMBER_parm99.defn file, there are additional parameters needed for use with the Hawkins GB/SA scoring function (see Hawkins GB/SA). Each entry has an additional gbradii and gbscale parameter.

WARNING: The last entry in the vdw.defn file MUST be Dummy.

Sample Entries

_____________________________________
name Carbon_sp/sp2
atom_model either
radius 1.850
well_depth 0.120
heavy_flag 1
valence 4

definition C
_____________________________________
name Carbon_All_sp3
atom_model all
radius 1.800
well_depth 0.060
heavy_flag 1
valence 4
gbradii 1.70
gbscale 0.72

definition C.3
_____________________________________
name Carbon_United_CH3
atom_model united
radius 2.000
well_depth 0.150
heavy_flag 1
valence 4

definition C. ( 3 H )
_____________________________________

RETURN TO TABLE OF CONTENTS

2.12.3. chem.defn

This file contains labels and definitions for chemical labeling. Nothing outside of addition to a label needs to be assigned to an atom. A label may have multiple definition lines but the names must match the rules in the chem_match.tbl file.

Sample Entries

________________________________________________________
name hydrophobic
definition C. [ O. ] [ N . [ 2 O.2 ] [ 2 C. ] ] ( * )
definition N.pl3 ( 3 C. )
definition Cl ( C. )
definition Br ( C. )
definition I ( C. )
definition C.3 [ * ]
________________________________________________________
name donor
definition N. ( H )
definition N.4 [ * ]
________________________________________________________
name acceptor
definition O. [ H ] [ N. ] ( * )
definition O.3 ( 1 * ) [ N. ]
definition O.co2 ( C.2 ( O.co2 ) )
definition N. [ H ] [ N. ] [ O . ] [ 3 . ] ( * )
definition O.2 [ * ]
________________________________________________________

RETURN TO TABLE OF CONTENTS

2.12.4. chem_match.tbl

This file contains the interaction matrix for which chemical labels can form an interaction in matching. The labels must be identical to labels in chem.defn. The table flag indicates the beginning of the interaction table. Compatible labels are identified with a one, otherwise a zero.

Sample File

label null
label hydrophobic
label donor
label acceptor
label polar
table
1
1 1
1 0 1
1 0 0 1
1 0 1 1 1

RETURN TO TABLE OF CONTENTS

2.12.5. flex.defn

This file contains labels and definitions for flexible bond identification. The drive_id field corresponds to a torsion type in the flex_drive.tbl file. The minimize field is a flag for whether the bond may be minimized. Two definition lines must be present. Each definition corresponds to an atom at either end of the bond.

Sample Entries

______________________________________
name sp3-sp3
drive_id 3
minimize 1
definition .3 [ 3 H ] [ 3 O.co2 ]
definition .3 [ 3 H ] [ 3 O.co2 ]
______________________________________
name sp3-sp2
drive_id 4
minimize 1
definition .3 [ 3 H ] [ 3 O.co2 ]
definition .2 [ 2 H ] [ 2 O.co2 ]
______________________________________
name sp2-sp2
drive_id 2
minimize 0
definition .2 [ 2 H ] [ 2 O.co2 ]
definition .2 [ 2 H ] [ 2 O.co2 ]
______________________________________

RETURN TO TABLE OF CONTENTS

2.12.7. flex_drive.tbl

This file contains torsion positions assigned to each rotatable bond when the flexible docking parameter is used in DOCK. The drive_id field corresponds to each torsion type in the flex.defn file. The positions field specifies the number of torsion angles to sample. The torsions field specifies the angles that are sampled.

Sample Entries

______________________________________
drive_id 2
positions 2
torsions 0 180
______________________________________
drive_id 3
positions 3
torsions -60 60 180
______________________________________
drive_id 4
positions 4
torsions -90 0 90 180
______________________________________

RETURN TO TABLE OF CONTENTS

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Atom Typing Parameters

RETURN TO TABLE OF CONTENTS

2.13. Ligand File Output

If secondary scoring is enabled, all of the above files will be generated for both primary and secondary scoring. The secondary scoring files will contain both the original primary score as well as the new secondary score.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Ligand File Output

RETURN TO TABLE OF CONTENTS

2.14. Parallel Processing

Parallel processing is fully integrated into the DOCK program. A master-slave paradigm is employed. The single master processor performs file input and output, whereas the slave processors perform the actual calculations. Consequently, a minimal difference in performance between 1 and 2 processors will exist under normal file handling conditions. Improved performance will only become evident with more than 2 processors. If the number of processors is 1 then the code defaults to the non-MPI behavior. It should be emphasized that the primary benefit in using DOCK6 in parallel mode is to reduce bookkeeping tasks associated with manually splitting up a database into multiple chunks which then must be submitted to different processors individually. DOCK6 automatically partitions out subsets of a database to various processors, collates and ranks the final results, and does all intermediate bookkeeping.

The performance of parallel DOCK has been studied (Moustakas et al., 2006 and Peters et al., 2008). The parallel efficiency is less than but close to 100% even for high processor counts. Sorting the ligand database input file by the number of atoms per ligand or by the number of rotatable bonds per ligand improves load balancing.

Since DOCK is only parallelized over ligands, with one ligand in the ligand_atom_file running in parallel is the same as running serially. In addition, the maximum productive number of processors is equal to the number of ligands plus one. The general advice for parallel DOCKing is to start with serial DOCK on a small representative set of ligands. Once one has a useful serial calculation then one can confidently run a large library of ligands in parallel.

Parallel jobs simply require that the user specify the location of the MPI installation used to build DOCK6, a machine file listing the names of the nodes, and the total number of processors to be used. For parallel docking three additional parameters must be specified: (1) the path of the MPICH2 installation (i.e /usr/local/mpich2-1.0.4/bin/mpirun), (2) a machine file containing the names of the computers (nodes) to be used, and (3) the number of processors which typically is the same as the number of lines in the machine file. See Commandline-Arguments for usage information on Parallel DOCK.


Accessories

RETURN TO TABLE OF CONTENTS

3.1. Grid

AUTHOR: Todd Ewing (based on work by Elaine Meng and Brian Shoichet)

USAGE: grid -i grid.in [-stv] [-o grid.out]

OPTIONS:
- i input_file #Input parameters extracted from input_file, or grid.in if not specified
-o output_file #Output written to output_file, or grid.out if not specified
-s Input parameters entered interactively
-t Reduced output level
-v Increased output level

DESCRIPTION:

3.1.1. Overview

Grid creates the grid files necessary for rapid score evaluation in DOCK. Two types of scoring are available: contact and energy scoring. The scoring grids are stored in binary files ending in *.cnt and *.nrg respectively. When docking, each scoring function is applied independent of the others and the results are written to separate output files. Grid also computes a bump grid which identifies whether a ligand atom is in severe steric overlap with a receptor atom. The bump grid is identified with a *.bmp file extension. The binary file containing the bump grid also stores the size, position and grid spacing of all the grids.

The grid calculation must be performed prior to docking. The calculation can take up to 45 minutes, but needs to be done only once for each receptor site. Grid generation and docking should be performed on the same platform; because the grid files have a binary format, copying grid files between machines may produce incorrect results. Furthermore, binary files should not be modified with text editors. Since DOCK can perform continuum scoring without a grid, the grid calculation is not always required. However, for most docking tasks, such as when multiple binding modes for a molecule or multiple molecules are considered, it will become more time efficient to precompute the scoring grids.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

General Grid Parameters

RETURN TO TABLE OF CONTENTS

3.1.2. Bump Checking

Prior to scoring, each orientation can be processed with the bump filter to reject ones that penetrate deep into the receptor. Orientations that pass the bump filter are then scored and/or minimized with any of the available scoring functions. A bump is based on the sum of the van der Waals radii of the two interacting atoms. The user specifies what fraction of the sum is considered a bump. For example, the default definition of a bump is if any two atoms approach closer than 0.75 of the sum of their radii. Grid stores an atomic radius which corresponds to smallest radius of ligand atom at the grid position which would still trigger a bump. During docking, for a given orientation, the position of each atom is checked with the bump grid. If the radius of the atom is greater than or equal to the radius stored in the bump grid, then the atom triggers a bump. To conserve disk space, the atom radius is multiplied by 10 and converted to a short unsigned integer.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Bump Grid Parameters

RETURN TO TABLE OF CONTENTS

3.1.3. Contact Scoring

Contact scoring in grid incorporates the scoring performed with the DISTMAP program (developed by Shoichet and Bodian). The score is a summation of the heavy atom contacts (every atom except hydrogen) between the ligand and receptor. A contact is defined as an approach of two atoms within some cutoff distance (usually 4.5 angstroms). If the two atoms approach close enough to bump (as identified with the bump grid) then the interaction can be penalized by an amount specified by the user.

Distance dependence of contact score function

The attractive score in grid is negative and a repulsive score is positive. This switch of sign is necessary to allow the same minimization protocol to be used for contact scoring as implemented for energy scoring.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Contact Grid Parameters

RETURN TO TABLE OF CONTENTS

3.1.4. Energy scoring

The energy scoring component of DOCK is based on the implementation of force field scoring. Force field scores are approximate molecular mechanics interaction energies, consisting of van der Waals and electrostatic components:

where each term is a double sum over ligand atoms i and receptor atoms j, which include the quantities listed below.

Generalization of the VDW component

The van der Waals component of the scoring function has been generalized to handle any combination of repulsive and attractive exponents (providing that a> b). The user may choose to "soften" the potential by using a 6-9 Lennard -Jones function. The general form of the van der Waals interaction between two identical atoms is presented:

where e is the well depth of the interaction energy, R is the van der Waals radius of the atoms, and coefficients C and D can be determined given the two following boundary conditions:

at

at

Application of these boundary conditions to the above equation yields an expression of the van der Waals interaction with a generalized Lennard -Jones potential.

The consequence of using a different exponent for the repulsive term is illustrated in Figure 1. Notice that the well position and depth are unchanged, but that the repulsive barrier has shrunk by about 0.25 angstrom.

Distance dependence of the Lennard -Jones Function

Precomputing potentials on a grid

By inspection of the above equations , the repulsion and attraction parameters ( Aij and Bij ) for the interactions of identical atoms can be derived from the van der Waals radius, R, and the well depth, e.

In order to evaluate the interaction energy quickly, the van der Waals and electrostatic potentials are precomputed for the receptor and stored on a grid of points containing the docking site. Precomputing the van der Waals potential requires the use of a geometric mean approximation for the A and B terms, as shown:

Using this approximation, the first equation can be rewritten:

Three values are stored for every grid point k ,each a sum over receptor atoms that are within a user defined cutoff distance of the point:

These values, with trilinear interpolation, are multiplied by the appropriate ligand values to give the interaction energy. Grid calculates the grid values and stores them in files. The values are read in during a DOCK run and used for force field scoring.

The user determines the location and dimensions of the grid box using the program showbox. It is not necessary for the whole receptor to be enclosed; only the regions where ligand atoms may be placed need to be included. The box merely delimits the space where grid points are located, and does not cause receptor atoms to be excluded from the calculation. Besides a direct specification of coordinates, there is an option to center the grid at a sphere cluster center of mass. Any combination of spacing and x, y, and z extents may be used.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

Energy Grid Parameters

u = United atom model. Hydrogens attached to carbons are assigned a zero VDW well-depth and the partial charge is transferred to the carbon.
a = All atom model. Hydrogens attached to carbons have regular VDW well-depth and partial charge is not modified.

RETURN TO TABLE OF CONTENTS

3.2. Docktools

Docktools is a suite of programs that are available to create grids used by Dock3.5 score function in DOCK 6 which is an implementation of the scoring function available in the older version of DOCK, i.e., dock3.5.54. These programs can be used to generate steric, electrostatic and ligand and receptor desolvation grids. Dock3.5 scoring functionality in DOCK 6 is an alternate scoring approach to electrostatic interaction energy and also includes different grid-based methods for calculating ligand and receptor desolvation. Docktools consists of chemgrid, solvmap, solvgrid, grid-convert and grid-convrds.  This section will briefly review the theory behind each of the programs and the describe the usage.

3.2.1. CHEMGRID

AUTHOR: Brian K. Shoichet

DESCRIPTION:

This program chemgrid produces values for computing force field scores and bump checking.  The force field scores, or molecular mechanics interaction energies are calculated as van der Waals and electrostatic components and stored on grids. The calculations are based on the theory presented in the Grid section above. However only the steric interaction energy grids are used in DOCK 6 as a part of Dock3.5 Score. The electrostatic interaction calculation differs from Energy Scoring in the following aspects. For calculating the electrostatic interaction, the electrostatic potential of the receptor calculated using the linearized form of Poisson-Boltzmann equation: 

npbe_score_eqn

Where phi(x) the potential is determined by the dielectric function epsilon(x), a modified Debye-Huckel parameter kappa(x), and the charge density of the receptor rho(x). The electrostatic potential map (or phimap) is calculated using DelPhi and then is used by DOCK 6 to calculate the electrostatic interaction energy as:

delphi_elec_score
Where, qi the partial charge of every atom i, is multiplied by the electrostatic potential of the receptor phi at the respective atomic position. Trilinear interpolation method is used for interpolating the electrostatic potential from the phimap on to the ligand position.

USAGE: chemgrid

INPUT FILE:

This programs require that an INCHEM file be created in the working directory, which contains the parameters to control the program. The INCHEM parameters for chemgrid are detailed below:

receptor.pdb ; receptor pdb file
parameters/ prot.table.ambcrg.ambH ; charge parameter file (for alternate naming conventions see also prot.table.ambcrg.pdb3H, prot.table.ambcrg.pdbH, and prot.table.ambcrg.sybH)
parameters/ vdw.parms.amb.mindock ; VDW parameter file
box.pdb ; box pdb file
0.33 ; grid spacing in angstroms
1 ; es type: distance-dependent dielectric; 2: constant dielectric
1 ; es scale for ff scoring
2.3 2.8 ; bumping distances for polar and non-polar receptor atoms
output_prefix ; output grid prefix name

OUTPUT FILES

3.2.2 SOLVMAP

AUTHOR:  Brian K. Shoichet

DESCRIPTION:

The solvmap program calculates the grid that is used by DOCK6 for calculating ligand desolvation. Ligand desolvation is calculated as a sum of the atomic desolvation multiplied by a normalization factor that accounts for the extent to which the ligand atom is buried by the binding site. The atomic desolvation for each ligand atom can be calculated by AMSOL (AMSOL is not distributed by us, please follow the link for more information) and is stored in the input file (see file formats). The cost of desolvating each atom, or the normalization factor, is the distance weighted high dielectric volume displaced by the protein that is stored for each grid element in the active site. Thus the volume based ligand desolvation energy is calculated as:

                                                                Edesol_equation

Here L is the ligand atom desolvation, volume summed over k volume elements, V. This method is only an approximation to GB solvation and works within the limits of complete burial from the solvent and complete exposure to the solvent on the protein surface. However, being grid-based it is fast and can be used during conformational search and final scoring.

USAGE:  solvmap                                                              

INPUT FILE:

This programs require that an INSOLV file be created in the working directory, which contains the parameters to control the program. The INSOLV parameters for chemgrid are detailed below:

        receptor.pdb; receptor file
        solvmap ; output grid file
        1.4,1.3,1.7,2.2,2.2,1.8 ; radii of oxygen, nitrogen, carbon, sulfur, phosphorus, and "other" atoms.
        1.4 ;  radius of probe
        1 ; grid resolution
        box.pdb ; box file

OUTPUT FILES:

3.2.3 SOLVGRID

AUTHOR  Kaushik Raha

DESCRIPTION:

The solvgrid program creates bulk (or receptor) and explicit (or ligand) desolvation grids using the occupancy desolvation method (Luty et. al., J. Comp. Chem, 1995; Verkhiver et. al., J. Mol. Recog., 1999). The occupancy desolvation method is phenomenological in nature where desolvation energy can be described pairwise additively. The desolvation energy due to interaction between a ligand atom and receptor atom can be calculated as  the solvent affinity of a ligand atom weighted by the volume of the solvent displaced from the receptor atom due to binding and vice versa. Thus,

                                                Edsol  =  Si DES,EXPL(xi) +  fi DES,BULK(xi)

Where the mobile atom i, has a solvation affinity of Si and a fragmental volume of  fi. The solvent affinity of the ligand atom is calculated as:

                                                  lig_atm_sol

Where qi is the charge on the ligand atom i, and alpha and beta are constants  set at alpha = 0.25 kcal/mol and beta = -0.005 kcal/mol. fi is the volume of ligand atom i, calculated from the amber radius of the ligand atom. For the receptor, these can be precalculated and stored on a grid. DES,BULK(xi)  and DES,EXPL(xi) are interpolated from grids calculated using the solvgrid program during docking. It requires the calculation of two separate grids:

                                            blk_exp_dsol_grd

where j is a rigid receptor atom, and Sj and fj are the solvent affinity and the fragmental volume of the receptor atom respectively. Bulk and explicit desolvation grids are calculated from fj and Sj at grid points p, distance rjp from the receptor atom multiplied by gaussian weighting of the distance.  The solvgrid program calculates these grids from the charge and the volume of the receptor atoms.

USAGE: solvgrid

INPUT

This programs require that an INRDSL file be created in the working directory, which contains the parameters to control the program. The INRDSL parameters for solvgrid are detailed below:

receptor.pdb ; receptor pdb file
parameters/ prot.table.ambcrg.ambH ; charge parameter file< (for alternate naming conventions see also prot.table.ambcrg.pdb3H, prot.table.ambcrg.pdbH, and prot.table.ambcrg.sybH)
parameters/ vdw.parms.amb.mindock ; VDW parameter file
box.pdb ; box pdb file
0.33 ; grid spacing in angstroms
1 ; es type: distance-dependent dielectric; 2: constant dielectric
4 ; es scale for ff scoring
2.3 2.8 ; bumping distances for polar and non-polar receptor atoms
output_prefix ; output grid prefix name
sol_op ; method for calculating desolvation grid
solE_recep; solvation free energy of receptor

OUTPUT FILES



3.2.4 GRID-CONVERT

AUTHOR: Kaushik Raha, John J. Irwin (Derived from Todd Ewing's GRID Program)

DESCRIPTION:

This consists of programs grid-convert and grid-convrds that convert grids generated by chemgrid, solvgrid and DelPhi into DOCK6 readable grids. Note that grids created by program Grid are already DOCK6 readable grids. Thus, even though the input of grid-convert may contain parameters related to Grid produced grids, such grids do not need conversion - to wit the contact grid.

USAGE: grid-convert -i gconv.in >& gconv.out

INPUT FILES: 

  • gconv.in; input file with parameters
  • vdw.defn; vdw parameters file
  • conv.defn; atomtype definition file
  • RETURN TO TABLE OF CONTENTS

    3.3. Nchemgrids

    AUTHOR: Xiaoqin Zou

    USAGE: nchemgrid_GB or nchemgrid_SA

    INPUT FILE:

    Both programs require that an INCHEM file be created in the working directory, which contains the parameters to control the program. The INCHEM parameters for both the nchemgrid_GB and nchemgrid_SA programs are detailed below:

    For nchemgrid_GB:

    receptor.pdb ; receptor pdb file
    cavity.pdb ; cavity pdb file
    parameters/ prot.table.ambcrg.ambH ; charge parameter file (for alternate naming conventions see also prot.table.ambcrg.pdb3H, prot.table.ambcrg.pdbH, and prot.table.ambcrg.sybH)
    parameters/ vdw.parms.amb ; VDW parameter file
    box.pdb ; box pdb file
    0.4 ; grid spacing in angstroms
    2 ; es type: GB
    1 ; es scale for ff scoring
    8.0 8.0 ; cutoff for es and outer box
    78.3 78.3 ; dielectric of solvent ,cavity
    2.3 2.8 ; bumping distances
    output_prefix ; output grid prefix name
    1 ; pairwise calculation

    NOTE: The cavity.pdb file should be an empty file. This feature is not frequently used. However, the parameter must still be passed. The pairwise calculation value must also always be 1.

    For nchemgrid_SA:

    receptor.pdb ; receptor pdb file
    parameters/prot.table.ambcrg.ambH ; charge parameter file (for alternate naming conventions see also prot.table.ambcrg.pdb3H, prot.table.ambcrg.pdbH, and prot.table.ambcrg.sybH)
    parameters/ vdw.parms.amb ; VDW parameter file
    box.pdb ; box pdb file
    0.4 ; grid spacing in angstroms
    1.4 ; probe radius for SA
    2 ; scoring type: SA
    8.0 ; cutoff for SA calculations
    output_prefix ; output grid prefix name

    DESCRIPTION:

    The nchemgrid_gb and nchemgrid_sa programs create the GB and SA receptor grids for use with the Zou GB/SA scoring function (see Zou GB/SA Scoring).

    OUTPUT FILES

    For nchemgrid_gb

    For nchemgrid_sa:

    RETURN TO TABLE OF CONTENTS

    3.4. Sphgen

    AUTHOR: Irwin D. Kuntz (modified by Renee DesJarlais and Brian Shoichet)

    USAGE: sphgen

    INPUT FILE*:
    rec.ms #molecular surface file
    R #sphere outside of surface (R) or inside surface (L)
    X #specifies subset of surface points to be used (X=all points)
    0.0 #prevents generation of large spheres with close surface contacts (default=0.0)
    4.0 #maximum sphere radius in angstroms (default=4.0)
    1.4 #minimum sphere radius in angstroms (default=radius of probe)
    rec.sph #clustered spheres file

    *WARNINGS:
    (1) The input file names and parameters are read from a file called INSPH, which should not contain any blank lines or the comments (denoted by # ) from above.
    (2) The molecular surface file must include surface normals. Sphgen expects the Fortran format

    A3, I5, X, A4, X, 2F8.3, F9.3, X, A3, 7X, 3F7.3

    DESCRIPTION:

    3.4.1. Overview

    Sphgen generates sets of overlapping spheres to describe the shape of a molecule or molecular surface. For receptors, a negative image of the surface invaginations is created; for a ligand , the program creates a positive image of the entire molecule. Spheres are constructed using the molecular surface described by Richards (1977) calculated with the program dms . Each sphere touches the molecular surface at two points and has its radius along the surface normal of one of the points. For the receptor, each sphere center is outside the surface, and lies in the direction of a surface normal vector. For a ligand, each sphere center is inside the surface, and lies in the direction of a reversed surface normal vector.

    Spheres are calculated over the entire surface, producing approximately one sphere per surface point. This very dense representation is then filtered to keep only the largest sphere associated with each receptor surface atom. The filtered set is then clustered on the basis of radial overlap between the spheres using a single linkage algorithm. This creates a negative image of the receptor surface, where each invagination is characterized by a set of overlapping spheres. These sets, or clusters, are sorted according to numbers of constituent spheres, and written out in order of descending size. The largest cluster is typically the ligand binding site of the receptor molecule. The program showsphere writes out sphere center coordinates in PDB format and may be helpful for visualizing the clusters (see showsphere).

    RETURN TO TABLE OF CONTENTS

    3.4.2. Critical Points

    The process of labeling site points for critical matching must currently be done by hand (see Critical Points for use in DOCK). The user should load the site points and the receptor coordinates into a graphic program to determine the spheres closest to the target area. Once a sphere or group of spheres has been determined to be critical, the sphere(s) should be labeled by changing the second to last column of the final sphere file to the critical cluster number (see Output).

    RETURN TO TABLE OF CONTENTS

    3.3.3. Chemical Matching

    The process of labeling site points for chemical matching must also be done by hand (see Chemical Matching for use in DOCK). The user should load the site points and the receptor coordinates into a graphic program and study the local environment of each point. Labeled site points may be input as either a SPH format or SYBYL MOL2 format coordinate file. To store labeled site points in a MOL2 file, select an atom type for each label of interest. Then edit the chem.defn file to include the selected atom types (see chem.defn). Site point definitions can be distinguished from ligand atom definitions by explicitly requiring that no bonded atoms can be attached (ie. followed by [*]). Using the convention in that example file, site points should be labeled as follows: hydrophobic, "C.3"; donor, "N.4"; acceptor, "O.2"; polar, "F".

    Example of chemical labels in SPH format

    DOCK 3.5 receptor_spheres
    color hydrophobic 1
    color acceptor 2
    color donor 3
    cluster 1 number of spheres in cluster 49
    7 2.34500 36.49000 16.93500 1.500 0 0 1
    8 -0.05200 42.29900 14.18800 1.500 0 0 1
    9 -0.67000 41.20600 11.59800 1.500 0 0 1
    17 -6.00000 34.00000 17.00000 1.500 0 0 3
    18 -5.00000 29.00000 22.00000 1.500 0 1 3
    ...

    Caveats on Chemical Matching

    It can take a significant amount of effort to chemically label a large site and to verify that the docking results are what were expected. If you use this chemical matching, plan to spend some time in preparation and validation BEFORE running an entire database of molecules.

    It must be pointed out that the ultimate arbiter of which orientations of a ligand are saved is actually the scoring function. If the scoring function is unable to discriminate what the user feels are bad chemical interactions, then any improvement with chemical matching will probably be obscured. In addition, if score optimization is used, then the orientation will be perturbed from the original chemically-matched position to a new score-preferred positions.

    RETURN TO TABLE OF CONTENTS

    3.4.4. Output

    Some informative messages are written to a file called OUTSPH. This includes the parameters and files used in the calculation. The spheres themselves are written to the clustered spheres file. They are arranged in clusters with the cluster having the largest number of spheres appearing first. The sphere cluster file consists of a header followed by a series of sphere clusters. The header is the line DOCK 3.5 receptor_spheres followed by a color table (see Chemical Matching). The color table contains color names each on a separate line. As sphgen produces no colors, the color table is simply absent.

    The sphere clusters themselves follow, each of which starts with the line

    cluster n number of spheres in cluster i

    where n is the cluster number for that sphere cluster, and i is the number of spheres in that cluster. Next, all spheres in that cluster are listed in the format below:

    FORMAT: (I5, 3F10.5, F8.3, I5, I2, I3)
    I: Integer F: Float
    012345012345678901234567890123456789012345678901234567890123456789
    63 5.58405 50.91005 59.97029 1.411 92 0 0
    64 9.00378 52.46159 62.30926 1.400 321 0 0
    66 11.43685 56.49715 61.79008 1.984 493 0 0
    I5: Column 0~4 (the first 5 columns) were used to put integer data.
    F10.5: Column 5~14 (total 10 columns, and 5 digits for mantissa) were used to put float data.

    The values in the sphere file correspond to:

    The clusters are listed in numerical order from largest cluster found to the smallest. At the end of the clusters is cluster number 0. This is not an actual sphere cluster, but a list of all of the spheres generated whose radii were larger than the minimum radius, before the filtering heuristics ( i.e., allowing only one sphere per atom and using a maximum radius cutoff) and clustering were performed. Cluster 0 may be useful as a starting point for users who want to explore a wider range of possible clusters than is provided by the standard sphgen clustering routine. The program creates three temporary files: temp1.ms, temp2.sph, and temp3.atc. These are used internally by sphgen, and are deleted upon completion of the computation. For more information on sphere generation and selection, go to the Sphere Generation and Selection demo.

    RETURN TO TABLE OF CONTENTS

    3.5. Showbox

    AUTHOR: Elaine Meng

    USAGE: showbox [< input.in]

    DESCRIPTION:

    Showbox is an interactive program for specifying the location and the size of the grids that will be calculated by the program grid. The output is in PDB format and thus can be visualized with many graphics programs. The user is asked whether the box should be automatically constructed to enclose all of the spheres in a cluster. If so, the user must also enter a value for how closely the box faces may approach a sphere center (how large a cushion of space is desired) and the sphere cluster filename and number. If not, the user is asked whether the box will be centered on manually entered coordinates or a sphere cluster center of mass. Depending on the response, the coordinates of the center or the sphere cluster filename and number are requested. Finally, the user must enter the desired box dimensions (if not automatic) and a name for the output PDB-format box file. If the input parameters are known, they can be listed in an input file and redirected into the program. For a flowchart of the input tree and an example input, see STEP 1 of the Grid Generation Tutorial.

    See also the discussion of showbox in the grid energy scoring section.

    RETURN TO TABLE OF CONTENTS

    3.6. Showsphere

    AUTHORS: Stuart Oatley , Elaine Meng , Daniel Gschwend

    USAGE: showsphere [< input.in]

    DESCRIPTION:

    Showsphere is an interactive program; it produces a PDB-format file of sphere centers and an MS-like file of sphere surfaces, given the sphere cluster file and cluster number. The surface file generation is optional. The user may specify one cluster or all, and multiple output files will be generated, with the cluster number appended to the end of the name of each file. The input cluster file is created using sphgen (see sphgen). Showsphere requests the name of the sphere cluster file, the number of the cluster of interest, and names for the output files. Information is sent to the screen while the spheres are being read in, and while the surface points are being calculated. If the input parameters are known, they can be listed in an input file and piped into the program.

    RETURN TO TABLE OF CONTENTS

    3.7. Sphere Selector

    AUTHOR: P. Therese Lang

    USAGE: sphere_selector < sphere_cluster_file.sph > <set_of_atoms.mol2> <radius>

    DESCRIPTION:

    Sphere_selector filters the output from sphgen selecting all spheres within a user-defined radius of a target molecule (see sphgen). The target molecule can be anything (e.g., known ligand or receptor residue) as long as it is in proper MOL2 format. WARNING: Please note that above order of input files must be maintained for the program to work. The total number of spheres is unlimited, eliminating the maximum of 100000 spheres in versions prior to DOCK 6.1.

    RETURN TO TABLE OF CONTENTS

    3.8. Antechamber

    Antechamber is an accessory program originally developed to prepare small molecules on the fly to use in AMBER. With permission, we have included a distribution of the code to facilitate preparing small molecules for Amber score. For more information on the use of the antechamber accessory, please visit the source web site at ambermd.org/antechamber/antechamber.html.
    From the web site:

    "Antechamber is a set of auxiliary programs for molecular mechanic (MM) studies. This software package is devoted to solve the following problems during the MM calculations: (1) recognizing the atom type; (2) recognizing bond type; (2) judging the atomic equivalence; (3) generating residue topology file; (4) finding missing force field parameters and supplying reasonable and similar substitutes. As an accessory module..., antechamber can generate input automatically for most organic molecules in a database...."

    RETURN TO TABLE OF CONTENTS

    3.9. tLEaP

    tLEaP is an accessory program that provides for basic model building and Amber coordinate and parameter/topology input file creation. For more information see the AmberTools distribution: ambermd.org/. tleap is only used during the preparation for Amber Score. tleap's charge tolerance has been increased from 0.0001 (as in Amber10) to 0.001 to avoid excessive instances of left over charges of plus or minus one.

    RETURN TO TABLE OF CONTENTS

    3.10. Amber Score Preparation Scripts

    AMBER score requires many additional input files beyond those of other DOCK scoring functions; see AMBER Score Inputs for more information. All the input files, such as the prmtop, frcmod, amber.pdb, etc. should be generated prior to running the DOCK AMBER score. Two perl scripts, prepare_amber.pl and prepare_rna_amber.pl, have been provided for this purpose. The script prepare_rna_amber.pl differs from prepare_amber.pl in that the former treats all nucleic acids as RNA whereas the latter treats them as DNA; in addition, prepare_rna_amber.pl supports the RNA specific features to neutralize to a total charge of zero and to solvate with water.

    The script prepare_amber.pl performs the following main tasks:
    (i) Reads the PDB file for the receptor; adds hydrogens and some other missing atoms if not present; adds AMBER force field atom types and charges. Generates Amber parameter/topology (prmtop) and coordinate (inpcrd) files. This is performed via the script amberize_receptor which invokes program tleap.
    (ii) Generates a modified mol2 file with suffix amber_score.mol2 (e.g., lig.amber_score.mol2) that appends to each TRIPOS MOLECULE a TRIPOS AMBER_SCORE_ID section containing a unique label (e.g., lig.1, mydnagil.1, etc.). This modified mol2 file should be assigned to the ligand_atom_file input parameter in the DOCK input file.
    (iii) Splits the ligand_mol2_file into separate mol2 files, one file per ligand, where the file prefixes are the AMBER_SCORE_ID unique labels. These mol2 files are used in the next step.
    (iv) Runs the antechamber program to determine the semi-empirical charges, such as AM1-BCC, for each ligand. Runs the tleap program to create parameter/topology and coordinate files for each ligand using the GAFF force field (prmtop, frcmod, and inpcrd), and writes a mol2 file for the ligands with GAFF atom types (gaff.mol2). This is performed via the script amberize_ligand.
    (v) Combines each ligand with the receptor to generate the parameter/topology and coordinate files for each complex. This is performed via the script amberize_complex which invokes program tleap.

    The script prepare_amber.pl also creates files with extensions .log and .out that contain the detailed results of the above steps. The amberize_*.out files can be browsed to verify that the preparation was successful. If prepare_amber.pl emits any warnings or errors then all these files should be scrutinized. Some errors are fatal, such as an inability to prepare the receptor; generally, and in this case especially start investigating by examining file amberize_receptor.out. Since preparation for Amber score is more stringent than preparation for other DOCK scores, usually a small fraction of a large set of ligands fails during prepare_amber.pl execution. This Dock-fans post discusses in detail the recovery options for failing ligands.

    See the tutorials for a complete example usage. Two options not discussed in the tutorial are designed for rescoring large ligand databases and should be applied only after experience with small data sets. These options are to use the existing ligand charges and to ignore non-receptor related errors. In addition, for DOCKing ligands to RNA the script prepare_rna_amber.pl supports neutralizing to a total charge of zero and solvating with water. Here are the man pages for prepare_amber.pl and prepare_rna_amber.pl.

    Under the best conditions these prepare scripts will work as designed, but if not then here is a starting point for looking under the hood.

    RETURN TO TABLE OF CONTENTS


    Molecular File Formats

    RETURN TO TABLE OF CONTENTS

    4.1. Tripos MOL2 Format

    This format is used for general molecule input and output of DOCK. Although previous versions of DOCK supported an extended PDB format to store molecule information, the current version now uses MOL2 as the molecule format. This format has the advantage of storing all the necessary information for atom features, position, and connectivity. It is also a standardized format that other modeling programs can read. For more information on how to generate MOL2 files from PDB files, go to the Structure Preparation demo.

    Of the many record types in a MOL2 file, DOCK recognizes the following: MOLECULE, ATOM, BOND, SUBSTRUCTURE and SET. In the MOLECULE record, DOCK utilizes information about the molecule name and number of atoms, bonds, substructures and sets. In the ATOM record DOCK utilizes information about the atom names, types, coordinates, and partial charges. In the BOND record, DOCK utilizes the atom identifiers for the bond. In the SUBSTRUCTURE record, DOCK records the fields, but does not utilize them. The SET records are entirely optional. They are used only in special circumstances, such as for ligand flexibility. In DOCK 6, additional record types have been introduced. One is the SOLVATION record that has the atomic desolvation information for the ligand atoms. The parameter read_mol_solvation can be used to read in this record. Another is the AMBER_SCORE_ID record that is used by AMBER score. This record should be appended to each TRIPOS MOLECULE and should contain a unique label for that ligand. This type of modified mol2 file is required by AMBER score and its name should be assigned to the ligand_atom_file input parameter in the DOCK input file.

    For extensive details on the MOL2 format, as well as example files, please refer to the Tripos web site documentation (http://www.tripos.com/data/support/mol2.pdf).

    RETURN TO TABLE OF CONTENTS

    4.2. PDB Format

    This format is used for several of the DOCK accessories and the AMBER score function. For extensive details on the PDB format, as well as example files, please refer to the Protein Databank File Format Documentation (http://www.pdb.org/pdb/static.do?p=file_formats/pdb/index.html).

    RETURN TO TABLE OF CONTENTS


    References

    DesJarlais, R.L. and Dixon, J.S. A Shape-and chemistry-based docking method and its use in the design of HIV-1 protease inhibitors. J. Comput-Aided Molec. Design. 8: 231-242, 1994.

    Ewing, T. J. A.; Makino, S.; Skillman, A. G.; Kuntz, I. D. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J. Comput-Aided Molec. Design. 15: 411-428 (2001).

    Ewing, T.J.A. and Kuntz, I.D., Critical evaluation of search algorithms for automated molecular docking and database screening. J. Comput. Chem. 18: 1175-1189, 1997.

    Ferro, D.R. and Hermans, J. Different best rigid-body molecular fit routine. Acta. Cryst. A. 33: 345-347,1977.

    Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Pairwise Solute Descreening of Solute Charges from a Dielectric Medium. Chem. Phys. Lett. 246: 122-129, 1995

    Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. Phys. Chem. 100: 19824-19839, 1996

    Irwin, J.J. and Shoichet, B.K. ZINC - A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45: 177-182, 2005.

    Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res.33: 889-897, 2000

    Kuhl, F.S., Crippen, G.M., and Friesen, D.K. A Combinatorial Algorithm for Calculating Ligand
    Binding. J. Comput. Chem. 5:24-34, 1984.

    Kuhn, H. W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2:83-97, 1955.

    Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161: 269-288, 1982.

    Lang, P.T., Brozell, S.R., Mukherjee, S., Pettersen, E.T., Meng, E.C., Thomas, V., Rizzo, R.C., Case, D.A., James, T.L., Kuntz, I.D. DOCK 6: Combining Techniques to Model RNA-Small Molecule Complexes. RNA 15:1219-1230, 2009.

    Liu, H.-Y., Kuntz, I. D., and Zou, X. Pairwise GB/SA Scoring Function for Structure-based Drug Design. J. Phys. Chem. B. 108: 5453-5462, 2004.

    Luty, B. A.; Wasserman P.F.; Hodge C.N.; Zacharias M.; McCammon J.A. A Molecular Mechanics / Grid Method for Evaluation of Ligand-Receptor Interactions. J. Comput. Chem. 16:454-464 (1995)

    Meng, E.C., Gschwend, D.A., Blaney, J.M. and Kuntz, I.D. Orientational sampling and rigid-body minimization in molecular docking. Proteins. 17(3): 266-278, 1993.

    Meng, E.C., Shoichet, B.K. and Kuntz, I.D. Automated docking with grid-based energy evaluation. J. Comp. Chem. 13: 505-524, 1992.

    Miller, M.D., Kearsley, S.K., Underwood, D.J. and Sheridan, R.P. FLOG -A system to select quasi-flexible ligands complementary to a receptor of known three-dimensional structure. J. Comput. Aided Mol. Design. 8: 153-174, 1994.

    Moustakas, D.T., Lang, P.T., Pegg, S., Pettersen, E.T., Kuntz, I.D., Broojimans, N., Rizzo, R.C. Development and Validation of a Modular, Extensible Docking Program: DOCK 5. J. Comput. Aided Mol. Design. 20:601-609, 2006.

    Munkres, J. Algorithms for the assignment and transportation problems. J. Soc. Indust. Appl. Math. 5:32-38, 1957.

    Nelder, J.A. and Mead, R., A Simplex-Method for Function Minimization. Computer Journal, 7: 308-313, 1964.

    Balius, T.E., Mukherjee, S., and Rizzo, R.C., Implementation and Evaluation of a Docking-Rescoring Method using Molecular Footprint Comparisons. J. Comput. Chem., 32(10): 2273-2289, 2011.

    Mukherjee, S., Balius, T.E., and Rizzo, R.C., Docking Validation Resources: Protein Family and Ligand Flexibility Experiments. J. Chem. Inf. Model., 50(11): 1986-2000, 2010.

    Onufriev, A., Bashford, D., and Case, D.A. Exploring protein native states and large-scale conformational changes with a modified generalized Born model. Proteins. 55:383-394, 2004.

    Pearlman, D.A., Case, D.A.,Caldwell, J.W., Ross, W.S., Cheatham, III, T.E., DeBolt, S., Ferguson, D., Seibel, G. and Kollman, P.A. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comp. Phys. Commun. 91:1-41, 1995.

    Rizzo, R. C.; Aynechi, T.; Case, D. A.; Kuntz, I. D. Estimation of Absolute Free Energies of Hydration Using Continuum Methods: Accuracy of Partial Charge Models and Optimization of Nonpolar Contributions. J. Chem. Theory Comput. 2: 128-139, 2006.

    Alan P. Graves, Devleena M. Shivakumar, Sarah E. Boyce, Matthew P. Jacobson, David A. Case and Brian K. Shoichet. Rescoring Docking Hit Lists for Model Cavity Sites: Predictions and Experimental Testing. J. Mol. Biol. 377: 914-934, 2008.

    Shoichet, B.K., Bodian, D.L. and Kuntz, I.D. Molecular docking using shape descriptors. J. Comp. Chem. 13(3): 380-397, 1992.

    Shoichet, B.K. and Kuntz, I.D. Protein docking and complementarity. J. Mol. Biol. 221: 327-346, 1991.

    Srinivasan, J.; Cheatham, T. E.; Cieplak, P.; Kollman, P. A.; Case, D. A., Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate - DNA helices. J. Am. Chem. Soc. 120:9401-9409, 1998.

    Trott, O and Olson, A. J. AutoDock Vina: Improving the speed and accuracy of Docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31:455-461, 2010.

    Tsui, V. and Case,D.A. Theory and applications of the generalized solvation model in macromolecular simulations. Biopolymers. 56:275-291, 2001.

    Verkhiver, G.M.; Rejto, P.A.; Bouzida, D.; Arthur, S.; Colson, A.B.; et. al. Towards Understanding the Mechanisms of Molecular Recognition by Computer Simulation of Ligand-Protein Interactions. J. Mol. Recog. 12:371-389 (1999) 

    Wang, J., Wolf, R.M., Caldwell, J.W., Kollman, P.A. and Case, D.A. Development and testing of a general Amber force field. J. Comput. Chem. 25:1157-1174, 2004.

    Weiser, J., Shenkin, P.S., and Still, W.C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 20:217-230, 1999.

    Zou, X. Q., Sun, Y. X., and Kuntz, I. D. Inclusion of solvation in ligand binding free energy calculations using the generalized-born model. J. Am. Chem. Soc. 121 (35): 8033-8043, 1999.

    RETURN TO TABLE OF CONTENTS


    Acknowledgments

    We gratefully acknowledge the use of computational facilities at the Ohio Supercomputer Center. We thank OpenEye Scientific Software for an academic license.

    RETURN TO TABLE OF CONTENTS