Generating Spheres

Author: P. Therese Lang
Last updated October 7, 2023 by Scott Brozell

This tutorial describes the three steps required to define receptor active sites for DOCK calculations. We study the complex of L-Arabinose-Binding Protein bound to L-Arabinose (PDB ID 1ABE) as an example system. However, these techniques should be transferable to any protein-ligand system.

To start this tutorial, obtain the rec_noH.pdb and lig_charged.mol2 files from the "Structure Preparation Tutorial." You will also need UCSF Chimera's Write DMS tool (Chimera 1.3 and later) and the sphgen, sphere_selector, and showsphere programs that are distributed and installed with DOCK.

STEP 1: Generate the molecular surface of the receptor.

The molecular surface of the target is generated by rolling a ball the size of a water molecule over the van der Waals surface of the target, following the algorithm developed by Richards (Ann. Rev. Biophys. Bioeng. 1977. 6:151-176) and adapted by Connolly (M. Connolly, Ph.D. Thesis, University of California, Berkeley, 1981). The surface normal vector at each surface point is also computed; it is used later to calculate the size of each sphere.

NOTE: For Windows users, dms and sphgen cannot read files created under the Disk Operating System (DOS), that is, files with DOS-style (CRLF) line endings. If you encounter unexpected problems, this may be the cause. See the DOCK 6 FAQ DOS entry for more details.
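If a file may carry DOS-style line endings, one way to normalize it before passing it to dms or sphgen is the Python sketch below. It assumes the carriage-return characters are the only problem; the function names to_lf and fix_line_endings are hypothetical and not part of DOCK.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Convert CRLF (DOS) and lone CR line endings to LF (Unix)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_line_endings(path: str) -> bool:
    """Rewrite `path` in place with LF endings; return True if it changed.

    Assumption: only the line endings need fixing, not the content.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = to_lf(data)
    if fixed == data:
        return False
    p.write_bytes(fixed)  # bytes mode, so no platform newline translation
    return True
```

For example, fix_line_endings("INSPH") returns True the first time it converts a DOS-style file and False on any later call.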

Primary OPTION: Use the Write DMS Tool in Chimera

Open the rec_noH.pdb file in Chimera, e.g., File -> Open -> rec_noH.pdb. (See the Structure Preparation Tutorial for help with basic Chimera operations.) Generate the surface: Actions -> Surface -> Show. Save the surface: Tools -> Structure Editing -> Write DMS. Note that the result does not exactly reproduce the dms output because of the differences in vertex density and atomic radii described in the Write DMS documentation. We have no recommendations regarding the best settings. (The rest of this tutorial and the other tutorials in this series use the dms-generated rec.ms as a matter of convenience for the DOCK developers.) For more information on the contents and format of the output file, see the Chimera DMS file format documentation.

Secondary OPTION: Use the dms program

The program dms, while deprecated, can still be downloaded from www.cgl.ucsf.edu/Overview/software.html#dms. The command line options for the dms program are

USAGE: dms input_file [-a -d density -g file -i file -n -w radius -v] -o file

-a use all atoms, not just amino acids
-d change density of points
-g send messages to file
-i calculate only surface for specified atoms
-n calculate normals for surface points
-w change probe radius
-v verbose
-o specify output file name (required)

To generate the surface, use the command "dms rec_noH.pdb -n -w 1.4 -v -o rec.ms". For more information on the contents and format of the output file, see the documentation included in the dms distribution. For help with dms installation, read the DOCK 6 FAQ dms entry.
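When dms is run from a script, the documented flags can be assembled programmatically. The sketch below is a minimal Python wrapper that reproduces the command above; the function names dms_command and run_dms are hypothetical, and it assumes the dms executable is on PATH. It uses only the flags listed in the usage line.

```python
import subprocess


def dms_command(pdb_in: str, ms_out: str, probe_radius: float = 1.4,
                normals: bool = True, verbose: bool = True,
                all_atoms: bool = False) -> list:
    """Build a dms argument list following the usage line above."""
    cmd = ["dms", pdb_in]
    if all_atoms:
        cmd.append("-a")                 # use all atoms, not just amino acids
    if normals:
        cmd.append("-n")                 # sphgen needs the surface normals
    cmd += ["-w", str(probe_radius)]     # probe radius in Angstroms
    if verbose:
        cmd.append("-v")
    cmd += ["-o", ms_out]                # the output file name is required
    return cmd


def run_dms(pdb_in: str = "rec_noH.pdb", ms_out: str = "rec.ms") -> None:
    # Assumption: dms is installed and on PATH; check=True raises
    # CalledProcessError if dms exits with a non-zero status.
    subprocess.run(dms_command(pdb_in, ms_out), check=True)
```

Keeping -n on by default matters here, because the normals are what sphgen later uses to place the sphere centers.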

A graphical representation of the molecular surface, shown in green, is below:

Image generated using Chimera (https://www.cgl.ucsf.edu/chimera)

STEP 2: Generate the spheres surrounding the receptor.

Sets of overlapping spheres are used to create a negative image of the surface invaginations of the target. The program sphgen, distributed as an accessory with DOCK (Kuntz et al. J. Mol. Biol. 1982. 161: 269-288), generates spheres from the molecular surface and the normal vectors.

(a) Each sphere is generated tangent to surface points i, j with the center on the surface normal of point i. (b) Schematic representation of a small binding site formed by five atoms (purple). The spheres (blue) are generated using points from the molecular surface (green) with their centers lying along the surface normals (thin line).
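The construction in panel (a) can be worked out directly. With a unit normal n_i at point p_i, the center is c = p_i + r n_i; requiring the sphere to pass through p_j, |c - p_j| = r, and writing d = p_j - p_i gives r = |d|^2 / (2 n_i . d), which is positive only when p_j lies on the normal's side. Taking the smallest such r over all j guarantees that no other surface point falls inside the sphere. The Python sketch below illustrates this geometry under those assumptions; the function names are hypothetical, and sphgen's handling of the minimum/maximum radius and close-contact parameters is not reproduced.

```python
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def tangent_radius(p_i: Vec, n_i: Vec, p_j: Vec) -> Optional[float]:
    """Radius of the sphere touching p_i and p_j with center p_i + r*n_i.

    Solves |p_i + r*n_i - p_j| = r, i.e. r = |d|^2 / (2 n_i . d) with
    d = p_j - p_i. Assumes n_i is a unit vector. Returns None when p_j
    is not on the side the normal points to.
    """
    d = _sub(p_j, p_i)
    nd = _dot(n_i, d)
    if nd <= 0.0:
        return None
    return _dot(d, d) / (2.0 * nd)


def sphere_at_point(i: int, points: Sequence[Vec], normal: Vec):
    """Smallest tangent sphere at point i: (center, radius, j) or None."""
    best = None
    for j, p_j in enumerate(points):
        if j == i:
            continue
        r = tangent_radius(points[i], normal, p_j)
        if r is not None and (best is None or r < best[1]):
            best = (j, r)
    if best is None:
        return None
    j, r = best
    p = points[i]
    center = (p[0] + r * normal[0], p[1] + r * normal[1], p[2] + r * normal[2])
    return center, r, j
```

For instance, with p_i at the origin, n_i = (0, 0, 1) and p_j = (2, 0, 2), the radius is 8 / 4 = 2 Angstroms and the center sits at (0, 0, 2).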

Spheres are calculated over the entire surface, producing approximately one sphere per surface point. This dense representation is then filtered to keep only the largest sphere associated with each surface atom. The filtered set is then clustered using a single linkage algorithm. Each resulting cluster represents an invagination in the target. The sphgen input file must be named INSPH and contains the following information:

rec.ms      #molecular surface file (no default)
R           #sphere outside of surface (R) or inside surface (L) (no default)
X           #specifies subset of surface points to be used (X=all points) (no default)
0.0         #prevents generation of large spheres with close surface contacts (default=0.0)
4.0         #maximum sphere radius in Angstroms (default=5.0)
1.4         #minimum sphere radius in Angstroms (no default)
rec.sph     #clustered spheres file (no default)

Note that the comments above, marked by #, are for the tutorial only and must not appear in the INSPH file.
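Because INSPH must contain only the values, it can be convenient to generate it from a script. The Python sketch below writes the tutorial's values one per line with no comments; the names INSPH_VALUES, insph_text and write_insph are hypothetical. It writes bytes so that, on Windows, Python does not insert the DOS line endings that sphgen cannot read.

```python
from pathlib import Path

# Values from the tutorial's INSPH example, in the order listed above.
INSPH_VALUES = [
    "rec.ms",   # molecular surface file
    "R",        # spheres outside (R) or inside (L) the surface
    "X",        # use all surface points
    "0.0",      # limit on large spheres with close surface contacts
    "4.0",      # maximum sphere radius (Angstroms)
    "1.4",      # minimum sphere radius (Angstroms)
    "rec.sph",  # clustered spheres output file
]


def insph_text(values=INSPH_VALUES) -> str:
    # One value per line and no "#" comments, which sphgen must not see.
    return "\n".join(values) + "\n"


def write_insph(directory: str = ".") -> Path:
    path = Path(directory) / "INSPH"
    # Write bytes so no platform-specific CRLF line endings are added.
    path.write_bytes(insph_text().encode("ascii"))
    return path
```

Changing a parameter, such as the maximum radius, then only means editing one entry of the list.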

To generate the spheres, run the command "sphgen" in the directory that contains the INSPH and rec.ms files. The output consists of two files: rec.sph, which contains the spheres in clusters, and OUTSPH, which contains general information about the calculation.
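Since sphgen must not find outputs from an earlier run, a scripted run can delete them first. The Python sketch below removes OUTSPH and rec.sph and then runs sphgen with no arguments, as in the tutorial. The function names are hypothetical, and it assumes sphgen is on PATH and that INSPH names rec.sph as its output file.

```python
import subprocess
from pathlib import Path

# Output names used in this tutorial; rec.sph is the name set in INSPH.
OLD_OUTPUTS = ("OUTSPH", "rec.sph")


def remove_old_outputs(directory: str = ".") -> list:
    """Delete outputs of a previous sphgen run; return the names removed."""
    removed = []
    for name in OLD_OUTPUTS:
        path = Path(directory) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed


def run_sphgen(directory: str = ".") -> None:
    """Run sphgen in `directory`, which must hold INSPH and rec.ms."""
    remove_old_outputs(directory)
    # Invoked with no arguments, as in the tutorial; check=True raises
    # CalledProcessError only if sphgen exits with a non-zero status.
    subprocess.run(["sphgen"], cwd=directory, check=True)
```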

If sphgen has been run before, remove its output files (OUTSPH and rec.sph) before running it again. Finally, for technical reasons, sphgen cannot handle more than 99999 spheres. For a large target, we recommend selecting a subsection of the protein with a visualization program and using that subsection to generate the molecular surface and the spheres.
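The filtering and clustering that sphgen applies (keep the largest sphere per surface atom, then single-linkage clustering) can be illustrated with the Python sketch below. It is not sphgen's code: it assumes two spheres are linked when they overlap, meaning the distance between centers is less than the sum of their radii, and the Sphere record and function names are hypothetical. Clusters are returned largest first, matching the size ranking of rec.sph.

```python
from typing import Dict, List, NamedTuple, Tuple


class Sphere(NamedTuple):
    atom_i: int                          # surface atom the sphere belongs to
    center: Tuple[float, float, float]
    radius: float


def largest_per_atom(spheres: List[Sphere]) -> List[Sphere]:
    """Keep only the largest sphere associated with each surface atom."""
    best: Dict[int, Sphere] = {}
    for s in spheres:
        if s.atom_i not in best or s.radius > best[s.atom_i].radius:
            best[s.atom_i] = s
    return list(best.values())


def overlap(a: Sphere, b: Sphere) -> bool:
    # Assumed linkage criterion: the two spheres intersect.
    dist2 = sum((x - y) ** 2 for x, y in zip(a.center, b.center))
    return dist2 < (a.radius + b.radius) ** 2


def single_linkage(spheres: List[Sphere]) -> List[List[Sphere]]:
    """Group spheres joined by chains of overlaps; largest cluster first."""
    parent = list(range(len(spheres)))

    def find(k: int) -> int:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a in range(len(spheres)):
        for b in range(a + 1, len(spheres)):
            if overlap(spheres[a], spheres[b]):
                parent[find(a)] = find(b)

    groups: Dict[int, List[Sphere]] = {}
    for k, s in enumerate(spheres):
        groups.setdefault(find(k), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)
```

The all-pairs loop is O(n^2), which is fine for illustration; a real implementation on tens of thousands of spheres would use a spatial grid or tree to find neighbors.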

STEP 3: Select a subset of spheres to represent the binding site(s).

A set of spheres is a required input to the grid generation program. Three options for selecting the spheres that will represent the binding site(s) are described below. Note that a sphere file, xxx.sph, is plain text and can be edited.

OPTION 1: Use the largest cluster generated by sphgen

The clusters contained in the rec.sph file are ranked according to size (number of spheres in the cluster). The largest cluster is typically the ligand binding site of the receptor. To visualize the spheres, use the program showsphere, distributed as an accessory with DOCK. The showsphere input file can have any name and contains the following information:

rec.sph                 #sphere cluster file
1                       #cluster number to process (<0 = all)
N                       #generate surface as well as pdb file
selected_cluster.pdb    #name for output file

Note that the comments above, marked by #, are for the tutorial only and must not appear in the showsphere input file.
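As an alternative to saving the input file and redirecting it, the same answers can be piped to showsphere from a script. The Python sketch below assumes showsphere reads its answers from standard input, as the "<" redirection used in this tutorial implies; the function names are hypothetical, and only the "N" (no surface) answer shown in the example is supported.

```python
import subprocess


def showsphere_input(sph_file: str = "rec.sph", cluster: int = 1,
                     pdb_out: str = "selected_cluster.pdb") -> str:
    """Answers to showsphere, one per line, in the order shown above."""
    # "N": write only the PDB file, not a surface, matching the example.
    answers = [sph_file, str(cluster), "N", pdb_out]
    return "\n".join(answers) + "\n"


def run_showsphere(**kwargs) -> None:
    # Assumption: showsphere is on PATH and reads these answers from stdin.
    subprocess.run(["showsphere"], input=showsphere_input(**kwargs),
                   text=True, check=True)
```

For example, run_showsphere(sph_file="selected_spheres.sph", pdb_out="selected_spheres.pdb") would cover the visualization step used with sphere_selector output.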

To convert the first and largest cluster in the sphere file to PDB format, save the input above as, for example, sphgen_cluster.in and use the command "showsphere < sphgen_cluster.in". Below is a picture of the largest cluster (selected_cluster.pdb), with the protein shown in purple and the spheres in yellow. Note that Chimera may initially display each SPH residue as a group of small dots; Actions -> Atoms/Bonds -> sphere turns the dots into full-size spheres.

Image generated using Chimera (https://www.cgl.ucsf.edu/chimera)

To use the first and largest cluster in the sphere file as input for the grid generation program, simply edit the sphere cluster file, rec.sph, to delete the other clusters.

OPTION 2: Select spheres within some radius of a desired location

If the active site is known, one can select the spheres within a given radius of a set of atoms that describes the site. To do this, use the program sphere_selector, which is distributed as an accessory with DOCK. The syntax for sphere_selector is

Usage: sphere_selector sphgen_sphere_cluster_file.sph set_of_atoms_file.mol2 radius

Here, we select all spheres within 10.0 Angstroms of any atom of the crystal-structure ligand, using the command "sphere_selector rec.sph lig_charged.mol2 10.0". The output file is always named selected_spheres.sph.
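The selection can be pictured with the Python sketch below. It is our reading of the criterion, not sphere_selector's source: it assumes a sphere is kept when its center lies within the radius of at least one ligand atom. It reads atom coordinates from the @<TRIPOS>ATOM section of a mol2 file, where columns three to five hold x, y and z; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def mol2_atom_coords(mol2_text: str) -> List[Point]:
    """Read x, y, z from the @<TRIPOS>ATOM section of a mol2 file."""
    coords: List[Point] = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        words = line.split()
        if in_atoms and len(words) >= 5:
            # Columns: atom_id atom_name x y z atom_type ...
            coords.append((float(words[2]), float(words[3]), float(words[4])))
    return coords


def select_spheres(centers: List[Point], atoms: List[Point],
                   radius: float) -> List[int]:
    """Indices of sphere centers within `radius` of at least one atom.

    Assumption: distance is measured from the sphere center.
    """
    r2 = radius * radius
    selected = []
    for k, c in enumerate(centers):
        if any(sum((a - b) ** 2 for a, b in zip(c, atom)) <= r2
               for atom in atoms):
            selected.append(k)
    return selected
```

Comparing squared distances avoids a square root per pair; the "any atom" test is a union of spheres around the ligand, not an RMSD.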

These spheres can be visualized with the showsphere program, as in OPTION 1, using the command "showsphere < selected_spheres.in", where the input file names selected_spheres.sph as the sphere cluster file. Below is a picture of the selected spheres (blue) (selected_spheres.pdb):

Image generated using Chimera (https://www.cgl.ucsf.edu/chimera)

OPTION 3: Add spheres manually

This option can be used if sphgen places few or no spheres in a region of interest. To do this, start with a template sphere file and edit it to add spheres or modify their parameters according to the format below:

FORMAT: Fortran (I5, 3F10.5, F8.3, I5, I2, I3), with fields corresponding to

  • Number of the first atom associated with the surface point, i.e., the atom whose surface normal defines the sphere center (point i in the figure above)
  • X, Y, and Z coordinates of the new sphere center
  • The sphere radius
  • Number of the second atom associated with the surface point (point j in the figure above)
  • Critical cluster to which this sphere belongs (used only for critical points filter)
  • Sphere color (used only for chemical matching)
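When adding spheres by hand, the fixed-width layout is easy to get wrong. The Python sketch below formats one record with the field widths of (I5, 3F10.5, F8.3, I5, I2, I3), for a total of 53 characters. The function name is hypothetical, and the defaults of 0 for the second atom, critical cluster and color are our assumption for a sphere that uses none of those fields.

```python
def format_sphere_line(atom_i: int, x: float, y: float, z: float,
                       radius: float, atom_j: int = 0,
                       critical_cluster: int = 0, color: int = 0) -> str:
    """Format one sphere record as Fortran (I5, 3F10.5, F8.3, I5, I2, I3).

    Assumption: 0 is acceptable for unused atom_j, cluster and color.
    """
    line = (f"{atom_i:5d}{x:10.5f}{y:10.5f}{z:10.5f}{radius:8.3f}"
            f"{atom_j:5d}{critical_cluster:2d}{color:3d}")
    # Fortran prints asterisks when a value is too wide for its field;
    # Python widens the field instead, so reject any oversized value.
    if len(line) != 5 + 3 * 10 + 8 + 5 + 2 + 3:
        raise ValueError("a value does not fit its fixed-width field")
    return line
```

For example, format_sphere_line(146, 32.6467, 78.0691, 12.572, 1.4, 147) produces "  146  32.64670  78.06910  12.57200   1.400  147 0  0", which can be pasted under the appropriate cluster header of a template sphere file.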