Introduction


Protein complexes are involved in many important processes in a living cell. To understand the mechanisms of these processes, it is necessary to determine the 3D structures of the protein complexes involved. Experimental techniques such as X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy have been used to solve many such structures, as shown by the large number of protein complex entries in the Protein Data Bank (PDB). When the structure of a protein complex has not been solved experimentally, computational tools can be used to construct models of it. A protein docking program takes two or more component protein structures as input and assembles them into 3D structure models of the protein complex. The input proteins can be either experimentally solved structures or computational models built with protein structure prediction programs.

This server provides access to LZerD for pairwise protein docking and Multi-LZerD for docking 3 or more proteins simultaneously. LZerD takes two protein structures as input, while Multi-LZerD takes 3 to 6 protein structures. Both methods output docked models of the input proteins. By combining a soft protein surface representation based on 3D Zernike descriptors (which are derived from a mathematical moment expansion of the shape function) with geometric hashing, LZerD and Multi-LZerD can quickly search the space of binding poses while tolerating some subunit flexibility, including side-chain flexibility.
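The central property here is rotation invariance: a descriptor that does not change when a surface region is rotated lets two regions be compared without first aligning them. Computing real 3D Zernike descriptors is too involved for a short example, so the sketch below swaps in a much simpler moment-based stand-in, the radial moments of a patch's points about their centroid, purely to illustrate that property. All function names are hypothetical and are not part of LZerD.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def radial_moments(points: Sequence[Point], max_order: int = 4) -> List[float]:
    """Return mean |p - c|^k for k = 1..max_order, where c is the centroid.

    Illustrative stand-in only; this is NOT a 3D Zernike descriptor.
    Distances to the centroid are unchanged by rotation and translation,
    so these moments are rotation- and translation-invariant.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    radii = [math.dist(p, (cx, cy, cz)) for p in points]
    return [sum(r ** k for r in radii) / n for k in range(1, max_order + 1)]


def rotate_z(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def rotate_x(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate points about the x axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


def descriptor_distance(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


# A toy "surface patch" (hypothetical coordinates) and a rotated copy of it.
patch = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (2.0, -0.3, 0.8), (0.4, 1.7, -0.6)]
rotated = rotate_x(rotate_z(patch, 0.7), 1.2)
print(descriptor_distance(radial_moments(patch), radial_moments(rotated)))  # close to 0.0
```

Because each moment depends only on distances to the centroid, rotating or translating the patch leaves the descriptor unchanged, whereas a differently shaped patch yields a different vector.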

To perform docking with LZerD or Multi-LZerD, submit the 3D structures of the subunits to be assembled. If 3D structures are not available, submit the amino acid sequences of the subunits on the upload page for structure prediction. The AttentiveDist program will then build 3D structure models of the subunits, which can be passed on to the docking pipeline.

Algorithm

LZerD pairwise docking

Pairwise docking by LZerD (Local 3D Zernike descriptor-based protein Docking) proceeds in the following three steps:
  1. LZerD takes two structures provided by the user (called the receptor and the ligand) as input and generates tens of thousands of docking conformations, sampling all possible interaction interface regions and interaction angles. In LZerD, each protein structure is represented by its molecular surface, which is segmented into overlapping local surface regions. Each local surface region is represented by a mathematical moment-based shape descriptor called the 3D Zernike descriptor (3DZD). Because the 3DZD is rotation-invariant, shape complementarity can be computed quickly; the 3DZD also provides a “soft” representation of the surface and is therefore robust, to a certain degree, to conformational changes induced upon docking. The conformational search is performed with the geometric hashing algorithm. A docking conformation is rejected if it has too many atom clashes, too small an interaction area, or low shape complementarity at the interface. If the user provides constraints on residue-residue distances or interface residues, models that do not satisfy the constraints are also rejected.
  2. The generated docking models are clustered with a user-defined cutoff (the default is a root-mean-square deviation, RMSD, of 4 Å). Typically, this step reduces the number of docking models to between a few thousand and a few tens of thousands, depending on the proteins and the cutoff.
  3. The remaining models are ranked by the sum of their ranks (ranksum) under three scoring functions: DFIRE, GOAP, and ITScore. These scoring functions essentially check whether the atom interactions in a model have distance and angle features similar to those observed in experimentally determined protein structures. If a model is ranked first by all three scoring functions, its ranksum is 1+1+1 = 3, the best possible value. Ranksum was shown to perform very well for docking model ranking in the CAPRI protein docking assessments. On the docking results page, models are initially ordered by ranksum. Refinement is not currently applied to the models, so the structures of the individual receptor and ligand are identical to those the user submitted.
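The ranksum in step 3 can be sketched as follows. This assumes, for illustration, that lower values are better for all three scoring functions (they are energy-like scores) and that ties are broken by input order, which may differ from the server's handling. The model names and score values are invented.

```python
from typing import Dict, List


def ranks_lower_is_better(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank models 1..N under one scoring function, lowest score first.

    Ties keep the models' insertion order (Python's sort is stable);
    this tie rule is an assumption.
    """
    ordered = sorted(scores, key=lambda model: scores[model])
    return {model: rank for rank, model in enumerate(ordered, start=1)}


def ranksum(score_tables: List[Dict[str, float]]) -> Dict[str, int]:
    """Sum each model's rank across several scoring functions."""
    total: Dict[str, int] = {}
    for table in score_tables:
        for model, rank in ranks_lower_is_better(table).items():
            total[model] = total.get(model, 0) + rank
    return total


# Hypothetical scores for three docking models (not real server output).
dfire = {"model_1": -410.0, "model_2": -385.5, "model_3": -402.1}
goap = {"model_1": -9120.0, "model_2": -9300.4, "model_3": -8990.7}
itscore = {"model_1": -150.2, "model_2": -140.8, "model_3": -149.9}

totals = ranksum([dfire, goap, itscore])
for model in sorted(totals, key=totals.get):  # best (smallest) ranksum first
    print(model, totals[model])
```

Summing ranks rather than raw scores sidesteps the fact that the three functions report values on very different scales.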

Multi-LZerD multiple-chain docking

Multi-LZerD takes 3 or more protein structures as input and assembles all of them into complex structures.
  1. First, LZerD is used to generate pairwise docking models for every pair of input structures. For example, if three chains, A, B, and C, are input, pairwise models are generated for A-B, A-C, and B-C. These models are then clustered with a user-configurable RMSD cutoff (default 10 Å).
  2. Next, Multi-LZerD uses a genetic algorithm to combine the pairwise models into complete models containing all chains. In the genetic algorithm, different combinations of pairwise models are iteratively generated and selected. Model selection during this process uses a molecular mechanics force field specially trained for docking model selection. Finally, models are generated according to the user-configurable population size (default 200) and clustered with the same user-configurable cutoff as before.
  3. The resulting models are ranked by ranksum and presented in the result page. Refinement is not currently applied to the models.
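Step 1 docks every unordered pair of input chains, and both LZerD and Multi-LZerD then cluster models by RMSD. The sketch below shows the pair enumeration and one common clustering scheme, greedy leader clustering, in which models are visited in score order and each model joins the first cluster center within the cutoff. The clustering scheme, the `rmsd` helper (which assumes matched atom order and a shared frame, for example a fixed receptor, so no superposition is done), and the toy coordinates are assumptions for illustration; the servers' actual clustering procedure may differ.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between matched atom lists, assumed to be in the same frame."""
    if len(a) != len(b):
        raise ValueError("models must have the same number of atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def greedy_cluster(models: Dict[str, Coords], order: List[str],
                   cutoff: float) -> Dict[str, List[str]]:
    """Leader clustering: visit models in `order` (e.g. best score first).

    A model joins the first existing cluster whose center lies within
    `cutoff` Angstrom RMSD; otherwise it becomes a new cluster center.
    """
    clusters: Dict[str, List[str]] = {}
    for name in order:
        for center in clusters:
            if rmsd(models[name], models[center]) <= cutoff:
                clusters[center].append(name)
                break
        else:  # no center was close enough
            clusters[name] = [name]
    return clusters


# Every unordered pair of chains is docked: A-B, A-C, B-C for 3 chains.
print(list(combinations("ABC", 2)))

# Toy ligand coordinates for three poses (hypothetical data).
poses = {
    "pose_1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "pose_2": [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],    # RMSD 0.5 Angstrom from pose_1
    "pose_3": [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)],  # far from both
}
print(greedy_cluster(poses, ["pose_1", "pose_2", "pose_3"], cutoff=4.0))
```

For n input chains there are n(n-1)/2 pairs, so the 6-chain webserver maximum means 15 pairwise docking runs.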

Mem-LZerD membrane protein docking

Mem-LZerD takes 2 protein structures as input and assembles them under the assumption that they are integral to or interacting with the same membrane.
  1. First, the input protein models must be oriented in the membrane. This means that any transmembrane region should be roughly centered at z = 0 (on the X-Y plane) and should lie parallel to that plane; for peripheral membrane proteins, the membrane-facing surface should instead be centered on the membrane boundary. We recommend using models taken from the Orientations of Proteins in Membranes (OPM) database or oriented with the Positioning of Proteins in Membranes (PPM) software package.
  2. Mem-LZerD then uses a sampling method derived from LZerD, as described above, with two additions: membrane position information taken from the input models, and a tree augmentation that restricts sampling to LZerD poses that agree with the membrane positioning.
  3. The resulting models are ranked by the Mem-LZerD consensus score and presented in the result page. Refinement is not currently applied to the models.
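If both inputs are oriented in the same membrane frame (membrane normal along the z axis, membrane centered at z = 0), a rigid-body move of the ligand keeps it correctly placed only if it rotates about the z axis and translates within the X-Y plane. The check below is a minimal sketch of that geometric condition, assuming transforms given as a 3x3 rotation matrix plus a translation vector and an illustrative tolerance; it is not Mem-LZerD's tree-based sampling itself, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Vector = Tuple[float, float, float]


def keeps_membrane_frame(rotation: Matrix, translation: Vector,
                         tol: float = 1e-6) -> bool:
    """True if the transform preserves membrane orientation and depth.

    With the membrane normal along z, the rotation must map the z axis
    onto itself (a rotation about z, no flip), and the translation must
    have no z component so the ligand stays at the same membrane depth.
    """
    # R @ (0, 0, 1) is the third column of R.
    rotated_z = (rotation[0][2], rotation[1][2], rotation[2][2])
    normal_kept = (abs(rotated_z[0]) < tol and abs(rotated_z[1]) < tol
                   and abs(rotated_z[2] - 1.0) < tol)
    return normal_kept and abs(translation[2]) < tol


def rot_z(angle: float) -> List[List[float]]:
    """Rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle: float) -> List[List[float]]:
    """Rotation matrix about the x axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


print(keeps_membrane_frame(rot_z(0.8), (12.0, -3.0, 0.0)))  # True: in-plane move
print(keeps_membrane_frame(rot_x(0.8), (12.0, -3.0, 0.0)))  # False: tilts the ligand
print(keeps_membrane_frame(rot_z(0.8), (12.0, -3.0, 5.0)))  # False: changes its depth
```

In practice some tolerance for tilt and depth would be needed; the strict version above just makes the constraint explicit.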

AttentiveDist single-chain protein structure prediction

AttentiveDist takes individual protein sequences as input and predicts their structures de novo, without reference to any template structures.
  1. First, four multiple sequence alignments are generated from the input sequence with different e-value cutoffs.
  2. Next, a deep neural network is given the sequences, a position-specific scoring matrix, an HMM profile, secondary structure and solvent-accessible surface area predictions, initial contact predictions, mutual information, and a pairwise statistical potential. From these inputs, it predicts distributions of the pairwise distances between residues.
  3. To generate full-atom models, coordinates are generated in PyRosetta and optimized to satisfy the predicted contacts.
  4. The resulting models are ranked by ranksum and presented in the result page.
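Step 2 produces, for each residue pair, a probability distribution over distance bins. Two common ways to summarize such a distribution are the contact probability (conventionally, a contact means a Cβ-Cβ distance below 8 Å), obtained by summing the probability of the bins below the threshold, and the expected distance. The sketch below computes both; the bin edges and probabilities are hypothetical and do not reflect AttentiveDist's actual binning.

```python
from typing import List, Sequence


def contact_probability(edges: Sequence[float], probs: Sequence[float],
                        threshold: float = 8.0) -> float:
    """Sum the probability mass of bins lying entirely below `threshold`.

    `edges` has len(probs) + 1 entries; bin i covers [edges[i], edges[i+1]).
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than bins")
    return sum(p for p, upper in zip(probs, edges[1:]) if upper <= threshold)


def expected_distance(edges: Sequence[float], probs: Sequence[float]) -> float:
    """Mean distance, using each bin's midpoint as its representative value."""
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(m * p for m, p in zip(mids, probs))


# Hypothetical 2-Angstrom bins from 2 to 20 Angstrom for one residue pair.
edges: List[float] = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
probs: List[float] = [0.0, 0.1, 0.5, 0.2, 0.1, 0.05, 0.05, 0.0, 0.0]

print(contact_probability(edges, probs))
print(expected_distance(edges, probs))
```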
For more details, see the original papers listed in References.

Which docking method should you use?

Feature                         LZerD   Multi-LZerD   IDP-LZerD   Mem-LZerD
Available through webserver?    Yes     Yes           No          Yes
Available for download?         Yes     Yes           Yes         No
Can dock 2 subunits?            Yes     Yes           Yes         Yes
Can dock 3+ subunits?           No      Yes           No          No
Can dock a disordered subunit?  No      No            Yes         No
Can consider a lipid membrane?  No      No            No          Yes
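The comparison table can be read as a simple decision rule. The function below encodes it, together with the input sizes stated above (two structures for LZerD and Mem-LZerD, 3 to 6 for Multi-LZerD on the webserver); the function and its parameters are hypothetical and only summarize the table.

```python
from typing import Optional


def suggest_method(n_subunits: int, has_disordered: bool = False,
                   membrane: bool = False) -> Optional[str]:
    """Pick a docking method following the comparison table.

    Returns None when no listed method covers the combination.
    Hypothetical helper; not part of any LZerD software.
    """
    if n_subunits < 2:
        raise ValueError("docking needs at least 2 subunits")
    if has_disordered and membrane:
        return None  # no method in the table handles both
    if has_disordered:
        # IDP-LZerD docks 2 subunits and is download-only (no webserver).
        return "IDP-LZerD" if n_subunits == 2 else None
    if membrane:
        return "Mem-LZerD" if n_subunits == 2 else None
    if n_subunits == 2:
        return "LZerD"
    # Note: the webserver accepts 3 to 6 structures for Multi-LZerD.
    return "Multi-LZerD"


print(suggest_method(2))                  # LZerD
print(suggest_method(4))                  # Multi-LZerD
print(suggest_method(2, membrane=True))   # Mem-LZerD
```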