Index of /ftp/pub/bmrb/entry_lists/timedomain_entries/bmr6682/timedomain/analysis/9valid
All required steps and software documentation for doing a refinement in
water on an already good structure. Make sure the settings at the bottom
of this document are defined.
*** START
*** Got new topology and parameter files from HIC-Up database in Sweden
Contains definitions for Fe2S2 cluster.
Called them fes_xplor_par.txt and fes_xplor_top.txt
Adjusted many angles to 180 and 0 from measured values in order to
impose perfect symmetry. For most parameters the averaged values
observed in Milo's calculation were imposed.
*** Using protocol from Aart Nederveen to generate an xplor psf/mtf file
This protocol actually uses Chris Spronk/Jens Linge's code
*** Split the pdb file into multiple files using my code:
splitpdb_new $x.pdb
This assumes nawk is present or linked to gawk (as on cesg-master)
and x is a variable set to 1xxx which will be the project name.
The input PDB file at this point is molmol's output without pseudo atoms.
Doesn't need to have protons because they will be generated.
*** Renames a bunch of atom names to xplor convention.
$SCRIPTS/pdbcns/convert.py -convertpdb $x"_1".pdb PDB out.pdb XPLOR
*** Rename the atom names in the FES cluster:
sed -e 's/FE HIS 67/FE1 FES 113/' out.pdb \
-e 's/FE CYS 45/FE2 FES 113/' \
-e 's/SE CYS 45/S1 FES 113/' \
-e 's/SE HIS 67/S2 FES 113/' > $x"_template".pdb
*** Added topology and parameters to the same file (topallhdg5.3.pro).
Don't want to bother with specifying multiple files for them.
*** Using xplor-nih to create psf topology file using generate_tmoc.inp
script but needs parameter and topology files specific for fes cluster.
Creating one takes a lot of time. Getting the patches in order
takes even more time. Don't forget the cis-peptide planes patch
if needed in your protein.
xplor < generate_tmoc.inp > generate_tmoc.log
This creates the psf file. It's a good idea to check the
energy so the parameters get checked too.
Result: 1xxx.psf protein structure file (no coordinates)
*** Strip the protons and generate them anew, which is apparently easier
than renaming them. The way to do this is by reading the psf file 1xxx.psf
just created. Reading all atoms that already coincide which are all but the
cluster atoms and many protons. Keep the coordinates of the known
atoms just read and generate coordinates for all unknown atoms. This way
the coordinate file doesn't need to be corrected too much.
generateProtons.csh can be run on cluster like:
$QSUB $CONV_DIR/generateProtons.csh
It takes about 30 seconds per model so 10 min for 20 models.
This could of course be done in parallel.
Output can be found in directory: /u02/jurgen/job_io (define your own)
in a file named after the job id and output stream type error ER or
output OU. The output is very async-ed so wait for it.
*** Rename the files from 1xxx_cns_99.pdb to 1xxx_99.pdb etc.
A simple script for doing this is called renameFile.csh
Note that this will overwrite the files generate before by
splitpdb_new.
*** Copy the converted files to the water refinement input dir:
\cp *_*.pdb $COORINDIR
\cp *.psf $COORINDIR/../psf
*** Convert the experimental data like upper/lower bounds and dihedrals.
(aqua is installed and compiled on cesg-master)
type: 'aqua' alias (see settings)
type: 'qproject tmoc'
type: 'qconvert -t 4 tmoc.upl'
this specifies the input format is dyana (format 4)
it will create an aqua output file tmoc.noe among other files.
replace all + and - signs with a space in a text editor or do
something like (so negative angles don't get inverted when
you use this for angles):
sed 's/[+-] / /g' tmoc.noe > t
mv t tmoc.noe
type: cp ../conversion/1xxx_1.pdb .
To convert the aqua data to xplor data
type: aq2x xplor tmoc.noe tmoc.tbl 1xxx_1.pdb
Move and rename to the input dir for structure calculation:
cp tmoc.tbl ../RESAMPLE/tables/1xxx_noe.tbl
And for dihedrals:
convXPLORtorsrestr.awk tmoc.aco > tmoc_dihedral.tbl
(this 31 line script resides in my bin dir)
Edit manually for deviating format and move to tables dir
like the NOEs.
*** To check the data coordinates versus constraints type
Use the complete pdb file with all models by first creating it:
joinpdb -o 1xxx_xplor.pdb 1xxx_[0-9]*.pdb
and then
aqpc -r tmoc -xplor -r6sum 1 1xxx_xplor.pdb
I found in the summary file: 1xxx_xplor.nsm
Largest UPPER BOUND violations:
_ THR 70 HG1 _ ASN 80 HB1 0.56 (model 10)
_ CYS 84 HA _ CYS 85 HN 0.44 (model 10)
_ LYS 103 HA _ GLY 104 HN 0.42 (model 12)
Note that the order of the models in the ensemble PDB
file is alphabetical on the number and not sequential as
expected (I would have done this differently if I developed it).
*** Run the prelim procheck/whatif analyses
In file refine.pars set to 1 structure to test.
NINPUTSTRUCTURES = 1
Set code in nmr_waterrefine.py to exit after these checks.
python2 nmr_waterrefine.py
*** When all is successfull rerun it with the full number of structures
(20 in tmoc case).
*** Run prelim refinement on 2 structures or so with small number of
steps. I found this time that the structures explode during the phase
where the temp is high and the dihe terms are scaled up by factor of 3.
This phase is called hot. We'll it is.
Trying to find out what is causing the problem. So far tried:
- lowering the max temp to 300
- downsizing step from 4 to 1 fs
- now trying structures without fes group and with normal protonation on them.
that works without explosion twice already.
Ok, turns out I need to use the IMPRoper forces in stead of DIHEdral.
The calculation was instable with dihedral terms of 750 times the factor
of 3. The impropers aren't scaled with this factor 3 in the protocol and
is stable. Set everything back to original protocol settings.
*** Adjusting the number of structures back to the full number (20) and reran the
refinement with the normal number of steps.
*** Get statistics from the summary files: analyzed_input/summary_analyzed.txt
*** Get statistics from PDB files by typing:
grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
This time the only violations were small dihedral angle constraint violations
over 5 (but below 10).
*** Get statistics from PDB files by typing:
grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
This time the only violations were small dihedral angle constraint violations
over 5 (but below 10).
*** DONE
--- Settings:
setenv x=1xxx
setenv CNS_BIN /s/src/cns_solve_1.1/intel-i686-linux/bin
setenv TOPPAR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/toppar
setenv CONV_DIR /u02/jurgen/CESG/Brian/tmoc/conversion
setenv SCRIPTS /u/jurgen/CloneWars/Aart/water_bmrb
## You should change the job_io dirs for sure.
setenv QSUB '/usr/bin/qsub -q seriallong -o /u02/jurgen/job_io -e /u02/jurgen/job_io -l nodes=1:ppn=1 -S /bin/tcsh -V'
setenv COORINDIR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/input
alias aqua 'setenv aquaroot ~jurgen/src/aqua3.2; source $aquaroot/aqsetup'
--- Nice software on cesg-master some of it in:
/u02/jurgen/bin
jedit text editor works over x (in java so slow but doable)
aqua first type: aqua which should be an alias defined in .cshrc see settings.