Replica Exchange Post-Processing

Once the replica exchange simulation has finished, you will need to compile the individual box data into one energy summary.
All post-processing scripts can be found in the shared folder.

Your simulation will output the following folders and files into the OUTPUT folder:

  • GROMACS folder
    • .trr files are used for VMD visuals
    • .gro files were referenced during simulation
  • BOX# folders
    • These files are used in post-processing scripts and contain box-specific information.
  • swap#.out files
    • Plain-text files that record which simulation boxes swapped with each other.

This page describes how to convert these files into human-readable heat capacity and native contact data:

  • cv.out
  • contacts.out



Phase 1: Matlab pre-processing for mbar analysis

Before we can appropriately perform our mbar analysis we need to get the simulation files in the right format.

  1. Put a copy of mbar_prep.m into your OUTPUT folder. This is a MATLAB script that prepares your output folders for mbar analysis.
  2. Using vim, open mbar_prep.m and change the user-defined variables to match your model's parameters:
    • Ncontacts = Total number of contacts
      • Find this in INPUT/contacts.inp if not known
    • Nlines_keep = Number of lines of data you want to keep and feed into the mbar analysis
      • 5000 works well and normally does not need to be changed
    • Nequil = The number of equilibration steps that should be ignored.
      • To find this value, go to OUTPUT/BOX0/ener_box0.output
      • Then find the line number where column “1: iteration” first reaches your sim_eq_steps value (set in your simul.input file)
      • Nequil = This line number - 1 (to account for the header)
      • [Alternate method] Nequil = sim_eq_steps / sim_blockc + 1
    • temperatures = Box temperatures of your model
      • Find this in INPUT/box_config.txt if not known
    • ranges = Temperature range segments. Because mbar analysis requires a lot of memory, the temperatures should be broken up into multiple ranges. Two or three is common.
      • Each range should have ~20 data points or temp boxes
      • Make sure the ranges overlap by ~8 data points.
  3. Run mbar_prep.m. To do this:
    • $ module load matlab (if needed)
    • $ matlab < mbar_prep.m
  4. This generates new folders named mbar_input#
    • This takes 5-10 min, depending on the number of boxes
  5. The generated folders are ready for mbar analysis
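
The Nequil and ranges settings from step 2 can be sanity-checked before editing mbar_prep.m. The Python sketch below is not part of the shared scripts. It assumes that ener_box0.output is whitespace-delimited with one header line and the iteration in the first column, and that "first hits" can be read as "first reaches or exceeds". The function names and the (first box, last box) form of the ranges are hypothetical; mbar_prep.m may expect ranges in a different form.

```python
def nequil_from_lines(lines, sim_eq_steps, header_lines=1):
    """Line number where the iteration column first reaches sim_eq_steps,
    minus the header lines. Assumes the iteration is the first column."""
    for line_number, line in enumerate(lines, start=1):
        if line_number <= header_lines:
            continue  # skip the header
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if int(float(fields[0])) >= sim_eq_steps:
            return line_number - header_lines
    raise ValueError("sim_eq_steps was never reached in this file")


def nequil_from_formula(sim_eq_steps, sim_blockc):
    """Alternate method from the text: sim_eq_steps / sim_blockc + 1."""
    return sim_eq_steps // sim_blockc + 1


def overlapping_ranges(n_boxes, size=20, overlap=8):
    """Split boxes 1..n_boxes into ranges of about `size` boxes that
    share `overlap` boxes with the neighboring range (1-based, inclusive)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    ranges = []
    start = 1
    while True:
        end = min(start + size - 1, n_boxes)
        ranges.append((start, end))
        if end == n_boxes:
            break
        start += size - overlap  # step forward, keeping `overlap` boxes shared
    return ranges
```

To use it on a real run, pass the lines of the file, for example nequil_from_lines(open("BOX0/ener_box0.output"), sim_eq_steps). The two Nequil methods agree when output starts at iteration 0 and is written every sim_blockc steps. With 32 boxes, overlapping_ranges(32) gives boxes 1-20 and 13-32, which overlap by 8 boxes.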



Phase 2: mbar analysis using python

This step applies mbar analysis to the data and outputs human-readable heat capacity data.

NOTE: If you are using watt for the next step, make sure you have the right auto_mbar.sh and the right cv_calc.m. You will also need the mbar_input* folders.

  1. If they are not already there, put a copy of the “mbar” folder and auto_mbar.sh into your OUTPUT folder. The shell script runs mbar analysis on all of your mbar_input# folders. Use vim to make sure the shell script matches your simulation:
    • Nranges = Number of ranges you created for the analysis, i.e., the number of mbar_input# folders created in the previous step.
    • increment (in mbar/analyzer.py) = Temperature increment for your system. Leave this as 1 unless you need a different increment.
    • Make sure there are no mbar_output# folders already in your OUTPUT folder, because the new output will not overwrite anything that is already there.
  2. Run auto_mbar.sh
    • $ sh auto_mbar.sh
    • This takes ~10 min
  3. The generated folders (OUTPUT/mbar_output#/) are ready for post-processing and stitching together. They are now in human-readable format.
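
The two checks in step 1 (the value of Nranges and the absence of old mbar_output# folders) can be automated before running auto_mbar.sh. The sketch below is not one of the shared scripts. The function name preflight is hypothetical, and it assumes the folders sit directly inside OUTPUT with the mbar_input# and mbar_output# names used on this page.

```python
from pathlib import Path


def preflight(output_dir):
    """Return (number of mbar_input# folders, names of existing mbar_output# folders)."""
    out = Path(output_dir)
    # Folders produced by Phase 1; their count should match Nranges.
    inputs = sorted(p.name for p in out.glob("mbar_input*") if p.is_dir())
    # Leftover results from an earlier run; these will not be overwritten.
    stale = sorted(p.name for p in out.glob("mbar_output*") if p.is_dir())
    return len(inputs), stale
```

Run from inside OUTPUT, preflight(".") returns the count to compare against Nranges and a list that should be empty. If the list is not empty, move or delete those folders before running the shell script.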



Phase 3: post-mbar stitching

Because of the size of the mbar inputs, mbar typically can't process the entire replica exchange at once, so the analysis was broken into parts (the mbar_input# folders generated in Phase 1). Here you will stitch the outputs together to get one continuous set of data ready for analysis and conclusions.

  1. The file that contains the heat capacity information: cv.out
    • Temperatures are incremented by 1°C (unless otherwise changed)
  2. The file that contains the native contact information: contacts.out
    • Temperatures are incremented by 1°C (unless otherwise changed)
  3. To stitch the data together, use Excel or another data-processing program to find the transition point between consecutive mbar_output# outputs
    • In the overlapping temperature region, subtract the values of one data set from the other. The temperature where the difference is smallest is the point where you should switch to the other set of data.
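
As an alternative to a spreadsheet, the stitching rule in step 3 can be sketched in Python. This is an illustration, not one of the shared scripts. The layout of cv.out and contacts.out is not described here, so the function works on in-memory lists of (temperature, value) pairs that you would have to read from those files yourself. The function name stitch is hypothetical, and keeping the lower-range value at the switch temperature itself is an assumption.

```python
def stitch(low, high):
    """Join two overlapping series of (temperature, value) pairs.

    `low` covers the lower temperature range and `high` the upper one.
    The switch point is the shared temperature where the two series
    differ the least; ties go to the lowest temperature.
    """
    high_by_t = dict(high)
    # Absolute difference at every temperature present in both series.
    overlap = [(abs(v - high_by_t[t]), t) for t, v in low if t in high_by_t]
    if not overlap:
        raise ValueError("the two series share no temperatures")
    _, switch_t = min(overlap)
    # Keep `low` up to and including the switch point, then `high` after it.
    stitched = [(t, v) for t, v in low if t <= switch_t]
    stitched += [(t, v) for t, v in high if t > switch_t]
    return switch_t, stitched
```

With more than two mbar_output# folders, apply the function pairwise: stitch the first two, then stitch that result with the third, and so on.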

Created by Addison Smith on 1/2017 (addisonsmith390@gmail.com)