Back to the schedule
Previous: Managing a research workflow (bibliographies, note-taking, and arXiv)
Next: Finding Your (In)voice: Emacs for Invoicing

Reproducible molecular graphics with Org-mode

Blaine Mooers

CategoryOrgMode

Q&A: live Q&A or Etherpad
Duration: 8:04

If you have questions and the speaker has not indicated public contact information on this page, please feel free to e-mail us at emacsconf-submit@gnu.org and we'll forward your question to the speaker.

Description

Research papers in structural biology should include the code used to make the images of molecules in the article in the supplemental materials. Some structural bioinformaticists have started to include their computer code in the supplemental materials to allow readers to reproduce their analyses. However, authors of papers reporting new molecular structures often overlook the inclusion of the code that makes the images of the molecules reported in their articles. Nonetheless, this aspect of reproducible research needs to become the standard practice to improve the rigor of the science.

In a literate programming document, the author interleaves blocks of explanatory prose between code blocks that make the images of molecules. The document allows the reader to reproduce the images in the manuscript by running the code. The reader can also explore the effect of altering the parameters in the code. Org files are one alternative for making such literate programming documents.

We developed a yasnippet snippet library called orgpymolpysnips for structural biologists (https://github.com/MooersLab/orgpymolpysnips). This library facilitates the assembly of literate programming documents with molecular images made by PyMOL. PyMOL is the most popular molecular graphics program for creating images for publication; it has over 100,000 users, a large user base for molecular biology software. PyMOL has been used to make many of the images of biological molecules found on the covers of Cell, Nature, and Science.

We used the jupyter language in org-babel to send commands from code blocks in Org files to PyMOL's Python API. PyMOL returns the molecular image to the output block below the code block. An Emacs user can convert the Org file into a PDF, "tangle" the code blocks into a script file, and submit both for the benefit of non-Emacs users. We describe the content of the library and provide examples of running PyMOL from Org-mode documents.
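As a rough illustration of what the body of such a code block might contain, the Python sketch below lists a few PyMOL API calls as data and replays them on a `cmd`-like object. The file names, the object name `model`, and the helper names (`image_steps`, `apply_steps`, `Recorder`) are hypothetical and are not taken from orgpymolpysnips; the display settings are illustrative only. A small recorder stands in for `pymol.cmd` so that the sketch runs without PyMOL installed.

```python
from typing import Any, Dict, List, Tuple

# One step = (name of a pymol.cmd function, positional args, keyword args).
Step = Tuple[str, Tuple[Any, ...], Dict[str, Any]]


def image_steps(structure_file: str, png_file: str, dpi: int = 300) -> List[Step]:
    """Return illustrative PyMOL API calls that load a model and save an image.

    The file names and the object name "model" are hypothetical placeholders.
    """
    return [
        ("load", (structure_file, "model"), {}),
        ("hide", ("everything",), {}),
        ("show", ("cartoon",), {}),
        ("bg_color", ("white",), {}),
        ("orient", (), {}),
        ("png", (png_file,), {"dpi": dpi, "ray": 1}),
    ]


def apply_steps(cmd_like: Any, steps: List[Step]) -> None:
    """Call each named function on cmd_like (for example pymol.cmd), in order."""
    for name, args, kwargs in steps:
        getattr(cmd_like, name)(*args, **kwargs)


class Recorder:
    """Stand-in for pymol.cmd that records calls instead of drawing anything."""

    def __init__(self) -> None:
        self.calls: List[Step] = []

    def __getattr__(self, name: str):
        # Only reached for attributes that do not exist, i.e. the fake commands.
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((name, args, kwargs))

        return record


recorder = Recorder()
apply_steps(recorder, image_steps("model.pdb", "model.png"))
print([name for name, _, _ in recorder.calls])
```

In a kernel where PyMOL is importable, `from pymol import cmd` followed by `apply_steps(cmd, steps)` would make the same calls for real, and the saved PNG could then be shown in the results, for example with `IPython.display.Image`.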

Discussion

Pad:

  • Q1: Do you also do any hydrogen-bond analysis in your workflows? Also, could your snippet library be extended for other non-Python simulation programs like GROMACS?
    • A: Yes, I have a snippet that generates publication-quality hydrogen bonds. Yes, I have thought of making snippet libraries for molecular simulation packages like GROMACS and AMBER and for drug design software packages like AutoDock Vina and RDKit. They can help lower the barrier to entry. I made a library for crystallographic computing with CCTBX for use in Jupyter. I should make it available for Org-mode.
  • Q2: We've seen a few talks regarding managing academic papers and citations in emacs/org, what does your workflow look like?
    • A: I switched to Emacs as my primary editor 3 months ago. I have yet to write a paper in Org. I am very comfortable with LaTeX and I have been writing my papers on Overleaf in LaTeX for several years. I used BibTeX and JabRef to manage my references. I have started playing with org-ref. It looks super promising.
  • Q3: Hi Blaine, you mentioned that you have been able to come back to a file years later, how do you manage the environment that the org file executes in?
    • A: Good question. The PyMOL code is good for years so the images should be reproducible regardless of the version of org. PyMOL's domain specific language is very stable. The Python code largely just wraps around the DSL code.
  • Q4: Have you used Org Mode and PyMOL for publications? Could you share a link to any of them?
    • A: I have yet to use org in a publication. The first step will be to use it for supplemental material.

BBB discussion:

  • We've seen a few talks regarding managing academic papers and citations in emacs/org, what does your workflow look like?
    • Blaine: My workflow involves a dozen different software packages and 20-200 GB of data. Complete literate programming is not possible at this time. The smallest possible step towards that goal is to make the molecular images reproducible because the files involved are only 1-100 MB in size.
    • Questioner: I assume that's why there might be lag with several images rendered on an org buffer?
  • I was specifically interested in your workflow with managing citations and papers as I'm sure you have to do, is there anything in particular you use for citation management?
    • Blaine: I switched to Emacs as my primary editor 3 months ago. I have yet to write a paper in Org. I am very comfortable with LaTeX and I have been writing my papers on Overleaf in LaTeX for several years. I used BibTeX and JabRef to manage my references. I have started playing with org-ref. It looks super promising.
    • Questioner: I still use zotero and biblatex, but the previous two talks about org-ref got me thinking about my workflow
  • Have you used Org Mode and PyMOL for publications? Could you share a link to any of them?
    • Blaine: I have yet to use org in a publication. The first step will be to use it for supplemental material.
    • thanks, makes sense, I'm off in a part of the python world where code base churn can be pretty severe; but it sounds like pymol is able to avoid those issues
    • Blaine: PyMOL has a domain-specific language that is very stable. The transition from Python 2 to Python 3 was a bit disruptive.
  • Hi Blaine, you mentioned that you have been able to come back to a file years later, how do you manage the environment that the org file executes in?
    • Blaine: Good question. The PyMOL code is good for years so the images should be reproducible regardless of the version of org.

BBB feedback:

  • Blaine, great job with the talk. Awesome presentation.
    • I know people loved it in the IRC chat :D
  • I can share that I was excited to see how you made things so seamless and integrated feeling into Emacs. The results are really eyepopping.

IRC discussion:

  • which is the package name for export org mode to pymol?
  • the async header argument can be helpful with the problem of the amount of time for generating the images
  • think of this as a use-case explication for being able to manage and render 3D models in Org
  • It might be faster to keep sections folded by default
  • This is exactly the sort of thing my users love.
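The source block header described in the talk (language, session, kernel, output settings) and the `async` suggestion above could be combined as in the following Python sketch, which only assembles the text of an Org source block. The helper name `src_block` and all header values, including the kernel name `pymol.python`, are assumptions for illustration and not necessarily the settings used in the talk.

```python
from typing import Dict


def src_block(body: str, header_args: Dict[str, str],
              language: str = "jupyter-python") -> str:
    """Wrap code in an Org source block with the given header arguments."""
    header = " ".join(f":{key} {value}" for key, value in header_args.items())
    first_line = f"#+begin_src {language} {header}".rstrip()
    return "\n".join([first_line, body.rstrip("\n"), "#+end_src"])


block = src_block(
    "print('hello from the kernel')",
    {
        "session": "py",           # arbitrary session name
        "kernel": "pymol.python",  # hypothetical kernel name
        "async": "yes",            # evaluate without blocking Emacs
        "exports": "both",         # export both the code and its results
        "results": "drawer",       # put results in a drawer that can be folded
    },
)
print(block)
```

Generating blocks this way is only one option; the snippet library inserts the flanking lines interactively, which is usually more convenient while editing.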

Outline

  • 5-10 minutes: (brief description/outline)
    • Title slide
    • Structural Biology Workflow in the Mooers Lab
    • Cover images made with PyMOL

    • Why develop a snippet library for your field?

    • PyMOL in Org: kernel specification
    • Creating a conda env and installing PyMOL
    • Example code block in Org to make DSSR block model of tRNA
    • Resulting image
    • Summary
    • Acknowledgements
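The kernel specification mentioned in the outline is not reproduced on this page. As a hedged sketch, a Jupyter kernel is typically described by a `kernel.json` file stored in a directory whose name serves as the kernel name. The Python below builds such a file's contents for an interpreter that has PyMOL installed; the helper name `kernel_spec` and the display name `pymol.python` are hypothetical.

```python
import json
import sys
from typing import Any, Dict


def kernel_spec(python_path: str, display_name: str) -> Dict[str, Any]:
    """Build the contents of a Jupyter kernel.json for a given interpreter."""
    return {
        # Jupyter replaces {connection_file} when it launches the kernel.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


# sys.executable stands in for the interpreter of the environment with PyMOL.
spec = kernel_spec(sys.executable, "pymol.python")
print(json.dumps(spec, indent=2))
```

Pointing `python_path` at the interpreter of the conda environment that contains PyMOL is what would let a code block's kernel import the PyMOL API.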

Transcript

Hi, I'm Blaine Mooers. I'm going to be talking about the use of molecular graphics in Org for the purpose of doing reproducible research in structural biology. I'm an associate professor of biochemistry and microbiology at the University of Oklahoma Health Sciences Center in Oklahoma City. My laboratory uses X-ray crystallography to determine the atomic structures of proteins like this one in the lower left, and of nucleic acids important in human health.

This is a crystal of an RNA, which we have placed in this X-ray diffraction instrument. And after rotating the crystal in the X-ray beam for two degrees, we obtain the following diffraction pattern, which has thousands of spots on it. We rotate the crystal for over 180 degrees, collecting 90 images to obtain all the data. We then process those images and do an inverse Fourier transform to obtain the electron density.

This electron density map has been contoured at the one-sigma level. That level's being shown by this blue chicken wire mesh. Atomic models have been fitted to this chicken wire. These lines represent bonds between atoms, atoms are being represented by points. And atoms are colored by atom type, red for oxygen, blue for nitrogen, and then in this case, carbon is colored cyan. We have fitted a drug molecule to the central blob of electron density which corresponds to the active site of this protein, which is RET kinase. It's important in lung cancer.

When we're finished with model building, we will then examine the result of the final structure to prepare images for publication using a molecular graphics program. In this case, we've overlaid a number of structures, and we're examining the distance between the side chain of an alanine and one or two drug molecules. This alanine side chain actually blocks the binding of one of these drugs. The most popular program for doing this kind of analysis and for preparing images for publication is PyMOL.

PyMOL was used to prepare these images on the covers of these featured journals. PyMOL is favored because it has 500 commands and 600 parameter settings that provide exquisite control over the appearance of the output. PyMOL has over 100,000 users, reflecting its popularity.

This is the GUI for PyMOL. It shows in white the viewport area where one interacts with the loaded molecular object. We have rendered the same RET kinase with a set of preset parameters that have been named "publication". The other way of applying parameter settings and commands is to enter them at the PyMOL prompt. Then the third way is to load and run scripts. PyMOL is actually written in C for speed, but it is wrapped in Python for extensibility. In fact, there are over 100 articles about various plugins and scripts that people have developed over the years to extend PyMOL.

Here are some examples from the snippet library that I developed. On the left is a default cartoon representation of an RNA hairpin. I find this reduced representation of the RNA hairpin to be too stark. I prefer these alternate ones that I developed. So, these three to the right of this one are not available through pull downs in PyMOL.

So why develop a PyMOL snippet library for Org? Well, Org provides great support for literate programming, where you have code blocks that contain code that's executable, and the output is shown below that code block. And then you can fill the surrounding area in the document with the explanatory prose. Org has great support for editing that explanatory prose. Org can run PyMOL through PyMOL's Python API.

One of the uses of such an Org document is to assemble a gallery of draft images. We often have to look at dozens of candidate images with the molecule in different orientations, different zoom settings, different representations, different colors, and so on. And to have those images adjacent to the code that was used to generate them can be very effective for further editing the code and improving the images. Once the final images have been selected, one can submit the code as part of the supplemental material. Finally, one can use the journal package to use the Org files as an electronic laboratory notebook, which is illustrated with molecular images. This can be very useful when assembling manuscripts months or years later.

This shows the YASnippet pull down after my library has been installed. I have an Org file open, so I'm in Org mode. We have the Org mode submenu, and under it, all my snippets are located in these sub-sub-menus that are prepended with pymolpy. Under the molecular representations menu, there is a listing of snippets. The top one is for the ambient occlusion effect, which we're going to apply in this Org file.

So these lines of code, as well as these flanking lines that define the source block, were inserted by clicking on that line. Then I've added some additional code. So, the first line defines the language that we're using. We're going to use the jupyter-python language. Then you can define the session, and the name of this is arbitrary. Then the kernel is our means by which we gain access to the Python API of PyMOL. The remaining settings apply to the output.

To execute this code and to get the resulting image, you put the cursor inside this code block, or on the top line, and enter Control c Control c (C-c C-c). This shows the resulting image has been loaded up. It takes about 10 seconds for this to appear. So the downside of this is if you have a large number of these, the Org file can lag quite a bit when you try to scroll through it, so you need to close up these result drawers, and only open up the ones that you're currently examining.

These are features I think are important in practical work. So, the plus is a feature that's present, minus is absent. I think tab stops and tab triggers are really important. Triggers are important for the fast insertion of code, tab stops are important for complete, accurate editing of code. I already addressed the rendering speed and scrolling issue. I think the way around this is just to export the Org document to a PDF file and do your evaluation of different images by examining them in the PDF rather than the Org file. The path to PDF is lightning fast in Emacs compared to Jupyter, where it's cumbersome in comparison.

This is a snapshot of my initialization file. These parts are relevant to doing this work. A full description of them can be found in the README file of this repository on GitHub.

I'd like to thank the Nathan Shock Data Science Workshop for feedback during presentations I've made about this work. And I would also like to thank the following funding sources for support. I will now take questions. Thank you.

Captions by Blaine Mooers and Bhavin Gandhi
