Back to the schedule
Previous: One Big-ass Org File or multiple tiny ones? Finally, the End of the debate!
Next: README-Driven Design

Experience Report: Steps to "Emacs Hyper Notebooks"

Joseph Corneli, Raymond Puzio, and Cameron Ray Smith


We present a short experience report from the perspective of two long-time Emacs users and one relative newcomer. Our motivations relate, broadly, to reproducibility of research in science. We reflect on our experiences with off-the-shelf solutions available through the Emacs package manager, and describe some of our custom extensions.

When working on a scientific research project, one typically has multiple different computer programs running at the same time. For example, we may use a computer algebra system such as Maxima for calculations, an interactive language such as Julia for numerical computations, TeX for writing up results, a reference manager such as Zotero for the bibliography, Roam for note-taking, and Jekyll for blogging. Switching and moving content among these programs can be distracting, time-consuming, and prone to error. These issues are compounded when there are several collaborators involved.

We explore a solution that looks toward building better "computational notebooks" using Emacs. We take Org mode as our foundation. As many in this audience will know, Org mode integrates features such as writing, task management, program evaluation, typesetting, presentation, and navigation. Tightly integrated add-on packages round out the picture either by directly replacing the functionality of the other programs mentioned above or automatically dispatching commands to them. We outline both the pleasure and pain involved in this experience.

  • Actual start and end time (EST): Start 2020-11-28T14.01.42; Q&A 2020-11-28T14.11.44; End 2020-11-28T14.13.50

Questions

Have you looked into trying SageMath? I've long wanted to use SageMath in Org files.

I can use SageMath from the command line, but not using one of the Emacs shells.

As Joe is now explaining, our ob-servant code should then make it accessible from within Org mode.

Let's not forget about Embedded Calc in Emacs!

Which package have you used to prepare the slides which are visually appealing?

I think he used org-tree-slides, like some earlier presentations.

Notes

Transcripts

00:00:00.320 --> 00:00:30.800 Joe: Hi, I'm Joe Corneli. This is work I did with Ray Puzio and Cameron Smith. They're the main protagonists in this story. They are researchers who've been working on theoretical biology. In a typical project, they may use Maxima and Julia. Their work combines biology, physics and computer science. The latest work-in-progress is on branching processes for cancer modeling.

00:00:30.800 --> 00:00:48.640 How can Emacs possibly help? Let's have a look. Moving code and data between these different programs by hand is annoying. Having separate workflows for writing up notes and preparing publications is perhaps even more annoying. All of it is time-consuming and error-prone.

00:00:48.640 --> 00:01:10.057 So what about maybe using Jupyter? We found something called Script of Scripts. It solves some of those problems because you can use Maxima and Julia together, but we were quite happy to explore Emacs-based solutions, being Emacs enthusiasts. We even got Cameron to be enthusiastic about doing Emacs, so that went nice.

00:01:10.057 --> 00:02:05.657 Here's a little feature grid of Emacs + Org versus your generic tools that are in a different, more general ecosystem. As you can see, it's quite feature-complete. You've got your maxima-mode, julia-mode. You can use both of them inside of org-mode. You can present things with org-tree-slide. You can set up a wiki inside of org-roam. This is one I found rather recently: you can even use something called logseq, which runs in the browser, compatibly with org-roam, so that's nice. You can do real-time collaborative editing, either in a kind of pairing style or in a more Etherpad style. Obviously, you can manage your references. You can typeset whatever you want. You can publish work in progress on a blog. Firn is another one of these external Org Mode tools. It's not actually in Emacs, but works with Org Mode stuff. And, you know... So we're good to go with all of that.

00:02:05.657 --> 00:02:13.890 So what does that look like? Well, here's a little example from before they were doing... before we started really thinking seriously about this stuff.

00:02:13.890 --> 00:02:45.280 So this is just Maxima. Well, Maxima doesn't have a long running process by default. If you've ever used Python, you have something called sessions. They don't have that for Maxima, at least not by default. So how... What was the workaround? There's this thing called solve-for-u here that shows up down below again in these angle brackets, which you've seen maybe in someone else's talk, which means go to the previous thing that was named solve-for-u and do that all over again, so they do that over again.

00:02:45.280 --> 00:03:00.640 Here's the little Maxima code for defining usol, so you've now defined usol, and then you can use it in the next expression. You get out a nice juicy zero at the end. It's a little bit like a partridge in a pear tree to have to redefine everything every time.
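[Editor's sketch] The angle-bracket mechanism described here is Org's noweb reference syntax: `<<name>>` inside a block is replaced by the body of the block with that name before evaluation. A minimal model of that expansion in Python follows. The function name `expand_noweb` and the Maxima lines in the usage note are illustrative assumptions, not the code shown on the slides.

```python
import re

def expand_noweb(body: str, blocks: dict) -> str:
    """Replace each <<name>> in BODY with the body of the named block.

    BLOCKS maps block names to their source text. Expansion is recursive,
    so a named block may itself contain references. (Hypothetical helper;
    Org's own expansion handles more cases, such as prefixes and calls.)
    """
    def substitute(match):
        return expand_noweb(blocks[match.group(1)], blocks)
    return re.sub(r"<<([^<>\n]+)>>", substitute, body)
```

For example, with `blocks = {"solve-for-u": "usol: solve(eq, u)$"}`, calling `expand_noweb("<<solve-for-u>>\nsubst(usol, eq);", blocks)` yields the definition followed by the expression that uses it, which is why everything gets redefined on every evaluation.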

00:03:00.640 --> 00:03:22.590 So this is clearly at the level of work-around. Maybe just one more time looking through that stuff. Sorry. So, looking through that stuff, this is... We're going to need something like that, probably, for stitching Maxima and Julia together, so it's good to look a little bit at how that might work.

00:03:22.590 --> 00:03:46.923 First of all, you can cache results, so if you wanted to save the date out of block one at a certain time and then use it again later... At the time when I ran this code, you can see I've got two slightly different time stamps down below. One's the cached result, and the other was the result of reevaluating the block. So you can move things around. That's going to be useful. But you know, that's not really the main problem.
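[Editor's sketch] The caching behaviour described here corresponds to Org's `:cache yes` header argument, which reuses a stored result as long as a hash of the block is unchanged. A rough Python model is below; it hashes only the block body, whereas Org also takes header arguments into account, and the names `evaluate_cached` and `run` are assumptions made for illustration.

```python
import hashlib

# Maps a hash of the block body to its stored result.
_cache = {}

def evaluate_cached(body: str, run):
    """Return the stored result for BODY, calling RUN only on a cache miss.

    RUN is any callable that takes the block body and returns its result.
    """
    key = hashlib.sha1(body.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = run(body)
    return _cache[key]
```

A block that prints the date would therefore keep its old timestamp under this scheme, while an uncached copy of the same block shows a new one on each evaluation, matching the two timestamps on the slide.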

00:03:46.923 --> 00:04:11.760 The main problem is making Maxima long-running. The core of this talk is a new ob-servant facility, which is a general purpose way to do that kind of thing, which involves a very simple change to ob-core. We'll give a quick overview of that and show an example. So here's the example, a very simple sort of silly example.

00:04:11.760 --> 00:04:30.240 What does it mean to have a long-running process? Here, I've set this display2d to be false, which just means that things are going to come across in 1d. Then I ask it to expand something. I get LaTeX by default. So that's what it means. It's that I've sent something in and it's going to come across in one view, which is great.

00:04:30.240 --> 00:04:40.320 Maybe you'll also notice that there's no semicolon, if you're a Maxima fan, and things are coming across as TeX. So those were some little bonus features. I'll show you how that works later.

00:04:41.040 --> 00:05:13.759 The change to ob-core is as follows. Actually, this should say... Instead of stream here, it should say servant. Sorry. We tried an experimental version which was called stream, so now it's called servant. But all it does is it overrides org-babel-execute:lang for arbitrary lang if you have a servant in your params. So that's the change that hasn't been pushed out or sent as a patch to anybody, but it's a pretty minor change.
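[Editor's sketch] The dispatch rule described here can be modelled in a few lines of Python: if a block's parameters name a servant, the block goes to that long-running process, otherwise the usual per-language executor runs. This is not the authors' patch to ob-core; `EXECUTORS`, `servant_execute`, and `execute_block` are hypothetical names, and the executors just return strings.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the per-language executors
# (the role played by org-babel-execute:LANG in Org).
EXECUTORS: Dict[str, Callable[[str, dict], str]] = {
    "maxima": lambda body, params: f"one-shot maxima: {body}",
}

def servant_execute(body: str, params: dict) -> str:
    """Hypothetical: hand BODY to the long-running process named in PARAMS."""
    return f"servant {params['servant']}: {body}"

def execute_block(lang: str, body: str, params: dict) -> str:
    """Use the servant when one is named; fall back to the normal executor."""
    if "servant" in params:
        return servant_execute(body, params)
    return EXECUTORS[lang](body, params)
```

Because the check happens before the language lookup, a block in any language, even one with no executor of its own, can be served as long as a servant is named.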

00:05:13.759 --> 00:05:30.720 Here's an overview without the code. Just a high level overview of ob-servant.el. It stores information about these processes in a hash table. It can do pre-processing and post-processing. It does all these things. It stores the output.

00:05:30.720 --> 00:05:40.639 I mentioned here that, in principle, we could store lots of output and have a kind of browsable history, although we don't do that presently. But that's what ob-servant does. It does what you might expect.
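[Editor's sketch] The overview above (a table of processes, optional pre- and post-processing, stored output) can be illustrated with a small Python model. It is a sketch under stated assumptions, not a port of the authors' Emacs Lisp: the names `Servant`, `SERVANTS`, `register`, and `evaluate` are invented, and the "process" is a toy closure that remembers definitions between calls rather than a real subprocess. The per-servant `history` list models the browsable history that the talk mentions only as a possibility.

```python
from typing import Callable, Dict, List

def identity(text: str) -> str:
    return text

class Servant:
    """One long-running process plus its input/output filters."""
    def __init__(self, send, preprocess=identity, postprocess=identity):
        self.send: Callable[[str], str] = send  # talks to the process
        self.preprocess = preprocess            # applied before sending
        self.postprocess = postprocess          # applied to the raw reply
        self.history: List[str] = []            # stored outputs, oldest first

# The table of known servants, keyed by name.
SERVANTS: Dict[str, Servant] = {}

def register(name, send, preprocess=identity, postprocess=identity):
    SERVANTS[name] = Servant(send, preprocess, postprocess)

def evaluate(name: str, body: str) -> str:
    """Preprocess BODY, send it to the named servant, postprocess and store the reply."""
    servant = SERVANTS[name]
    output = servant.postprocess(servant.send(servant.preprocess(body)))
    servant.history.append(output)
    return output

def make_toy_process():
    """Toy stand-in for a session: 'x = 1' stores a value, 'x' looks it up."""
    env = {}
    def send(line: str) -> str:
        name, _, value = line.partition("=")
        if value:
            env[name.strip()] = value.strip()
            return value.strip()
        return env.get(line.strip(), "undefined")
    return send
```

The point of the toy process is the session behaviour: after `register("toy", make_toy_process())`, a definition sent in one call to `evaluate` is still visible in the next call, which is what one-shot evaluation lacks.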

00:05:41.440 --> 00:06:16.960 Here's the Maxima on-ramp to get Maxima brought in. You have to obviously have a Maxima process you can call. puthash... this is the preprocessing thing I mentioned, adding in some TeX and adding in-- or deleting, rather--a substring. Here is why you delete the substring. It's because Maxima thinks it's a good idea to tell you false once you run check on things. You've got to delete that back out to get something coherent out of it. So this is how to set up Maxima.
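[Editor's sketch] One way to picture the two Maxima filters is below, in Python. Maxima's `tex()` function prints the TeX form of an expression and then returns `false`, which may be the stray output the speaker describes deleting. The function names, and the exact shape of the raw output assumed in the usage note, are illustrative guesses rather than the authors' code.

```python
def maxima_preprocess(body: str) -> str:
    """Wrap the expression in tex(...) and supply the terminating semicolon,
    so the user can omit it in the Org block."""
    expression = body.strip().rstrip(";")
    return f"tex({expression});"

def maxima_postprocess(raw: str) -> str:
    """Drop the line holding tex()'s return value, keeping only the TeX output.

    Assumes the return value arrives on a line of its own as the bare
    word 'false'; a real session may decorate it with an output label.
    """
    kept = [line for line in raw.splitlines() if line.strip() != "false"]
    return "\n".join(kept).strip()
```

Under these assumptions, `maxima_preprocess("expand((x+1)^2)")` gives `tex(expand((x+1)^2));`, and a raw reply consisting of a `$$...$$` line followed by a `false` line is reduced to the `$$...$$` line alone.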

00:06:16.960 --> 00:06:25.440 That's enough, really, of the demo. It's not really a demo for show and tell, but as this is an experience report, I wanted to talk about the experience of doing this.

00:06:25.440 --> 00:06:42.880 Some negatives, like we tried to get Emacs Jupyter working prior to working on ob-servant. We couldn't get it doing everything we wanted, despite a bit of heavy lifting and debugging and stuff. So that's not finished. That was a bit difficult.

00:06:42.880 --> 00:07:11.695 On the other hand, working on ob-servant was fun, pretty lightweight, and easy. We got some experience co-editing things with these real-time tools. Obviously, the stack is somewhat work in progress. I just wanted to give a shout out to crdt which was really fun, and Qiantan was making bug fixes for that as we go. Similarly, for firn and logseq, the maintainers were really responsive, so that was nice.

00:07:11.695 --> 00:07:27.120 We did try to get Emacs running in the browser, thinking it would be really nice for people who didn't want to install it to get a chance to just try it, but actually, browsers capture things like C-n, so that was a bit annoying.

00:07:27.120 --> 00:07:33.759 But we did get lots of great feedback and interaction with people, including around this conference. Thank you to those who we've had discussions with.

00:07:35.599 --> 00:08:19.120 So, future work. Okay, so... Maybe you remember, I gave a talk a few years back on Arxana. What might this have to do with Org Mode? That's always the question one asks about Arxana. Arxana... One of the things it does is transclusions, and so that could be actually very helpful in connection with this "combined notes and write-up" workflow. So you might have an Org Mode. Some of these results we got back as raw results could go right into your write-up in a convenient way, at a level above-- transparently, a level above the notebook. So you'd have the notebook alongside the write-up in that case, which is a variation on the literate programming workflow. This is speculative. Who knows?

00:08:19.120 --> 00:08:33.357 The other thought is, it just relates to the idea of network programming. So we can imagine these networks of computational nodes sitting inside of org-roam, calling each other. You would want to maintain some kind of model of that process.

00:08:33.357 --> 00:09:11.680 A general question is: how do we have a remote control for long-running processes? You could do that in Lisp or Clojure, but maybe we could have something a little bit like that here. Conclusions: what have we actually addressed? Well, we addressed accessing any long-running process with a simple Org Mode interface. Obviously, we're not the only people to think about notebooks, but we think that Emacs has some advantages related to reproducible research and interdisciplinary collaboration. Let's just say that we think something is reproducible if it's actually teachable to someone new and they can do it. Org Mode seems very useful for that. Many of the other talks have touched on this.

00:09:11.680 --> 00:09:27.857 Interdisciplinary collaboration is great. This was an interdisciplinary collaboration on some level, but what about future work for bringing in scenario planners, simulation scientists, and local farmers, and building something that they can all use that's more than the sum of the parts?

00:09:27.857 --> 00:09:38.135 So a little future work for everybody else here. We think science should be widely teachable, shareable, semi-automated, transdisciplinary, and real-time like EmacsConf.

00:09:38.135 --> 00:10:00.240 So you can get in touch via these methods. The code--which is very much early stage work in progress, as this was meant to be an experience report, not an "it's all done, here it is, polished" report-- it's also online if you'd like to have a look. That's the end of the talk. I don't know if there's time for questions or not, but I'm at your disposal now. Thank you.

00:10:00.240 --> 00:10:14.240 (Amin: Many thanks for the tough job. Let's see. We have about I think four minutes for questions, and we have a couple of questions on the pad. Would you like to read them yourself or should I read them to you?)

00:10:14.240 --> 00:10:18.079 Just for the sake of easy management why don't you read them out, if that's okay?

00:10:18.079 --> 00:10:33.760 (Amin: Yeah, sure. They ask, "Have you looked into trying SageMath? I've long wanted to use SageMath in Org files.")

00:10:33.760 --> 00:10:44.839 Ray: Right. I wrote the answer that it should be possible because one can call it from a command.

00:10:44.839 --> 00:11:00.640 (Amin: Okay, and I see there's another SageMath question that you seem to have answered as well, so I guess I won't repeat that. There's... "Let's not forget about Embedded Calc in Emacs.")

00:11:00.640 --> 00:11:08.240 Joe: So the first demos actually were with Calc. That's useful. Although I think it was a different--kind of a different command line.

00:11:08.240 --> 00:11:11.839 Ray: Well, that was UNIX Calc.

00:11:11.839 --> 00:11:13.839 Joe: So, sure, there is calc, so that...

00:11:15.680 --> 00:11:19.120 Ray: Calc is already in Org Mode.

00:11:25.680 --> 00:11:57.290 (Amin: Still looking for questions. Okay, I think that's about it. I don't see any questions on the Etherpad. And let's see... Anything on irc? Nothing but praises and everyone thanking you. Thank you.)

00:11:57.290 --> 00:11:59.120 Ray: All right, you're welcome.

00:11:59.120 --> 00:12:01.923 Joe: Thanks a lot! We'll see you guys around then.

00:12:01.923 --> 00:12:06.800 Amin: Cheers, and see you around!

Saturday, Nov 28 2020, ~ 2:05 PM - 2:15 PM EST
Saturday, Nov 28 2020, ~11:05 AM - 11:15 AM PST
Saturday, Nov 28 2020, ~ 7:05 PM - 7:15 PM UTC
Saturday, Nov 28 2020, ~ 8:05 PM - 8:15 PM CET
Sunday, Nov 29 2020, ~ 3:05 AM - 3:15 AM +08
